© 2026 Wicked Smart Data. All rights reserved.

Cleaning Messy Data with Power Query: Nulls, Errors, and Type Conversions

Power Query · 🔥 Expert · 22 min read · Mar 31, 2026 · Updated Mar 31, 2026
Table of Contents
  • Prerequisites
  • Understanding Power Query's Error and Null System
  • The Error Propagation Chain
  • Null vs. Error: When to Use Each
  • Advanced Error Handling Strategies
  • The Try Expression: Controlled Error Conversion
  • Conditional Error Handling with Business Logic
  • Custom Error Types for Complex Scenarios
  • Mastering Type Conversions
  • Understanding Power Query's Type System
  • Robust Text-to-Number Conversions
  • Date and DateTime Conversions with Timezone Awareness
  • Handling Nulls at Scale
  • Null Propagation Strategies
  • Advanced Null Coalescing Patterns
  • Performance Optimization for Large Datasets
  • Efficient Error Handling at Scale
  • Memory-Efficient Type Conversions
  • Building Fault-Tolerant Data Pipelines
  • Implementing Circuit Breaker Patterns
  • Graceful Degradation Strategies
  • Hands-On Exercise

You're staring at a dataset imported from multiple CSV files, an aging SQL database, and a REST API that returns inconsistent data types. Half the records show "NULL" as text, others show actual nulls, and some contain error values that crash your analysis downstream. Welcome to the reality of data work—where clean, well-structured data is the exception, not the rule.

Power Query's strength isn't just in connecting to data sources—it's in transforming messy, real-world data into something reliable. But handling nulls, errors, and type conversions requires understanding how Power Query's evaluation engine works under the hood, particularly its error propagation system and type coercion rules. Most practitioners learn a few basic transformations and call it good. We're going deeper.

By mastering these advanced data cleaning techniques, you'll build robust data pipelines that handle edge cases gracefully and fail predictably when they do encounter issues. More importantly, you'll understand when to be aggressive with error handling and when to preserve errors for downstream investigation.

What you'll learn:

  • Power Query's internal null and error representation system and how it affects downstream operations
  • Advanced error handling patterns including try expressions, conditional logic, and custom error types
  • Type conversion strategies that preserve data integrity while handling edge cases
  • Performance optimization techniques for large-scale data cleaning operations
  • Building fault-tolerant data pipelines that degrade gracefully under various failure conditions

Prerequisites

You should be comfortable creating basic Power Query transformations, understand M language fundamentals (variables, let expressions, functions), and have experience with data types in analytical contexts. Familiarity with error handling concepts in other programming languages will be helpful but isn't required.

Understanding Power Query's Error and Null System

Before diving into cleaning techniques, you need to understand how Power Query handles errors and nulls at the engine level. This isn't academic—it directly impacts how your transformations behave in production.

The Error Propagation Chain

Power Query uses a lazy evaluation model where errors propagate through transformation chains until they're either handled or reach a sink (like a table visualization). This means an error in one column can cascade through multiple derived columns, making debugging challenging.

Consider this scenario: you're importing customer data where the "Age" column sometimes contains invalid values. Here's what happens internally:

let
    Source = Table.FromRows({
        {"001", "25", "2023-01-15"},
        {"002", "invalid_age", "2023-01-16"},
        {"003", "35", "2023-01-17"}
    }, {"CustomerID", "Age", "SignupDate"}),
    
    // This creates an error in row 2 (conversion of "invalid_age" fails)
    AgeAsNumber = Table.TransformColumnTypes(Source, {{"Age", Int64.Type}}),
    
    // This propagates the error to dependent calculations
    AgeCategory = Table.AddColumn(AgeAsNumber, "Category", 
        each if [Age] < 30 then "Young" else "Mature"),
    
    // Error continues propagating
    Premium = Table.AddColumn(AgeCategory, "IsPremium", 
        each [Age] > 25 and Text.Length([CustomerID]) = 3)
in
    Premium

Row 2 now contains errors in Age, Category, and IsPremium columns. The error originated from a single invalid age value but contaminated multiple downstream columns. Understanding this propagation is crucial for deciding where to intervene.
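One way to intervene is at the source: wrap the conversion itself in try ... otherwise so that bad values become nulls instead of errors, and nothing propagates. A minimal sketch based on the same table:

let
    Source = Table.FromRows({
        {"001", "25", "2023-01-15"},
        {"002", "invalid_age", "2023-01-16"},
        {"003", "35", "2023-01-17"}
    }, {"CustomerID", "Age", "SignupDate"}),

    // Convert with a fallback: bad values become null instead of
    // errors, so downstream columns never inherit the failure
    AgeAsNumber = Table.TransformColumns(Source,
        {{"Age", each try Number.FromText(_) otherwise null, type nullable number}}),

    // Downstream logic now only has to handle null, not error
    AgeCategory = Table.AddColumn(AgeAsNumber, "Category",
        each if [Age] = null then null
            else if [Age] < 30 then "Young" else "Mature")
in
    AgeCategory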

Null vs. Error: When to Use Each

Power Query distinguishes between null (missing data) and errors (invalid operations). This distinction matters for both performance and semantic correctness:

  • Nulls represent missing or inapplicable data and participate in operations (List.Sum skips nulls, for example)
  • Errors represent failed operations and typically halt processing unless explicitly handled
let
    // Null arithmetic returns null (propagates but doesn't fail)
    NullResult = null + 5,  // null

    // Invalid conversion raises an error (fails unless handled);
    // wrapping it in try lets us inspect it instead of crashing
    ErrorResult = try Number.FromText("invalid")
in
    [NullResult = NullResult, ErrorResult = ErrorResult]

Choose nulls when data is missing but the record is otherwise valid. Choose errors when the data indicates a systemic problem that requires attention.
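The practical difference shows up in aggregations. A minimal sketch:

let
    // List.Sum skips nulls, so missing data doesn't break the total
    SumWithNull = List.Sum({10, null, 20}),  // 30

    // An error value, by contrast, fails the whole aggregation
    // unless it is handled first
    SumWithError = try List.Sum({10, error "bad", 20}) otherwise null
in
    [SumWithNull = SumWithNull, SumWithError = SumWithError]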

Advanced Error Handling Strategies

The Try Expression: Controlled Error Conversion

The try expression converts errors to records with metadata, allowing you to inspect and handle errors programmatically:

let
    Source = Table.FromRows({
        {"100", "2023-01-15"},
        {"not_a_number", "2023-01-16"},
        {"200", "invalid_date"},
        {"300", "2023-01-17"}
    }, {"Amount", "Date"}),
    
    // Comprehensive error handling with metadata preservation
    SafeConversions = Table.AddColumn(Source, "ProcessedAmount", 
        each 
            let
                TryAmount = try Number.FromText([Amount]),
                Result = if TryAmount[HasError] then
                    [
                        Value = null,
                        ErrorType = "ConversionError",
                        OriginalValue = [Amount],
                        ErrorDetail = TryAmount[Error][Message]
                    ]
                else
                    [
                        Value = TryAmount[Value],
                        ErrorType = null,
                        OriginalValue = [Amount],
                        ErrorDetail = null
                    ]
            in
                Result),
    
    // Expand the record for analysis
    ExpandedResults = Table.ExpandRecordColumn(SafeConversions, 
        "ProcessedAmount", 
        {"Value", "ErrorType", "OriginalValue", "ErrorDetail"})
in
    ExpandedResults

This pattern creates an audit trail of all conversion attempts, invaluable for debugging data quality issues in production systems.
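With the audit trail in place, summarizing data quality takes one grouping step. A sketch, assuming the ExpandedResults table from the query above:

let
    // Count rows per error category; ErrorType = null means clean rows
    ErrorSummary = Table.Group(ExpandedResults, {"ErrorType"},
        {{"RowCount", each Table.RowCount(_), Int64.Type}})
in
    ErrorSummary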

Conditional Error Handling with Business Logic

Sometimes you need different error handling strategies based on business context. Here's a pattern for handling customer age data where different validation rules apply:

let
    Source = #table(
        {"CustomerID", "Age", "CustomerType"},
        {
            {"C001", "25", "Individual"},
            {"C002", "-5", "Individual"},
            {"C003", "150", "Individual"},
            {"C004", null, "Corporate"},
            {"C005", "unknown", "Individual"}
        }
    ),
    
    ProcessAge = Table.AddColumn(Source, "ValidatedAge",
        each
            let
                CustomerType = [CustomerType],
                RawAge = [Age]
            in
                if CustomerType = "Corporate" then
                    null  // Age not applicable for corporate customers
                else
                    let
                        TryAge = try Number.FromText(Text.From(RawAge))
                    in
                        if TryAge[HasError] then
                            null  // Convert invalid ages to null
                        else
                            let
                                NumericAge = TryAge[Value]
                            in
                                if NumericAge < 0 or NumericAge > 120 then
                                    error Error.Record("ValidationError", 
                                        "Age out of valid range: " & Text.From(NumericAge))
                                else
                                    NumericAge
    )
in
    ProcessAge

This approach treats missing corporate ages as expected nulls but flags biologically impossible ages as errors requiring investigation.

Custom Error Types for Complex Scenarios

For enterprise data pipelines, create custom error types that encode business meaning:

let
    // Define custom error constructors
    DataQualityError = (category as text, message as text, sourceValue as any) =>
        error Error.Record("DataQuality." & category, message, sourceValue),
    
    BusinessRuleError = (rule as text, message as text) =>
        error Error.Record("BusinessRule." & rule, message),
    
    // Example usage in transaction validation
    Source = #table(
        {"TransactionID", "Amount", "AccountType", "Timestamp"},
        {
            {"T001", "1000.00", "Checking", "2023-01-15T10:00:00"},
            {"T002", "-50.00", "Savings", "2023-01-15T10:01:00"},
            {"T003", "50000.00", "Checking", "2023-01-15T10:02:00"},
            {"T004", "invalid", "Checking", "2023-01-15T10:03:00"}
        }
    ),
    
    ValidateTransactions = Table.AddColumn(Source, "ValidatedAmount",
        each
            let
                TryAmount = try Number.FromText([Amount])
            in
                if TryAmount[HasError] then
                    DataQualityError("InvalidFormat", 
                        "Cannot convert amount to number", [Amount])
                else
                    let
                        Amount = TryAmount[Value],
                        AccountType = [AccountType]
                    in
                        if Amount < 0 and AccountType = "Savings" then
                            BusinessRuleError("NegativeBalance", 
                                "Savings accounts cannot have negative transactions")
                        else if Amount > 10000 then
                            BusinessRuleError("LargeTransaction", 
                                "Transactions over $10,000 require additional approval")
                        else
                            Amount
    )
in
    ValidateTransactions

These custom error types allow downstream error handlers to route different error categories to appropriate remediation workflows.
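Downstream, a handler can branch on the error's Reason field. A sketch, assuming the ValidateTransactions table from the query above (the routing labels are illustrative):

let
    RoutedErrors = Table.AddColumn(ValidateTransactions, "Routing",
        each
            let
                // Accessing an error cell raises; try captures it as a record
                Attempt = try [ValidatedAmount]
            in
                if not Attempt[HasError] then
                    "OK"
                else if Text.StartsWith(Attempt[Error][Reason], "DataQuality.") then
                    "data-stewardship queue"
                else if Text.StartsWith(Attempt[Error][Reason], "BusinessRule.") then
                    "approval workflow"
                else
                    "escalate")
in
    RoutedErrors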

Mastering Type Conversions

Understanding Power Query's Type System

Power Query's type system is more nuanced than it appears. Types serve both as documentation and runtime constraints, but they behave differently in different contexts. Understanding these nuances prevents subtle bugs in production pipelines.

let
    // Explicit type specification
    TypedTable = #table(
        type table [ID = Int64.Type, Name = Text.Type, Score = Number.Type],
        {
            {1, "Alice", 95.5},
            {2, "Bob", 87.2}
        }
    ),
    
    // Type inference from data
    InferredTable = #table(
        {"ID", "Name", "Score"},
        {
            {1, "Alice", 95.5},
            {2, "Bob", 87.2}
        }
    ),
    
    // Check the difference
    TypedStructure = Value.Type(TypedTable),
    InferredStructure = Value.Type(InferredTable)
in
    {TypedStructure, InferredStructure}

The typed table enforces constraints at runtime, while the inferred table accepts any compatible values. This affects performance and error behavior downstream.
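A related nuance: ascribing a type with Value.ReplaceType only relabels the column's metadata, while Table.TransformColumnTypes actually converts the stored values. Mislabeled ascription is a common source of downstream errors. A minimal sketch:

let
    Source = #table({"Score"}, {{"95.5"}}),  // value stored as text

    // Ascription: the metadata now claims number, but the cell is still text
    Ascribed = Value.ReplaceType(Source, type table [Score = Number.Type]),

    // Conversion: the cell itself becomes a number
    Converted = Table.TransformColumnTypes(Source, {{"Score", type number}})
in
    [
        AscribedStillText = Ascribed{0}[Score] is text,
        ConvertedIsNumber = Converted{0}[Score] is number
    ]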

Robust Text-to-Number Conversions

Text-to-number conversion is where most data pipelines break. Here's a comprehensive approach that handles real-world messiness:

let
    Source = #table(
        {"ProductID", "Price", "Currency"},
        {
            {"P001", "$1,234.56", "USD"},
            {"P002", "€ 987,45", "EUR"},
            {"P003", "1.234,56 kr", "NOK"},
            {"P004", "invalid_price", "USD"},
            {"P005", "", "USD"},
            {"P006", "FREE", "USD"}
        }
    ),
    
    CleanAndConvertPrice = (priceText as text, currency as text) as any =>
        let
            // Handle special cases first
            CleanText = Text.Trim(Text.Upper(priceText))
        in
            if CleanText = "" or CleanText = null then
                null
            else if CleanText = "FREE" or CleanText = "N/A" then
                0
            else
                let
                    // Remove currency symbols and common formatting
                    Step1 = Text.Replace(priceText, "$", ""),
                    Step2 = Text.Replace(Step1, "€", ""),
                    Step3 = Text.Replace(Step2, "kr", ""),
                    Step4 = Text.Trim(Step3),
                    
                    // Handle different decimal/thousand separators by currency
                    CleanedText = 
                        if currency = "EUR" or currency = "NOK" then
                            // European format: 1.234,56
                            let
                                LastComma = Text.PositionOfAny(Step4, {","}, Occurrence.Last),
                                LastDot = Text.PositionOfAny(Step4, {"."}, Occurrence.Last)
                            in
                                if LastComma > LastDot then
                                    // Comma is decimal separator
                                    Text.Replace(Text.Replace(Step4, ".", ""), ",", ".")
                                else
                                    // Dot is decimal separator
                                    Text.Replace(Step4, ",", "")
                        else
                            // US format: 1,234.56
                            Text.Replace(Step4, ",", ""),
                    
                    // Attempt conversion
                    TryConvert = try Number.FromText(CleanedText)
                in
                    if TryConvert[HasError] then
                        error Error.Record("PriceConversion", 
                            "Cannot convert '" & priceText & "' to number")
                    else
                        TryConvert[Value],
    
    ProcessedPrices = Table.AddColumn(Source, "NumericPrice",
        each CleanAndConvertPrice([Price], [Currency]))
in
    ProcessedPrices

This approach handles multiple currency formats, special text values, and provides meaningful error messages for debugging.

Date and DateTime Conversions with Timezone Awareness

Date parsing is notoriously fragile due to format variations and timezone issues. Here's a robust pattern:

let
    Source = #table(
        {"EventID", "Timestamp", "Timezone"},
        {
            {"E001", "2023-01-15 14:30:00", "UTC"},
            {"E002", "15/01/2023 2:30 PM", "EST"},
            {"E003", "2023-01-15T14:30:00Z", "UTC"},
            {"E004", "Jan 15, 2023 14:30", "PST"},
            {"E005", "invalid_date", "UTC"}
        }
    ),
    
    ParseTimestamp = (timestampText as text, timezone as text) as any =>
        let
            CleanText = Text.Trim(timestampText)
        in
            if CleanText = "" or CleanText = null then
                null
            else
                let
                    // Try multiple parsing strategies
                    Strategies = {
                        () => DateTime.FromText(CleanText),
                        () => DateTime.FromText(CleanText, "en-US"),
                        () => DateTime.FromText(CleanText, "en-GB"),
                        () => DateTimeZone.FromText(CleanText)
                    },
                    
                    TryStrategies = List.Transform(Strategies, 
                        each try _()),
                    
                    SuccessfulParse = List.First(
                        List.Select(TryStrategies, each not [HasError]),
                        [HasError = true, Value = null]
                    )
                in
                    if SuccessfulParse[HasError] then
                        error Error.Record("DateParsing", 
                            "Cannot parse date: " & CleanText)
                    else
                        let
                            ParsedDateTime = SuccessfulParse[Value],
                            // Apply timezone conversion if needed
                            AdjustedDateTime = 
                                if timezone = "EST" then
                                    ParsedDateTime + #duration(0, 5, 0, 0)  // EST to UTC
                                else if timezone = "PST" then
                                    ParsedDateTime + #duration(0, 8, 0, 0)  // PST to UTC
                                else
                                    ParsedDateTime
                        in
                            AdjustedDateTime,
    
    ProcessedTimestamps = Table.AddColumn(Source, "ParsedTimestamp",
        each ParseTimestamp([Timestamp], [Timezone]))
in
    ProcessedTimestamps

This pattern tries multiple parsing strategies and handles basic timezone conversions, essential for multi-region data sources.

Handling Nulls at Scale

Null Propagation Strategies

Different null handling strategies are appropriate for different analytical contexts. Understanding when to preserve nulls versus when to substitute values is crucial:

let
    Source = #table(
        {"CustomerID", "Revenue", "Employees", "Industry"},
        {
            {"C001", 1000000, 50, "Technology"},
            {"C002", null, 25, "Healthcare"},
            {"C003", 500000, null, "Technology"},
            {"C004", null, null, "Manufacturing"}
        }
    ),
    
    // Strategy 1: Preserve nulls for aggregation-friendly operations
    PreserveNulls = Source,
    
    // Strategy 2: Contextual null replacement
    // Table.TransformColumns only sees the cell value, so the
    // industry-aware default needs Table.ReplaceValue, which gives
    // the replacement function access to the full row
    RevenueFilled = Table.TransformColumns(Source,
        {{"Revenue", each _ ?? 0, type nullable number}}),
    ContextualReplacement = Table.ReplaceValue(RevenueFilled,
        each [Employees],
        each if [Employees] <> null then [Employees]
            else if [Industry] = "Technology" then 10
            else if [Industry] = "Healthcare" then 15
            else 5,
        Replacer.ReplaceValue,
        {"Employees"}),
    
    // Strategy 3: Null flagging for analysis
    NullFlagging = Table.AddColumn(Source, "DataCompleteness",
        each [
            HasRevenue = [Revenue] <> null,
            HasEmployees = [Employees] <> null,
            CompletenessScore = 
                (if [Revenue] <> null then 1 else 0) + 
                (if [Employees] <> null then 1 else 0)
        ])
in
    {PreserveNulls, ContextualReplacement, NullFlagging}

The preserve nulls strategy maintains data integrity for statistical operations. Contextual replacement uses business logic to infer reasonable defaults. Null flagging creates metadata for data quality analysis.
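Why preserving nulls matters for statistics, in a minimal sketch:

let
    Revenues = {1000000, null, 500000, null},

    // List.Average skips nulls: the mean of the two known values
    AvgPreserved = List.Average(Revenues),  // 750000

    // Zero-filling silently drags the mean down
    AvgZeroFilled = List.Average(List.Transform(Revenues, each _ ?? 0))  // 375000
in
    [Preserved = AvgPreserved, ZeroFilled = AvgZeroFilled]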

Advanced Null Coalescing Patterns

For complex scenarios, you need sophisticated null coalescing that considers multiple fallback sources:

let
    PrimarySource = #table(
        {"ID", "Name", "Email"},
        {
            {1, "Alice Johnson", null},
            {2, null, "bob@email.com"},
            {3, "Charlie Brown", "charlie@email.com"}
        }
    ),
    
    SecondarySource = #table(
        {"ID", "Name", "Email"},
        {
            {1, "Alice J.", "alice@backup.com"},
            {2, "Robert Smith", null},
            {4, "Diana Prince", "diana@email.com"}
        }
    ),
    
    FallbackSource = #table(
        {"ID", "DefaultName"},
        {
            {1, "Customer #1"},
            {2, "Customer #2"},
            {3, "Customer #3"},
            {4, "Customer #4"}
        }
    ),
    
    // Intelligent coalescing with priority hierarchy
    CoalesceWithPriority = (primary as any, secondary as any, fallback as any) as any =>
        if primary <> null then primary
        else if secondary <> null then secondary  
        else fallback,
    
    // Join all sources; NestedJoin + Expand avoids the column-name
    // collisions a direct Table.Join of identically named columns causes
    Joined1 = Table.NestedJoin(PrimarySource, "ID", SecondarySource, "ID",
        "Secondary", JoinKind.FullOuter),
    Step1 = Table.ExpandTableColumn(Joined1, "Secondary",
        {"Name", "Email"}, {"Name.1", "Email.1"}),
    Joined2 = Table.NestedJoin(Step1, "ID", FallbackSource, "ID",
        "Fallback", JoinKind.LeftOuter),
    Step2 = Table.ExpandTableColumn(Joined2, "Fallback",
        {"DefaultName"}, {"DefaultName"}),

    // Apply smart coalescing
    FinalData = Table.AddColumn(Step2, "FinalName",
        each CoalesceWithPriority([Name], [Name.1], [DefaultName])),
    
    // Clean up intermediate columns
    CleanedData = Table.SelectColumns(FinalData, {"ID", "FinalName", "Email"})
in
    CleanedData

This pattern implements a priority-based fallback system, essential for data consolidation scenarios where you're merging multiple imperfect data sources.

Performance Optimization for Large Datasets

Efficient Error Handling at Scale

When processing millions of rows, error handling performance becomes critical. Here are optimization strategies:

let
    // Simulate large dataset
    Source = Table.FromList(
        List.Numbers(1, 1000000),
        Splitter.SplitByNothing(),
        {"ID"}
    ),
    
    // Add some problematic data
    WithProblematicData = Table.AddColumn(Source, "Data",
        each if Number.Mod([ID], 10000) = 0 then "ERROR" else Text.From([ID] * 2)),
    
    // SLOW: Individual try expressions
    SlowErrorHandling = Table.AddColumn(WithProblematicData, "Slow_Converted",
        each 
            let
                TryResult = try Number.FromText([Data])
            in
                if TryResult[HasError] then null else TryResult[Value]
    ),
    
    // FAST: Batch filtering and conversion
    FastErrorHandling = 
        let
            // First, identify rows that will convert successfully
            ValidRows = Table.SelectRows(WithProblematicData, 
                each not Text.Contains([Data], "ERROR")),
            
            ErrorRows = Table.SelectRows(WithProblematicData,
                each Text.Contains([Data], "ERROR")),
            
            // Convert valid rows in batch, renaming to the output column
            ConvertedValid = Table.RenameColumns(
                Table.TransformColumns(ValidRows,
                    {{"Data", Number.FromText, type number}}),
                {{"Data", "Data_Converted"}}),

            // Add a null column to error rows
            ErrorRowsWithNull = Table.AddColumn(ErrorRows, "Data_Converted",
                each null, type nullable number),
            
            // Combine results
            Combined = Table.Combine({
                Table.SelectColumns(ConvertedValid, {"ID", "Data_Converted"}),
                Table.SelectColumns(ErrorRowsWithNull, {"ID", "Data_Converted"})
            })
        in
            Table.Sort(Combined, {"ID"})
in
    FastErrorHandling

The fast approach pre-filters rows to avoid expensive try operations on data that will obviously fail, which can improve performance substantially on large datasets.

Memory-Efficient Type Conversions

For very large datasets, memory usage during type conversion can become problematic:

let
    // Memory-efficient approach: process the table in fixed-size chunks
    Source = YourLargeDataSource,  // placeholder: substitute your own query
    
    // Process in chunks to control memory usage
    ChunkSize = 100000,
    TotalRows = Table.RowCount(Source),
    ChunkCount = Number.RoundUp(TotalRows / ChunkSize),
    
    ProcessChunk = (startRow as number, chunkSize as number) =>
        let
            ChunkData = Table.Range(Source, startRow, chunkSize),
            ProcessedChunk = Table.TransformColumns(ChunkData, {
                {"StringColumn", each Text.Trim(_), type text},
                {"NumberColumn", each try Number.FromText(_) otherwise null,
                    type nullable number}
            })
        in
            ProcessedChunk,
    
    ProcessedChunks = List.Transform(
        List.Numbers(0, ChunkCount),
        each ProcessChunk(_ * ChunkSize, ChunkSize)
    ),
    
    CombinedResult = Table.Combine(ProcessedChunks)
in
    CombinedResult

This chunking approach prevents memory exhaustion when processing datasets that exceed available RAM.

Building Fault-Tolerant Data Pipelines

Implementing Circuit Breaker Patterns

For production data pipelines, you need mechanisms that fail gracefully when data quality degrades beyond acceptable thresholds:

let
    Source = YourDataSource,  // placeholder: substitute your own query
    
    // Define quality thresholds
    MaxErrorRate = 0.05,  // 5% error rate threshold
    MinValidRows = 1000,   // Minimum viable dataset size
    
    // Process with quality monitoring
    ProcessWithQualityCheck = (data as table) as record =>
        let
            TotalRows = Table.RowCount(data),
            
            ProcessedData = Table.AddColumn(data, "ProcessedValue",
                each 
                    let
                        TryProcess = try YourProcessingFunction([SourceColumn])
                    in
                        [
                            Value = if TryProcess[HasError] then null else TryProcess[Value],
                            HasError = TryProcess[HasError],
                            ErrorMessage = if TryProcess[HasError] then TryProcess[Error][Message] else null
                        ]
            ),
            
            ExpandedData = Table.ExpandRecordColumn(ProcessedData, "ProcessedValue", 
                {"Value", "HasError", "ErrorMessage"}),
            
            ErrorCount = Table.RowCount(Table.SelectRows(ExpandedData, each [HasError] = true)),
            ValidCount = TotalRows - ErrorCount,
            ErrorRate = if TotalRows = 0 then 1 else ErrorCount / TotalRows,
            
            QualityAssessment = [
                TotalRows = TotalRows,
                ValidRows = ValidCount,
                ErrorRows = ErrorCount,
                ErrorRate = ErrorRate,
                IsAcceptable = ErrorRate <= MaxErrorRate and ValidRows >= MinValidRows
            ]
        in
            [
                Data = if QualityAssessment[IsAcceptable] 
                       then ExpandedData 
                       else error Error.Record("DataQuality", 
                           "Data quality below threshold: " & Number.ToText(ErrorRate)),
                Quality = QualityAssessment
            ],
    
    Result = ProcessWithQualityCheck(Source)
in
    Result

This circuit breaker pattern prevents low-quality data from propagating downstream and provides detailed quality metrics for monitoring.

Graceful Degradation Strategies

Sometimes you need pipelines that continue operating with reduced functionality rather than failing completely:

let
    Source = YourDataSource,  // placeholder: substitute your own query
    
    // Define fallback processing levels
    ProcessingLevel = (data as table) as record =>
        let
            // Level 1: Full processing (preferred)
            TryFullProcessing = try FullDataProcessing(data),
            
            FullResult = if not TryFullProcessing[HasError] then
                [
                    Data = TryFullProcessing[Value],
                    Level = "Full",
                    Degraded = false
                ]
            else
                // Level 2: Essential processing only
                let
                    TryEssentialProcessing = try EssentialDataProcessing(data)
                in
                    if not TryEssentialProcessing[HasError] then
                        [
                            Data = TryEssentialProcessing[Value],
                            Level = "Essential",
                            Degraded = true
                        ]
                    else
                        // Level 3: Minimal processing (last resort)
                        [
                            Data = MinimalDataProcessing(data),
                            Level = "Minimal",
                            Degraded = true
                        ]
        in
            FullResult,
    
    ProcessedResult = ProcessingLevel(Source),
    
    // Add degradation metadata
    FinalData = Table.AddColumn(ProcessedResult[Data], "ProcessingMetadata",
        each [
            ProcessingLevel = ProcessedResult[Level],
            IsDegraded = ProcessedResult[Degraded],
            ProcessedAt = DateTime.LocalNow()
        ])
in
    FinalData

This graceful degradation ensures your pipeline continues operating even when optimal processing fails, though with reduced functionality.

Hands-On Exercise

Let's build a comprehensive data cleaning pipeline that handles a realistic scenario: processing customer survey data with multiple data quality issues.

You have survey responses from three different collection systems with inconsistent formats:

let
    // Survey data from different sources with various quality issues
    OnlineSurveys = #table(
        {"ResponseID", "Age", "Income", "Satisfaction", "CollectionDate"},
        {
            {"ON001", "25", "$45,000", "Very Satisfied", "2023-01-15"},
            {"ON002", "invalid_age", "50000", "4", "2023-01-16"},
            {"ON003", "", "€55,000", "Satisfied", "bad_date"},
            {"ON004", "35", "75k", "3", "2023-01-18"}
        }
    ),
    
    PhoneSurveys = #table(
        {"ID", "Age", "AnnualIncome", "SatisfactionScore", "Date"},
        {
            {"PH001", 28, 48000, 4, #date(2023, 1, 20)},
            {"PH002", null, null, 5, #date(2023, 1, 21)},
            {"PH003", 150, -5000, 6, #date(2023, 1, 22)}
        }
    ),
    
    PaperSurveys = #table(
        {"SurveyID", "RespondentAge", "Salary", "Rating", "SubmissionDate"},
        {
            {"PA001", "30-35", "40000-50000", "Good", "Jan 25, 2023"},
            {"PA002", "25", "", "Excellent", "Jan 26, 2023"},
            {"PA003", "unknown", "60000", "Poor", ""}
        }
    ),
    
    // Step 1: Standardize column names and create unified structure
    StandardizeOnline = Table.RenameColumns(OnlineSurveys, {
        {"ResponseID", "ID"},
        {"Income", "AnnualIncome"},
        {"Satisfaction", "SatisfactionText"},
        {"CollectionDate", "Date"}
    }),
    
    StandardizePhone = Table.RenameColumns(PhoneSurveys, {
        {"SatisfactionScore", "SatisfactionNumeric"}
    }),
    
    StandardizePaper = Table.RenameColumns(PaperSurveys, {
        {"SurveyID", "ID"},
        {"RespondentAge", "Age"},
        {"Salary", "AnnualIncome"},
        {"Rating", "SatisfactionText"},
        {"SubmissionDate", "Date"}
    }),
    
    // Step 2: Add source tracking and combine
    OnlineWithSource = Table.AddColumn(StandardizeOnline, "Source", each "Online"),
    PhoneWithSource = Table.AddColumn(StandardizePhone, "Source", each "Phone"),
    PaperWithSource = Table.AddColumn(StandardizePaper, "Source", each "Paper"),
    
    // Combine all sources
    Combined = Table.Combine({OnlineWithSource, PhoneWithSource, PaperWithSource}),
    
    // Step 3: Implement comprehensive cleaning functions
    CleanAge = (ageValue as any, source as text) as any =>
        let
            ageText = Text.From(ageValue ?? "")
        in
            if ageText = "" or Text.Upper(ageText) = "UNKNOWN" then
                null
            else if Text.Contains(ageText, "-") then
                // Handle age ranges by taking midpoint
                let
                    parts = Text.Split(ageText, "-"),
                    minAge = try Number.FromText(parts{0}) otherwise null,
                    maxAge = try Number.FromText(parts{1}) otherwise null
                in
                    if minAge = null or maxAge = null then
                        error Error.Record("AgeRange", "Invalid age range: " & ageText)
                    else
                        (minAge + maxAge) / 2
            else
                let
                    tryNumeric = try Number.FromText(ageText)
                in
                    if tryNumeric[HasError] then
                        null
                    else
                        let
                            numericAge = tryNumeric[Value]
                        in
                            if numericAge < 13 or numericAge > 120 then
                                error Error.Record("InvalidAge", 
                                    "Age out of valid range: " & Text.From(numericAge))
                            else
                                numericAge,
    
    CleanIncome = (incomeValue as any) as any =>
        let
            incomeText = Text.From(incomeValue ?? "")
        in
            if incomeText = "" then
                null
            else if Text.Contains(incomeText, "-") then
                // Handle income ranges
                let
                    cleanText = Text.Replace(Text.Replace(incomeText, "$", ""), ",", ""),
                    parts = Text.Split(cleanText, "-"),
                    minIncome = try Number.FromText(parts{0}) otherwise null,
                    maxIncome = try Number.FromText(parts{1}) otherwise null
                in
                    if minIncome = null or maxIncome = null then
                        null
                    else
                        (minIncome + maxIncome) / 2
            else
                let
                    // Handle various currency and formatting symbols
                    step1 = Text.Replace(incomeText, "$", ""),
                    step2 = Text.Replace(step1, "€", ""),
                    step3 = Text.Replace(step2, ",", ""),
                    step4 = if Text.EndsWith(Text.Upper(step3), "K") then
                        Text.Replace(Text.Upper(step3), "K", "000")
                    else
                        step3,
                    
                    tryConvert = try Number.FromText(step4)
                in
                    if tryConvert[HasError] then
                        null
                    else
                        let
                            numericIncome = tryConvert[Value]
                        in
                            if numericIncome < 0 then
                                error Error.Record("NegativeIncome", 
                                    "Income cannot be negative: " & Text.From(numericIncome))
                            else if numericIncome > 10000000 then
                                error Error.Record("UnrealisticIncome", 
                                    "Income suspiciously high: " & Text.From(numericIncome))
                            else
                                numericIncome,
    
    NormalizeSatisfaction = (textValue as any, numericValue as any) as any =>
        if numericValue <> null then
            let
                score = Number.From(numericValue)
            in
                if score < 1 or score > 5 then
                    error Error.Record("InvalidSatisfaction", 
                        "Satisfaction score must be 1-5: " & Text.From(score))
                else
                    score
        else if textValue <> null then
            let
                upperText = Text.Upper(Text.Trim(Text.From(textValue)))
            in
                if upperText = "EXCELLENT" or upperText = "VERY SATISFIED" then 5
                else if upperText = "GOOD" or upperText = "SATISFIED" then 4
                else if upperText = "FAIR" or upperText = "NEUTRAL" then 3
                else if upperText = "POOR" or upperText = "DISSATISFIED" then 2
                else if upperText = "VERY POOR" or upperText = "VERY DISSATISFIED" then 1
                else
                    let
                        tryNumeric = try Number.FromText(upperText)
                    in
                        if tryNumeric[HasError] then
                            null
                        else
                            // A let-bound function must call itself via the @ scoping operator
                            @NormalizeSatisfaction(null, tryNumeric[Value])
        else
            null,
    
    // Step 4: Apply cleaning with error tracking
    // Each try runs exactly once; _[Column] is used because inside a record
    // literal, a bare [Age] would resolve against the record being built
    // (here, the Age field itself), not against the table row
    CleanedData = Table.AddColumn(Combined, "CleaningResults", 
        each 
            let
                ageResult = try CleanAge(_[Age], _[Source]),
                incomeResult = try CleanIncome(_[AnnualIncome]),
                satResult = try NormalizeSatisfaction(_[SatisfactionText], _[SatisfactionNumeric]),
                errorList = List.Select({
                    if ageResult[HasError] then "Age: " & ageResult[Error][Message] else null,
                    if incomeResult[HasError] then "Income: " & incomeResult[Error][Message] else null,
                    if satResult[HasError] then "Satisfaction: " & satResult[Error][Message] else null
                }, each _ <> null)
            in
                [
                    Age = if ageResult[HasError] then null else ageResult[Value],
                    Income = if incomeResult[HasError] then null else incomeResult[Value],
                    Satisfaction = if satResult[HasError] then null else satResult[Value],
                    Errors = if List.Count(errorList) = 0 then null else Text.Combine(errorList, "; ")
                ]),
    
    // Step 5: Expand results and create final cleaned dataset
    ExpandedResults = Table.ExpandRecordColumn(CleanedData, "CleaningResults", 
        {"Age", "Income", "Satisfaction", "Errors"},
        {"CleanedAge", "CleanedIncome", "CleanedSatisfaction", "CleaningErrors"}),
    
    // Step 6: Add data quality metrics
    WithQualityMetrics = Table.AddColumn(ExpandedResults, "QualityScore",
        each 
            let
                scoreComponents = {
                    if [CleanedAge] <> null then 1 else 0,
                    if [CleanedIncome] <> null then 1 else 0,
                    if [CleanedSatisfaction] <> null then 1 else 0
                },
                totalScore = List.Sum(scoreComponents),
                maxScore = List.Count(scoreComponents)
            in
                totalScore / maxScore),
    
    // Step 7: Final column selection and organization
    FinalDataset = Table.SelectColumns(WithQualityMetrics, {
        "ID", "Source", "CleanedAge", "CleanedIncome", 
        "CleanedSatisfaction", "QualityScore", "CleaningErrors"
    })
in
    FinalDataset

This exercise demonstrates:

  • Handling multiple data sources with different schemas
  • Comprehensive error handling with business rule validation
  • Converting text ranges to numeric values
  • Normalizing categorical data across different formats
  • Creating data quality metrics for monitoring
  • Maintaining error audit trails for debugging

Common Mistakes & Troubleshooting

Mistake 1: Ignoring Error Propagation

Problem: Applying transformations without considering how errors propagate through the pipeline.

// BAD: Error in Age conversion breaks all subsequent calculations
BadExample = Table.AddColumn(Source, "AgeCategory",
    each if Number.From([Age]) < 30 then "Young" else "Mature")

// GOOD: Handle conversion errors explicitly
GoodExample = Table.AddColumn(Source, "AgeCategory",
    each 
        let
            tryAge = try Number.From([Age])
        in
            if tryAge[HasError] then
                "Unknown"
            else if tryAge[Value] < 30 then
                "Young" 
            else 
                "Mature")

Solution: Always handle type conversions at the point of use, not as a separate step that could introduce errors downstream.

Mistake 2: Inconsistent Null Handling

Problem: Mixing different null handling strategies within the same pipeline.

// BAD: Inconsistent null treatment
Step1 = Table.ReplaceValue(Source, null, 0, Replacer.ReplaceValue, {"Revenue"}),
Step2 = Table.AddColumn(Step1, "HasRevenue", each [Revenue] <> null)  // Always true now: the original nulls are gone!

// GOOD: Consistent null strategy
// Note the explicit _[Revenue]: inside a record literal, a bare [Revenue]
// is looked up against the record being built, not the table row
ProcessedTable = Table.AddColumn(Source, "RevenueInfo", 
    each [
        Value = _[Revenue] ?? 0,
        HasOriginalValue = _[Revenue] <> null,
        IsImputed = _[Revenue] = null
    ])

Solution: Decide on null handling strategy upfront and apply it consistently. Consider creating metadata columns to track imputation.

Mistake 3: Performance-Killing Error Handling

Problem: Using expensive error handling patterns on large datasets.

// BAD: Expensive for large datasets
SlowVersion = Table.AddColumn(LargeTable, "Converted",
    each 
        let
            allFormats = {"yyyy-MM-dd", "MM/dd/yyyy", "dd/MM/yyyy", "yyyy/MM/dd"},
            tryFormats = List.Transform(allFormats, 
                (format) => try DateTime.FromText([DateString], [Culture = "en-US", Format = format]))
        in
            List.First(List.Select(tryFormats, each not [HasError]), [Value = null])[Value])

// GOOD: Pre-filter and batch process
FastVersion = 
    let
        DetectFormat = (sample as text) as text =>
            if Text.Contains(sample, "-") and Text.Length(sample) = 10 then "yyyy-MM-dd"
            else if Text.Contains(sample, "/") then "MM/dd/yyyy"
            else "unknown",
        
        // Tag each row once, then group so each format is parsed in a single batch
        Tagged = Table.AddColumn(LargeTable, "DateFormat", each DetectFormat([DateString])),
        GroupedByFormat = Table.Group(Tagged, {"DateFormat"}, 
            {{"Data", each _, type table}}),
        
        ProcessGroup = (format as text, data as table) as table =>
            if format = "yyyy-MM-dd" then
                Table.TransformColumns(data, 
                    {"DateString", each DateTime.FromText(_, [Format = "yyyy-MM-dd"])})
            else if format = "MM/dd/yyyy" then
                Table.TransformColumns(data, 
                    {"DateString", each DateTime.FromText(_, [Format = "MM/dd/yyyy", Culture = "en-US"])})
            else
                // Unknown formats: null out the value instead of erroring row by row
                Table.TransformColumns(data, {"DateString", each null}),
        
        Processed = List.Transform(Table.ToRecords(GroupedByFormat), 
            each ProcessGroup([DateFormat], [Data]))
    in
        Table.Combine(Processed)

Solution: Batch similar operations and pre-filter data to avoid expensive operations on rows that will obviously fail.

Troubleshooting Error Messages

"Expression.Error: We cannot convert the value X to type Y"

  • Check for hidden characters or unexpected formatting in your source data
  • Use Text.Trim() and inspect the underlying characters with Text.ToList() and Character.ToNumber()
  • Wrap the conversion in a try expression to capture the specific error record
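When a conversion still fails after trimming, the culprit is usually an invisible character such as a non-breaking space or a BOM. A diagnostic column can expose the raw code points alongside the captured error in one pass. This is a sketch assuming a hypothetical text column named "RawValue":

Inspected = Table.AddColumn(Source, "Diagnostics", each 
    let
        raw = Text.From(_[RawValue] ?? ""),
        attempt = try Number.FromText(Text.Trim(raw))
    in
        [
            // Unicode code points reveal characters that render as ordinary blanks
            CodePoints = List.Transform(Text.ToList(raw), Character.ToNumber),
            ErrorMessage = if attempt[HasError] then attempt[Error][Message] else null
        ])

A non-breaking space, for example, shows up here as code point 160 rather than 32, which immediately explains why the "clean-looking" value refuses to convert.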

"DataFormat.Error: Invalid format string"

  • Your format string doesn't match the actual data format
  • Use DateTime.FromText() without format specifiers first, then add specific formats if needed
  • Check locale-specific formatting requirements
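The options-record overload of DateTime.FromText() makes the expected format and culture explicit, which turns vague parse failures into predictable ones. A short sketch with made-up date strings:

// Lenient first pass: let the engine infer the format
Loose = try DateTime.FromText("2024-03-05 10:30:00"),
// Strict second pass: pin the exact format and culture
Strict = try DateTime.FromText("05/03/2024", [Format = "dd/MM/yyyy", Culture = "en-GB"])

Wrapping both in try lets you fall back from the strict parse to the lenient one without surfacing a DataFormat.Error to downstream steps.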

"Expression.Error: The column 'X' of the table wasn't found"

  • Column names may have changed due to earlier transformations
  • Use Table.ColumnNames() to inspect actual column names
  • Be cautious with dynamic column operations that might remove columns conditionally
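A defensive pattern for the missing-column case is to check Table.ColumnNames() before the dynamic step runs. A sketch, assuming hypothetical step and column names:

PreviousNames = Table.ColumnNames(PreviousStep),
// Only transform "Revenue" if an earlier conditional step actually kept it
SafeStep = if List.Contains(PreviousNames, "Revenue")
    then Table.TransformColumns(PreviousStep, {"Revenue", Number.From})
    else PreviousStep

This keeps the query runnable regardless of which branch an upstream conditional took, at the cost of silently skipping the transformation when the column is absent.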

Summary & Next Steps

You've now mastered the advanced data cleaning capabilities that separate professional data engineers from casual Power Query users. You understand how Power Query's error propagation system works internally, how to implement sophisticated error handling strategies that preserve data lineage, and how to build fault-tolerant pipelines that degrade gracefully under adverse conditions.

The key insights from this deep dive:

  1. Error propagation is predictable: Understanding how errors flow through transformation chains allows you to place handlers strategically rather than defensively everywhere.

  2. Context matters for cleaning decisions: The same messy value might be an error in one business context and a valid null in another. Build this intelligence into your cleaning logic.

  3. Performance scales with strategy: Naive error handling approaches fail at enterprise scale. Pre-filtering, batching, and strategic type conversion placement are essential.

  4. Quality metadata is as important as clean data: Track what you've cleaned, how you've cleaned it, and what couldn't be cleaned. This metadata becomes crucial for downstream analysis and debugging.

Your next steps should focus on implementing these patterns in your specific domain contexts. Consider building reusable function libraries that encode your organization's data quality rules, and establish monitoring systems that alert when data quality degrades beyond acceptable thresholds.

For continued learning, explore Power Query's advanced M language features like custom connector development and integration with Azure Data Factory for enterprise-scale deployment scenarios. The error handling and type conversion patterns you've mastered here form the foundation for more sophisticated data engineering workflows.
