
You're staring at a dataset imported from multiple CSV files, an aging SQL database, and a REST API that returns inconsistent data types. Half the records show "NULL" as text, others show actual nulls, and some contain error values that crash your analysis downstream. Welcome to the reality of data work—where clean, well-structured data is the exception, not the rule.
Power Query's strength isn't just in connecting to data sources—it's in transforming messy, real-world data into something reliable. But handling nulls, errors, and type conversions requires understanding how Power Query's evaluation engine works under the hood, particularly its error propagation system and type coercion rules. Most practitioners learn a few basic transformations and call it good. We're going deeper.
By mastering these advanced data cleaning techniques, you'll build robust data pipelines that handle edge cases gracefully and fail predictably when they do encounter issues. More importantly, you'll understand when to be aggressive with error handling and when to preserve errors for downstream investigation.
What you'll learn: how Power Query's error propagation and type coercion work at the engine level, try-based error handling patterns that preserve data lineage, robust text-to-number and date conversions, null handling strategies, and performance and fault-tolerance techniques for production pipelines.
Prerequisites: You should be comfortable creating basic Power Query transformations, understand M language fundamentals (variables, let expressions, functions), and have experience with data types in analytical contexts. Familiarity with error handling concepts in other programming languages is helpful but isn't required.
Before diving into cleaning techniques, you need to understand how Power Query handles errors and nulls at the engine level. This isn't academic—it directly impacts how your transformations behave in production.
Power Query uses a lazy evaluation model where errors propagate through transformation chains until they're either handled or reach a sink (like a table visualization). This means an error in one column can cascade through multiple derived columns, making debugging challenging.
Consider this scenario: you're importing customer data where the "CustomerID" column sometimes contains invalid values. Here's what happens internally:
let
Source = Table.FromRows({
{"001", "25", "2023-01-15"},
{"002", "invalid_age", "2023-01-16"},
{"003", "35", "2023-01-17"}
}, {"CustomerID", "Age", "SignupDate"}),
// This creates an error in row 2
AgeAsNumber = Table.TransformColumnTypes(Source, {{"Age", Int64.Type}}),
// This propagates the error to dependent calculations
AgeCategory = Table.AddColumn(AgeAsNumber, "Category",
each if [Age] < 30 then "Young" else "Mature"),
// Error continues propagating
Premium = Table.AddColumn(AgeCategory, "IsPremium",
each [Age] > 25 and Text.Length([CustomerID]) = 3)
in
Premium
Row 2 now contains errors in Age, Category, and IsPremium columns. The error originated from a single invalid age value but contaminated multiple downstream columns. Understanding this propagation is crucial for deciding where to intervene.
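One common intervention, sketched below, is to neutralize the error at its source with try ... otherwise so downstream columns receive null instead of inheriting the error (mapping invalid ages to "Unknown" is an illustrative choice, not the only option):

```m
let
    Source = Table.FromRows({
        {"001", "25"}, {"002", "invalid_age"}, {"003", "35"}
    }, {"CustomerID", "Age"}),
    // Convert at the source: try/otherwise yields null for bad values
    SafeAge = Table.TransformColumns(Source,
        {{"Age", each try Number.FromText(_) otherwise null, type nullable number}}),
    // Downstream columns now branch on null instead of inheriting errors
    Category = Table.AddColumn(SafeAge, "Category",
        each if [Age] = null then "Unknown"
            else if [Age] < 30 then "Young" else "Mature")
in
    Category
```

Because the error never enters the chain, no dependent column needs its own handler.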
Power Query distinguishes between null (missing data) and errors (invalid operations). This distinction matters for both performance and semantic correctness:
let
TestData = #table(
{"Value", "Operation"},
{
{null, "null + 5"},
{"invalid", "Text.ToNumber on invalid"}
}
),
// Null arithmetic returns null (propagates but doesn't fail)
NullResult = null + 5, // Returns null
// Invalid conversion raises an error (fails unless handled)
ErrorResult = Number.FromText("invalid") // Raises Expression.Error
in
TestData
Choose nulls when data is missing but the record is otherwise valid. Choose errors when the data indicates a systemic problem that requires attention.
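In M code, that choice looks like the following sketch (the negative-quantity rule and sample value are illustrative):

```m
let
    rawQuantity = "-4",
    // Missing or unparseable: collapse to null, keep the record usable
    SafeQuantity = try Number.FromText(rawQuantity) otherwise null,
    // Systemic problem: raise an error that demands attention downstream
    CheckedQuantity =
        if SafeQuantity <> null and SafeQuantity < 0 then
            error Error.Record("Validation",
                "Negative quantity: " & Text.From(SafeQuantity))
        else
            SafeQuantity
in
    CheckedQuantity
```

The first branch absorbs bad input silently; the second deliberately re-enters the error propagation system so the problem cannot be ignored.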
The try expression converts errors to records with metadata, allowing you to inspect and handle errors programmatically:
let
Source = Table.FromRows({
{"100", "2023-01-15"},
{"not_a_number", "2023-01-16"},
{"200", "invalid_date"},
{"300", "2023-01-17"}
}, {"Amount", "Date"}),
// Comprehensive error handling with metadata preservation
SafeConversions = Table.AddColumn(Source, "ProcessedAmount",
each
let
TryAmount = try Number.FromText([Amount]),
Result = if TryAmount[HasError] then
[
Value = null,
ErrorType = "ConversionError",
OriginalValue = [Amount],
ErrorDetail = TryAmount[Error][Message]
]
else
[
Value = TryAmount[Value],
ErrorType = null,
OriginalValue = [Amount],
ErrorDetail = null
]
in
Result),
// Expand the record for analysis
ExpandedResults = Table.ExpandRecordColumn(SafeConversions,
"ProcessedAmount",
{"Value", "ErrorType", "OriginalValue", "ErrorDetail"})
in
ExpandedResults
This pattern creates an audit trail of all conversion attempts, invaluable for debugging data quality issues in production systems.
Sometimes you need different error handling strategies based on business context. Here's a pattern for handling customer age data where different validation rules apply:
let
Source = #table(
{"CustomerID", "Age", "CustomerType"},
{
{"C001", "25", "Individual"},
{"C002", "-5", "Individual"},
{"C003", "150", "Individual"},
{"C004", null, "Corporate"},
{"C005", "unknown", "Individual"}
}
),
ProcessAge = Table.AddColumn(Source, "ValidatedAge",
each
let
CustomerType = [CustomerType],
RawAge = [Age]
in
if CustomerType = "Corporate" then
null // Age not applicable for corporate customers
else
let
TryAge = try Number.FromText(Text.From(RawAge))
in
if TryAge[HasError] then
null // Convert invalid ages to null
else
let
NumericAge = TryAge[Value]
in
if NumericAge < 0 or NumericAge > 120 then
error Error.Record("ValidationError",
"Age out of valid range: " & Text.From(NumericAge))
else
NumericAge
)
in
ProcessAge
This approach treats missing corporate ages as expected nulls but flags biologically impossible ages as errors requiring investigation.
For enterprise data pipelines, create custom error types that encode business meaning:
let
// Define custom error constructors
DataQualityError = (category as text, message as text, sourceValue as any) =>
error Error.Record("DataQuality." & category, message, sourceValue),
BusinessRuleError = (rule as text, message as text) =>
error Error.Record("BusinessRule." & rule, message),
// Example usage in transaction validation
Source = #table(
{"TransactionID", "Amount", "AccountType", "Timestamp"},
{
{"T001", "1000.00", "Checking", "2023-01-15T10:00:00"},
{"T002", "-50.00", "Savings", "2023-01-15T10:01:00"},
{"T003", "50000.00", "Checking", "2023-01-15T10:02:00"},
{"T004", "invalid", "Checking", "2023-01-15T10:03:00"}
}
),
ValidateTransactions = Table.AddColumn(Source, "ValidatedAmount",
each
let
TryAmount = try Number.FromText([Amount])
in
if TryAmount[HasError] then
DataQualityError("InvalidFormat",
"Cannot convert amount to number", [Amount])
else
let
Amount = TryAmount[Value],
AccountType = [AccountType]
in
if Amount < 0 and AccountType = "Savings" then
BusinessRuleError("NegativeBalance",
"Savings accounts cannot have negative transactions")
else if Amount > 10000 then
BusinessRuleError("LargeTransaction",
"Transactions over $10,000 require additional approval")
else
Amount
)
in
ValidateTransactions
These custom error types allow downstream error handlers to route different error categories to appropriate remediation workflows.
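Downstream, a handler can branch on the Reason recorded in the error. A minimal sketch (the validation rule and routing targets are hypothetical):

```m
let
    // Hypothetical validation that raises a custom business-rule error
    Validate = (amount as number) =>
        if amount > 10000 then
            error Error.Record("BusinessRule.LargeTransaction",
                "Requires additional approval")
        else
            amount,
    Attempt = try Validate(50000),
    // Route by the Reason prefix encoded by the custom constructors
    Routed =
        if not Attempt[HasError] then "ok"
        else if Text.StartsWith(Attempt[Error][Reason], "BusinessRule.") then
            "send to approval queue"
        else if Text.StartsWith(Attempt[Error][Reason], "DataQuality.") then
            "send to data stewardship"
        else
            "unknown error"
in
    Routed
```

The try expression exposes the raised error as a record, so the Reason prefix becomes a routing key rather than an opaque failure.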
Power Query's type system is more nuanced than it appears. Types serve both as documentation and runtime constraints, but they behave differently in different contexts. Understanding these nuances prevents subtle bugs in production pipelines.
let
// Explicit type specification
TypedTable = #table(
type table [ID = Int64.Type, Name = Text.Type, Score = Number.Type],
{
{1, "Alice", 95.5},
{2, "Bob", 87.2}
}
),
// Type inference from data
InferredTable = #table(
{"ID", "Name", "Score"},
{
{1, "Alice", 95.5},
{2, "Bob", 87.2}
}
),
// Check the difference
TypedStructure = Value.Type(TypedTable),
InferredStructure = Value.Type(InferredTable)
in
{TypedStructure, InferredStructure}
The typed table enforces constraints at runtime, while the inferred table accepts any compatible values. This affects performance and error behavior downstream.
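A related gotcha worth sketching: Table.TransformColumnTypes actually converts the stored values, while ascribing a type with Value.ReplaceType only relabels the column metadata, so mismatched cells surface as errors later when an operation finally touches them:

```m
let
    Source = #table({"Score"}, {{"95.5"}, {"87.2"}}),
    // Converts the text values to real numbers
    Converted = Table.TransformColumnTypes(Source, {{"Score", type number}}),
    // Only relabels the column as number; the cells remain text,
    // so numeric operations on them fail downstream
    Ascribed = Value.ReplaceType(Source, type table [Score = number])
in
    {Converted, Ascribed}
```

Prefer conversion when the data might not match; reserve ascription for cases where you already know the values conform.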
Text-to-number conversion is where most data pipelines break. Here's a comprehensive approach that handles real-world messiness:
let
Source = #table(
{"ProductID", "Price", "Currency"},
{
{"P001", "$1,234.56", "USD"},
{"P002", "€ 987,45", "EUR"},
{"P003", "1.234,56 kr", "NOK"},
{"P004", "invalid_price", "USD"},
{"P005", "", "USD"},
{"P006", "FREE", "USD"}
}
),
CleanAndConvertPrice = (priceText as nullable text, currency as text) as any =>
let
// Handle special cases first (Text.Trim/Text.Upper propagate null)
CleanText = Text.Trim(Text.Upper(priceText))
in
if CleanText = null or CleanText = "" then
null
else if CleanText = "FREE" or CleanText = "N/A" then
0
else
let
// Remove currency symbols and common formatting
Step1 = Text.Replace(priceText, "$", ""),
Step2 = Text.Replace(Step1, "€", ""),
Step3 = Text.Replace(Step2, "kr", ""),
Step4 = Text.Trim(Step3),
// Handle different decimal/thousand separators by currency
CleanedText =
if currency = "EUR" or currency = "NOK" then
// European format: 1.234,56
let
LastComma = Text.PositionOfAny(Step4, {","}, Occurrence.Last),
LastDot = Text.PositionOfAny(Step4, {"."}, Occurrence.Last)
in
if LastComma > LastDot then
// Comma is decimal separator
Text.Replace(Text.Replace(Step4, ".", ""), ",", ".")
else
// Dot is decimal separator
Text.Replace(Step4, ",", "")
else
// US format: 1,234.56
Text.Replace(Step4, ",", ""),
// Attempt conversion
TryConvert = try Number.FromText(CleanedText)
in
if TryConvert[HasError] then
error Error.Record("PriceConversion",
"Cannot convert '" & priceText & "' to number")
else
TryConvert[Value],
ProcessedPrices = Table.AddColumn(Source, "NumericPrice",
each CleanAndConvertPrice([Price], [Currency]))
in
ProcessedPrices
This approach handles multiple currency formats, special text values, and provides meaningful error messages for debugging.
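As a quick sanity check, invoking the helper directly on a few of the sample values behaves as described (assuming the CleanAndConvertPrice definition above is in scope):

```m
// Assuming CleanAndConvertPrice from the query above is in scope
USD = CleanAndConvertPrice("$1,234.56", "USD"),  // 1234.56
EUR = CleanAndConvertPrice("€ 987,45", "EUR"),   // 987.45
Free = CleanAndConvertPrice("FREE", "USD")       // 0
```

Testing helpers like this in isolation, before wiring them into Table.AddColumn, makes separator and currency bugs far easier to spot.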
Date parsing is notoriously fragile due to format variations and timezone issues. Here's a robust pattern:
let
Source = #table(
{"EventID", "Timestamp", "Timezone"},
{
{"E001", "2023-01-15 14:30:00", "UTC"},
{"E002", "15/01/2023 2:30 PM", "EST"},
{"E003", "2023-01-15T14:30:00Z", "UTC"},
{"E004", "Jan 15, 2023 14:30", "PST"},
{"E005", "invalid_date", "UTC"}
}
),
ParseTimestamp = (timestampText as nullable text, timezone as text) as any =>
let
CleanText = Text.Trim(timestampText)
in
if CleanText = null or CleanText = "" then
null
else
let
// Try multiple parsing strategies
Strategies = {
() => DateTime.FromText(CleanText),
() => DateTime.FromText(CleanText, "en-US"),
() => DateTime.FromText(CleanText, "en-GB"),
() => DateTimeZone.FromText(CleanText)
},
TryStrategies = List.Transform(Strategies,
each try _()),
SuccessfulParse = List.First(
List.Select(TryStrategies, each not [HasError]),
[HasError = true, Value = null]
)
in
if SuccessfulParse[HasError] then
error Error.Record("DateParsing",
"Cannot parse date: " & CleanText)
else
let
ParsedDateTime = SuccessfulParse[Value],
// Apply timezone conversion if needed
AdjustedDateTime =
if timezone = "EST" then
ParsedDateTime + #duration(0, 5, 0, 0) // EST to UTC
else if timezone = "PST" then
ParsedDateTime + #duration(0, 8, 0, 0) // PST to UTC
else
ParsedDateTime
in
AdjustedDateTime,
ProcessedTimestamps = Table.AddColumn(Source, "ParsedTimestamp",
each ParseTimestamp([Timestamp], [Timezone]))
in
ProcessedTimestamps
This pattern tries multiple parsing strategies and handles basic timezone conversions, essential for multi-region data sources.
Different null handling strategies are appropriate for different analytical contexts. Understanding when to preserve nulls versus when to substitute values is crucial:
let
Source = #table(
{"CustomerID", "Revenue", "Employees", "Industry"},
{
{"C001", 1000000, 50, "Technology"},
{"C002", null, 25, "Healthcare"},
{"C003", 500000, null, "Technology"},
{"C004", null, null, "Manufacturing"}
}
),
// Strategy 1: Preserve nulls for aggregation-friendly operations
PreserveNulls = Source,
// Strategy 2: Contextual null replacement
// Table.TransformColumns only sees the cell value, so the row-aware
// Employees replacement uses Table.ReplaceValue instead
RevenueReplaced = Table.TransformColumns(Source, {
{"Revenue", each _ ?? 0, type nullable number}
}),
ContextualReplacement = Table.ReplaceValue(RevenueReplaced,
each [Employees],
each if [Employees] <> null then [Employees]
else if [Industry] = "Technology" then 10
else if [Industry] = "Healthcare" then 15
else 5,
Replacer.ReplaceValue, {"Employees"}),
// Strategy 3: Null flagging for analysis
NullFlagging = Table.AddColumn(Source, "DataCompleteness",
each [
HasRevenue = [Revenue] <> null,
HasEmployees = [Employees] <> null,
CompletenessScore =
(if [Revenue] <> null then 1 else 0) +
(if [Employees] <> null then 1 else 0)
])
in
{PreserveNulls, ContextualReplacement, NullFlagging}
The preserve nulls strategy maintains data integrity for statistical operations. Contextual replacement uses business logic to infer reasonable defaults. Null flagging creates metadata for data quality analysis.
For complex scenarios, you need sophisticated null coalescing that considers multiple fallback sources:
let
PrimarySource = #table(
{"ID", "Name", "Email"},
{
{1, "Alice Johnson", null},
{2, null, "bob@email.com"},
{3, "Charlie Brown", "charlie@email.com"}
}
),
SecondarySource = #table(
{"ID", "Name", "Email"},
{
{1, "Alice J.", "alice@backup.com"},
{2, "Robert Smith", null},
{4, "Diana Prince", "diana@email.com"}
}
),
FallbackSource = #table(
{"ID", "DefaultName"},
{
{1, "Customer #1"},
{2, "Customer #2"},
{3, "Customer #3"},
{4, "Customer #4"}
}
),
// Intelligent coalescing with priority hierarchy
CoalesceWithPriority = (primary as any, secondary as any, fallback as any) as any =>
if primary <> null then primary
else if secondary <> null then secondary
else fallback,
// Join all sources (Table.Join collides on duplicate column names,
// so use nested joins and expand with the .1 suffix used below)
Step1 = Table.ExpandTableColumn(
Table.NestedJoin(PrimarySource, "ID", SecondarySource, "ID", "Sec", JoinKind.FullOuter),
"Sec", {"Name", "Email"}, {"Name.1", "Email.1"}),
Step2 = Table.ExpandTableColumn(
Table.NestedJoin(Step1, "ID", FallbackSource, "ID", "Fb", JoinKind.LeftOuter),
"Fb", {"DefaultName"}, {"DefaultName"}),
// Apply smart coalescing, including a fallback for Email
FinalData = Table.AddColumn(Step2, "FinalName",
each CoalesceWithPriority([Name], [Name.1], [DefaultName])),
WithEmail = Table.AddColumn(FinalData, "FinalEmail",
each [Email] ?? [Email.1]),
// Clean up intermediate columns
CleanedData = Table.SelectColumns(WithEmail, {"ID", "FinalName", "FinalEmail"})
in
CleanedData
This pattern implements a priority-based fallback system, essential for data consolidation scenarios where you're merging multiple imperfect data sources.
When processing millions of rows, error handling performance becomes critical. Here are optimization strategies:
let
// Simulate large dataset
Source = Table.FromList(
List.Numbers(1, 1000000),
Splitter.SplitByNothing(),
{"ID"}
),
// Add some problematic data
WithProblematicData = Table.AddColumn(Source, "Data",
each if Number.Mod([ID], 10000) = 0 then "ERROR" else Text.From([ID] * 2)),
// SLOW: Individual try expressions
SlowErrorHandling = Table.AddColumn(WithProblematicData, "Slow_Converted",
each
let
TryResult = try Number.FromText([Data])
in
if TryResult[HasError] then null else TryResult[Value]
),
// FAST: Batch filtering and conversion
FastErrorHandling =
let
// First, identify rows that will convert successfully
ValidRows = Table.SelectRows(WithProblematicData,
each not Text.Contains([Data], "ERROR")),
ErrorRows = Table.SelectRows(WithProblematicData,
each Text.Contains([Data], "ERROR")),
// Convert valid rows in batch, renaming to match the error branch
ConvertedValid = Table.RenameColumns(
Table.TransformColumns(ValidRows, {"Data", Number.FromText, type number}),
{{"Data", "Data_Converted"}}),
// Add null column to error rows
ErrorRowsWithNull = Table.AddColumn(ErrorRows, "Data_Converted",
each null as nullable number),
// Combine results
Combined = Table.Combine({
Table.SelectColumns(ConvertedValid, {"ID", "Data_Converted"}),
Table.SelectColumns(ErrorRowsWithNull, {"ID", "Data_Converted"})
})
in
Table.Sort(Combined, {"ID"})
in
FastErrorHandling
The fast approach pre-filters rows to avoid expensive try operations on data that will obviously fail, improving performance by orders of magnitude on large datasets.
For very large datasets, memory usage during type conversion can become problematic:
let
// Memory-efficient approach using Table.Buffer strategically
Source = // Your large data source,
// Process in chunks to control memory usage
ChunkSize = 100000,
TotalRows = Table.RowCount(Source),
ChunkCount = Number.RoundUp(TotalRows / ChunkSize),
ProcessChunk = (startRow as number, chunkSize as number) =>
let
ChunkData = Table.Skip(Table.FirstN(Source, startRow + chunkSize), startRow),
ProcessedChunk = Table.TransformColumns(ChunkData, {
{"StringColumn", each Text.Trim(_), type text},
{"NumberColumn", each try Number.FromText(_) otherwise null, type nullable number}
})
in
ProcessedChunk,
ProcessedChunks = List.Transform(
List.Numbers(0, ChunkCount),
each ProcessChunk(_ * ChunkSize, ChunkSize)
),
CombinedResult = Table.Combine(ProcessedChunks)
in
CombinedResult
This chunking approach prevents memory exhaustion when processing datasets that exceed available RAM.
For production data pipelines, you need mechanisms that fail gracefully when data quality degrades beyond acceptable thresholds:
let
Source = // Your data source,
// Define quality thresholds
MaxErrorRate = 0.05, // 5% error rate threshold
MinValidRows = 1000, // Minimum viable dataset size
// Process with quality monitoring
ProcessWithQualityCheck = (data as table) as record =>
let
TotalRows = Table.RowCount(data),
ProcessedData = Table.AddColumn(data, "ProcessedValue",
each
let
TryProcess = try YourProcessingFunction([SourceColumn])
in
[
Value = if TryProcess[HasError] then null else TryProcess[Value],
HasError = TryProcess[HasError],
ErrorMessage = if TryProcess[HasError] then TryProcess[Error][Message] else null
]
),
ExpandedData = Table.ExpandRecordColumn(ProcessedData, "ProcessedValue",
{"Value", "HasError", "ErrorMessage"}),
ErrorCount = Table.RowCount(Table.SelectRows(ExpandedData, each [HasError] = true)),
ValidCount = TotalRows - ErrorCount,
ErrorRate = if TotalRows = 0 then 1 else ErrorCount / TotalRows,
QualityAssessment = [
TotalRows = TotalRows,
ValidRows = ValidCount,
ErrorRows = ErrorCount,
ErrorRate = ErrorRate,
IsAcceptable = ErrorRate <= MaxErrorRate and ValidRows >= MinValidRows
]
in
[
Data = if QualityAssessment[IsAcceptable]
then ExpandedData
else error Error.Record("DataQuality",
"Data quality below threshold: " & Number.ToText(ErrorRate)),
Quality = QualityAssessment
],
Result = ProcessWithQualityCheck(Source)
in
Result
This circuit breaker pattern prevents low-quality data from propagating downstream and provides detailed quality metrics for monitoring.
Sometimes you need pipelines that continue operating with reduced functionality rather than failing completely:
let
Source = // Your data source,
// Define fallback processing levels
ProcessingLevel = (data as table) as record =>
let
// Level 1: Full processing (preferred)
TryFullProcessing = try FullDataProcessing(data),
FullResult = if not TryFullProcessing[HasError] then
[
Data = TryFullProcessing[Value],
Level = "Full",
Degraded = false
]
else
// Level 2: Essential processing only
let
TryEssentialProcessing = try EssentialDataProcessing(data)
in
if not TryEssentialProcessing[HasError] then
[
Data = TryEssentialProcessing[Value],
Level = "Essential",
Degraded = true
]
else
// Level 3: Minimal processing (last resort)
[
Data = MinimalDataProcessing(data),
Level = "Minimal",
Degraded = true
]
in
FullResult,
ProcessedResult = ProcessingLevel(Source),
// Add degradation metadata
FinalData = Table.AddColumn(ProcessedResult[Data], "ProcessingMetadata",
each [
ProcessingLevel = ProcessedResult[Level],
IsDegraded = ProcessedResult[Degraded],
ProcessedAt = DateTime.LocalNow()
])
in
FinalData
This graceful degradation ensures your pipeline continues operating even when optimal processing fails, though with reduced functionality.
Let's build a comprehensive data cleaning pipeline that handles a realistic scenario: processing customer survey data with multiple data quality issues.
You have survey responses from three different collection systems with inconsistent formats:
let
// Survey data from different sources with various quality issues
OnlineSurveys = #table(
{"ResponseID", "Age", "Income", "Satisfaction", "CollectionDate"},
{
{"ON001", "25", "$45,000", "Very Satisfied", "2023-01-15"},
{"ON002", "invalid_age", "50000", "4", "2023-01-16"},
{"ON003", "", "€55,000", "Satisfied", "bad_date"},
{"ON004", "35", "75k", "3", "2023-01-18"}
}
),
PhoneSurveys = #table(
{"ID", "Age", "AnnualIncome", "SatisfactionScore", "Date"},
{
{"PH001", 28, 48000, 4, #date(2023, 1, 20)},
{"PH002", null, null, 5, #date(2023, 1, 21)},
{"PH003", 150, -5000, 6, #date(2023, 1, 22)}
}
),
PaperSurveys = #table(
{"SurveyID", "RespondentAge", "Salary", "Rating", "SubmissionDate"},
{
{"PA001", "30-35", "40000-50000", "Good", "Jan 25, 2023"},
{"PA002", "25", "", "Excellent", "Jan 26, 2023"},
{"PA003", "unknown", "60000", "Poor", ""}
}
),
// Step 1: Standardize column names and create unified structure
StandardizeOnline = Table.RenameColumns(OnlineSurveys, {
{"ResponseID", "ID"},
{"Income", "AnnualIncome"},
{"Satisfaction", "SatisfactionText"},
{"CollectionDate", "Date"}
}),
StandardizePhone = Table.RenameColumns(PhoneSurveys, {
{"SatisfactionScore", "SatisfactionNumeric"}
}),
StandardizePaper = Table.RenameColumns(PaperSurveys, {
{"SurveyID", "ID"},
{"RespondentAge", "Age"},
{"Salary", "AnnualIncome"},
{"Rating", "SatisfactionText"},
{"SubmissionDate", "Date"}
}),
// Step 2: Add source tracking and combine
OnlineWithSource = Table.AddColumn(StandardizeOnline, "Source", each "Online"),
PhoneWithSource = Table.AddColumn(StandardizePhone, "Source", each "Phone"),
PaperWithSource = Table.AddColumn(StandardizePaper, "Source", each "Paper"),
// Combine all sources
Combined = Table.Combine({OnlineWithSource, PhoneWithSource, PaperWithSource}),
// Step 3: Implement comprehensive cleaning functions
CleanAge = (ageValue as any, source as text) as any =>
let
ageText = Text.From(ageValue ?? "")
in
if ageText = "" or Text.Upper(ageText) = "UNKNOWN" then
null
else if Text.Contains(ageText, "-") then
// Handle age ranges by taking midpoint
let
parts = Text.Split(ageText, "-"),
minAge = try Number.FromText(parts{0}) otherwise null,
maxAge = try Number.FromText(parts{1}) otherwise null
in
if minAge = null or maxAge = null then
error Error.Record("AgeRange", "Invalid age range: " & ageText)
else
(minAge + maxAge) / 2
else
let
tryNumeric = try Number.FromText(ageText)
in
if tryNumeric[HasError] then
null
else
let
numericAge = tryNumeric[Value]
in
if numericAge < 13 or numericAge > 120 then
error Error.Record("InvalidAge",
"Age out of valid range: " & Text.From(numericAge))
else
numericAge,
CleanIncome = (incomeValue as any) as any =>
let
incomeText = Text.From(incomeValue ?? "")
in
if incomeText = "" then
null
else if Text.Contains(incomeText, "-") then
// Handle income ranges
let
cleanText = Text.Replace(Text.Replace(incomeText, "$", ""), ",", ""),
parts = Text.Split(cleanText, "-"),
minIncome = try Number.FromText(parts{0}) otherwise null,
maxIncome = try Number.FromText(parts{1}) otherwise null
in
if minIncome = null or maxIncome = null then
null
else
(minIncome + maxIncome) / 2
else
let
// Handle various currency and formatting symbols
step1 = Text.Replace(incomeText, "$", ""),
step2 = Text.Replace(step1, "€", ""),
step3 = Text.Replace(step2, ",", ""),
step4 = if Text.EndsWith(Text.Upper(step3), "K") then
Text.Replace(Text.Upper(step3), "K", "000")
else
step3,
tryConvert = try Number.FromText(step4)
in
if tryConvert[HasError] then
null
else
let
numericIncome = tryConvert[Value]
in
if numericIncome < 0 then
error Error.Record("NegativeIncome",
"Income cannot be negative: " & Text.From(numericIncome))
else if numericIncome > 10000000 then
error Error.Record("UnrealisticIncome",
"Income suspiciously high: " & Text.From(numericIncome))
else
numericIncome,
NormalizeSatisfaction = (textValue as any, numericValue as any) as any =>
if numericValue <> null then
let
score = Number.From(numericValue)
in
if score < 1 or score > 5 then
error Error.Record("InvalidSatisfaction",
"Satisfaction score must be 1-5: " & Text.From(score))
else
score
else if textValue <> null then
let
upperText = Text.Upper(Text.Trim(Text.From(textValue)))
in
if upperText = "EXCELLENT" or upperText = "VERY SATISFIED" then 5
else if upperText = "GOOD" or upperText = "SATISFIED" then 4
else if upperText = "FAIR" or upperText = "NEUTRAL" then 3
else if upperText = "POOR" or upperText = "DISSATISFIED" then 2
else if upperText = "VERY POOR" or upperText = "VERY DISSATISFIED" then 1
else
let
tryNumeric = try Number.FromText(upperText)
in
if tryNumeric[HasError] then
null
else
NormalizeSatisfaction(null, tryNumeric[Value])
else
null,
// Step 4: Apply cleaning with error tracking
CleanedData = Table.AddColumn(Combined, "CleaningResults",
each
let
// Evaluate each try once, then derive both values and errors
ageTry = try CleanAge([Age], [Source]),
incomeTry = try CleanIncome([AnnualIncome]),
satTry = try NormalizeSatisfaction([SatisfactionText], [SatisfactionNumeric]),
errorList = List.Select({
if ageTry[HasError] then "Age: " & ageTry[Error][Message] else null,
if incomeTry[HasError] then "Income: " & incomeTry[Error][Message] else null,
if satTry[HasError] then "Satisfaction: " & satTry[Error][Message] else null
}, each _ <> null)
in
[
Age = if ageTry[HasError] then null else ageTry[Value],
Income = if incomeTry[HasError] then null else incomeTry[Value],
Satisfaction = if satTry[HasError] then null else satTry[Value],
Errors = if List.Count(errorList) = 0 then null
else Text.Combine(errorList, "; ")
]),
// Step 5: Expand results and create final cleaned dataset
ExpandedResults = Table.ExpandRecordColumn(CleanedData, "CleaningResults",
{"Age", "Income", "Satisfaction", "Errors"},
{"CleanedAge", "CleanedIncome", "CleanedSatisfaction", "CleaningErrors"}),
// Step 6: Add data quality metrics
WithQualityMetrics = Table.AddColumn(ExpandedResults, "QualityScore",
each
let
scoreComponents = {
if [CleanedAge] <> null then 1 else 0,
if [CleanedIncome] <> null then 1 else 0,
if [CleanedSatisfaction] <> null then 1 else 0
},
totalScore = List.Sum(scoreComponents),
maxScore = List.Count(scoreComponents)
in
totalScore / maxScore),
// Step 7: Final column selection and organization
FinalDataset = Table.SelectColumns(WithQualityMetrics, {
"ID", "Source", "CleanedAge", "CleanedIncome",
"CleanedSatisfaction", "QualityScore", "CleaningErrors"
})
in
FinalDataset
This exercise demonstrates the full pattern in practice: standardizing schemas across heterogeneous sources, reusable cleaning functions that mix null conversion with business-rule errors, per-row error audit trails, and record-level data quality scoring.
Problem: Applying transformations without considering how errors propagate through the pipeline.
// BAD: Error in Age conversion breaks all subsequent calculations
BadExample = Table.AddColumn(Source, "AgeCategory",
each if Number.From([Age]) < 30 then "Young" else "Mature")
// GOOD: Handle conversion errors explicitly
GoodExample = Table.AddColumn(Source, "AgeCategory",
each
let
tryAge = try Number.From([Age])
in
if tryAge[HasError] then
"Unknown"
else if tryAge[Value] < 30 then
"Young"
else
"Mature")
Solution: Always handle type conversions at the point of use, not as a separate step that could introduce errors downstream.
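The same guard can be written more compactly with the try ... otherwise shorthand, which collapses the error-record inspection into a default value (a sketch against the same hypothetical Source table):

```m
// Equivalent compact form using try ... otherwise
GoodExampleCompact = Table.AddColumn(Source, "AgeCategory",
    each
        let
            // Invalid conversions collapse straight to null
            age = try Number.From([Age]) otherwise null
        in
            if age = null then "Unknown"
            else if age < 30 then "Young"
            else "Mature")
```

Use the full try record form when you need the error message for auditing; use otherwise when a default value is all you need.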
Problem: Mixing different null handling strategies within the same pipeline.
// BAD: Inconsistent null treatment
Step1 = Table.ReplaceValue(Source, null, 0, Replacer.ReplaceValue, {"Revenue"}),
Step2 = Table.AddColumn(Step1, "HasRevenue", each [Revenue] <> null) // Always false now!
// GOOD: Consistent null strategy
ProcessedTable = Table.AddColumn(Source, "RevenueInfo",
each [
Value = [Revenue] ?? 0,
HasOriginalValue = [Revenue] <> null,
IsImputed = [Revenue] = null
])
Solution: Decide on null handling strategy upfront and apply it consistently. Consider creating metadata columns to track imputation.
Problem: Using expensive error handling patterns on large datasets.
// BAD: Expensive for large datasets
SlowVersion = Table.AddColumn(LargeTable, "Converted",
each
let
allFormats = {"yyyy-MM-dd", "MM/dd/yyyy", "dd/MM/yyyy", "yyyy/MM/dd"},
tryFormats = List.Transform(allFormats,
(format) => try DateTime.FromText([DateString], [Culture = "en-US", Format = format]))
in
List.First(List.Select(tryFormats, each not [HasError]), [Value = null])[Value])
// GOOD: Pre-filter and batch process
FastVersion =
let
DetectFormat = (sample as text) =>
if Text.Contains(sample, "-") and Text.Length(sample) = 10 then "yyyy-MM-dd"
else if Text.Contains(sample, "/") then "MM/dd/yyyy"
else "unknown",
// Tag each row once, then group so each format is parsed in batch
WithFormat = Table.AddColumn(LargeTable, "DateFormat",
each DetectFormat([DateString])),
GroupedByFormat = Table.Group(WithFormat, {"DateFormat"},
{{"Data", each _, type table}}),
ProcessGroup = (formatGroup as record) =>
let
format = formatGroup[DateFormat],
data = formatGroup[Data]
in
if format = "yyyy-MM-dd" then
Table.TransformColumns(data, {"DateString", DateTime.FromText})
else if format = "MM/dd/yyyy" then
Table.TransformColumns(data,
{"DateString", each DateTime.FromText(_, [Culture = "en-US"])})
else
Table.AddColumn(data, "ParsedDate", each null)
in
Table.Combine(List.Transform(Table.ToRecords(GroupedByFormat), ProcessGroup))
Solution: Batch similar operations and pre-filter data to avoid expensive operations on rows that will obviously fail.
"Expression.Error: We cannot convert the value X to type Y"
Text.Trim() and examine the actual characters with Text.ToBinary()"DataFormat.Error: Invalid format string"
DateTime.FromText() without format specifiers first, then add specific formats if needed"Expression.Error: The column 'X' of the table wasn't found"
Table.ColumnNames() to inspect actual column namesYou've now mastered the advanced data cleaning capabilities that separate professional data engineers from casual Power Query users. You understand how Power Query's error propagation system works internally, how to implement sophisticated error handling strategies that preserve data lineage, and how to build fault-tolerant pipelines that degrade gracefully under adverse conditions.
The key insights from this deep dive:
Error propagation is predictable: Understanding how errors flow through transformation chains allows you to place handlers strategically rather than defensively everywhere.
Context matters for cleaning decisions: The same messy value might be an error in one business context and a valid null in another. Build this intelligence into your cleaning logic.
Performance scales with strategy: Naive error handling approaches fail at enterprise scale. Pre-filtering, batching, and strategic type conversion placement are essential.
Quality metadata is as important as clean data: Track what you've cleaned, how you've cleaned it, and what couldn't be cleaned. This metadata becomes crucial for downstream analysis and debugging.
Your next steps should focus on implementing these patterns in your specific domain contexts. Consider building reusable function libraries that encode your organization's data quality rules, and establish monitoring systems that alert when data quality degrades beyond acceptable thresholds.
For continued learning, explore Power Query's advanced M language features like custom connector development and integration with Azure Data Factory for enterprise-scale deployment scenarios. The error handling and type conversion patterns you've mastered here form the foundation for more sophisticated data engineering workflows.