
You're staring at a REST API endpoint that returns nested JSON with inconsistent schemas, or wrestling with an XML feed where namespaces seem to shift randomly between records. Sound familiar? If you've moved beyond basic CSV imports in Power Query, you've likely encountered the beautiful complexity of semi-structured data formats. JSON and XML dominate modern data interchange, but their flexibility creates unique challenges when transforming them into the tabular structures Power BI and Excel expect.
The M language provides sophisticated tools for parsing, navigating, and transforming both JSON and XML data sources. However, mastering these capabilities requires understanding not just the syntax, but the underlying data structures, performance implications, and the subtle ways that real-world data deviates from theoretical ideals. You'll need to handle deeply nested objects, mixed data types, optional fields, and the inevitable schema variations that plague production systems.
This lesson takes you deep into M's JSON and XML processing capabilities, covering everything from basic parsing to advanced transformation patterns. You'll learn to build resilient queries that handle schema variations gracefully, optimize performance for large datasets, and implement sophisticated extraction patterns that would be impossible in the Power Query UI.
What you'll learn:
- How M represents JSON and XML as native records, lists, and tables
- Techniques for handling schema variations and optional fields gracefully
- Performance patterns for large files and paginated APIs
- Advanced extraction patterns that go beyond the Power Query UI

Prerequisites: This lesson assumes you have solid experience with M fundamentals, including record manipulation, list operations, and custom function creation. You should be comfortable reading intermediate M code and understand concepts like lazy evaluation and query folding. Basic familiarity with JSON and XML structure is helpful but not required.
Before diving into specific parsing techniques, it's crucial to understand how M conceptualizes JSON and XML data. Unlike languages that treat these formats as text to be parsed, M converts them directly into native data structures: records, lists, and primitive values for JSON; structured tables and records for XML.
This approach has profound implications for how you write transformation logic. Consider this JSON response from a customer API:
{
"customer_id": 12345,
"profile": {
"name": "Sarah Chen",
"email": "sarah.chen@example.com",
"preferences": {
"newsletter": true,
"notifications": ["email", "sms"]
}
},
"orders": [
{
"order_id": "ORD-001",
"date": "2024-01-15T10:30:00Z",
"items": [
{"product": "Widget A", "quantity": 2, "price": 29.99},
{"product": "Widget B", "quantity": 1, "price": 45.00}
]
}
]
}
When M parses this JSON using Json.Document(), it doesn't create a flat structure. Instead, it preserves the hierarchical relationships:
let
    // Json.Document accepts the binary response from Web.Contents directly
    parsed = Json.Document(Web.Contents("https://api.example.com/customer/12345")),
// parsed is now a record with nested records and lists
customerId = parsed[customer_id], // Direct field access: 12345
customerName = parsed[profile][name], // Nested access: "Sarah Chen"
firstOrder = parsed[orders]{0}, // List indexing
orderItems = parsed[orders]{0}[items] // Returns a list of records
in
parsed
This native structure preservation is both powerful and challenging. It's powerful because you can navigate complex hierarchies naturally. It's challenging because you must think in terms of M's type system rather than string manipulation.
The key insight is that M treats semi-structured data as first-class data structures, not serialized text. This means your transformation logic should leverage record manipulation, list processing, and type conversion functions rather than string parsing.
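For example, computing an order total from the customer JSON above needs no string handling at all — just list processing and arithmetic over the parsed structure (a minimal sketch using an inline JSON literal):

```
let
    parsed = Json.Document("{""orders"": [{""items"": [
        {""product"": ""Widget A"", ""quantity"": 2, ""price"": 29.99},
        {""product"": ""Widget B"", ""quantity"": 1, ""price"": 45.00}]}]}"),
    items = parsed[orders]{0}[items],
    // List.Transform and List.Sum replace any notion of text parsing
    lineTotals = List.Transform(items, each [quantity] * [price]),
    orderTotal = List.Sum(lineTotals)  // 104.98
in
    orderTotal
```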
Real-world JSON rarely follows perfect schemas. Fields appear and disappear, data types change, and nesting levels vary. M provides several techniques for handling this dynamism gracefully.
The most fundamental is understanding the difference between static field access (record[field]) and dynamic access (Record.Field(record, fieldName)). Static access is faster and more readable, but it fails if the field doesn't exist. Dynamic access allows you to handle optional fields programmatically:
let
// Sample data with inconsistent schema
apiResponse1 = Json.Document("{""id"": 1, ""name"": ""Product A"", ""category"": ""electronics""}"),
apiResponse2 = Json.Document("{""id"": 2, ""name"": ""Product B"", ""dept"": ""electronics""}"), // note: 'dept' instead of 'category'
// Function to safely extract fields with fallback logic
ExtractProductInfo = (response as record) as record =>
let
id = Record.FieldOrDefault(response, "id", null),
name = Record.FieldOrDefault(response, "name", "Unknown"),
// Handle the category/dept field variation
category = if Record.HasFields(response, "category") then
response[category]
else if Record.HasFields(response, "dept") then
response[dept]
else
"Uncategorized",
result = [
ID = id,
Name = name,
Category = category
]
in
result,
// Apply to both responses
product1 = ExtractProductInfo(apiResponse1),
product2 = ExtractProductInfo(apiResponse2)
in
{product1, product2}
This pattern becomes essential when working with APIs that evolve over time or aggregate data from multiple sources with slight schema differences.
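M's optional-access operator offers a terser alternative for simple cases: suffixing a field selection with ? returns null instead of raising an error when the field is absent, and the coalescing operator ?? (available in recent Power Query versions) supplies the fallback:

```
let
    response = Json.Document("{""id"": 2, ""name"": ""Product B"", ""dept"": ""electronics""}"),
    // [category]? yields null rather than an error because the field is missing
    category = response[category]? ?? response[dept]? ?? "Uncategorized"
in
    category  // "electronics"
```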
Complex JSON structures often require recursive processing. Consider a nested organizational chart or a product catalog with unlimited category depth. M's recursive capabilities, combined with proper error handling, can elegantly flatten these structures:
let
// Sample nested organizational data
orgData = Json.Document("{
""name"": ""CEO"",
""employee_id"": ""001"",
""reports"": [
{
""name"": ""VP Sales"",
""employee_id"": ""002"",
""reports"": [
{""name"": ""Sales Rep 1"", ""employee_id"": ""003"", ""reports"": []},
{""name"": ""Sales Rep 2"", ""employee_id"": ""004"", ""reports"": []}
]
},
{
""name"": ""VP Engineering"",
""employee_id"": ""005"",
""reports"": [
{""name"": ""Developer 1"", ""employee_id"": ""006"", ""reports"": []},
{""name"": ""Developer 2"", ""employee_id"": ""007"", ""reports"": []}
]
}
]
}"),
// Recursive function to flatten the hierarchy
FlattenHierarchy = (node as record, optional parentId as text, optional level as number) as list =>
let
currentLevel = if level = null then 0 else level,
currentRecord = [
EmployeeId = node[employee_id],
Name = node[name],
ParentId = parentId,
Level = currentLevel
],
// Process subordinates recursively
reports = Record.FieldOrDefault(node, "reports", {}),
        childRecords = if reports is list and List.Count(reports) > 0 then
List.Transform(reports, (report) =>
FlattenHierarchy(report, node[employee_id], currentLevel + 1)
)
else
{},
        // childRecords is a list of lists (one per report); flatten it and prepend the current record
        allChildRecords = List.Combine(childRecords),
        result = List.Combine({{currentRecord}, allChildRecords})
in
result,
flattenedData = FlattenHierarchy(orgData),
finalTable = Table.FromRecords(flattenedData)
in
finalTable
This recursive approach scales to arbitrary depths and maintains referential integrity through the parent-child relationships.
When processing large JSON datasets, especially from APIs with pagination or bulk exports, performance becomes critical. M's lazy evaluation helps, but you need to structure your queries to minimize memory usage and processing overhead.
The key principle is to avoid materializing large intermediate structures. Instead of loading entire JSON arrays into memory, process them in chunks or streams:
let
// Function to process JSON in streaming fashion
ProcessLargeJsonFile = (filePath as text) as table =>
let
// Read file as binary to control parsing
binaryContent = File.Contents(filePath),
// For very large files, consider reading line by line
// This example assumes a JSON Lines format (one JSON object per line)
        // Lines.FromBinary handles both LF and CRLF line endings
        lines = Lines.FromBinary(binaryContent),
// Process each line as separate JSON
validLines = List.Select(lines, each Text.Length(_) > 0),
// Transform each JSON line - this is where lazy evaluation helps
transformedRecords = List.Transform(validLines, (line) =>
try
let
jsonRecord = Json.Document(line),
// Extract only the fields you need - don't materialize entire objects
extractedData = [
id = Record.FieldOrDefault(jsonRecord, "id", null),
timestamp = Record.FieldOrDefault(jsonRecord, "timestamp", null),
value = Record.FieldOrDefault(jsonRecord, "value", null)
// Avoid extracting nested objects unless necessary
]
in
extractedData
otherwise
null
),
// Filter out parsing errors
cleanRecords = List.Select(transformedRecords, each _ <> null),
// Convert to table only at the end
result = Table.FromRecords(cleanRecords)
in
result
in
ProcessLargeJsonFile
For APIs with pagination, implement a pattern that processes pages sequentially rather than accumulating all data in memory:
let
// Function to fetch paginated API data
FetchAllPages = (baseUrl as text, initialParams as record) as table =>
let
// Helper function to fetch a single page
FetchPage = (url as text, pageNumber as number) =>
let
pageUrl = url & "&page=" & Number.ToText(pageNumber),
response = try Json.Document(Web.Contents(pageUrl)) otherwise null,
hasData = response <> null and Record.HasFields(response, "data") and List.Count(response[data]) > 0
in
[Response = response, HasData = hasData],
// Recursive function to fetch all pages
FetchAllPagesImpl = (pageNum as number, accumulator as list) =>
let
pageResult = FetchPage(baseUrl, pageNum),
currentData = if pageResult[HasData] then pageResult[Response][data] else {},
            newAccumulator = if currentData is list then
List.Combine({accumulator, currentData})
else
accumulator,
// Continue if we got data, stop otherwise
result = if pageResult[HasData] then
@FetchAllPagesImpl(pageNum + 1, newAccumulator)
else
newAccumulator
in
result,
allRecords = FetchAllPagesImpl(1, {}),
finalTable = Table.FromRecords(allRecords)
in
finalTable
in
FetchAllPages("https://api.example.com/data?limit=100", [])
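The same loop can be written without explicit recursion using List.Generate, which some readers find easier to follow. This sketch assumes the same hypothetical endpoint and the same "data" field per page:

```
let
    FetchAllPagesViaGenerate = (baseUrl as text) as table =>
        let
            GetPage = (pageNum as number) as list =>
                try Json.Document(Web.Contents(baseUrl & "&page=" & Number.ToText(pageNum)))[data]
                otherwise {},
            // Each state record carries the page number and that page's rows;
            // generation stops when a page comes back empty
            pages = List.Generate(
                () => [Page = 1, Data = GetPage(1)],
                (state) => List.Count(state[Data]) > 0,
                (state) => [Page = state[Page] + 1, Data = GetPage(state[Page] + 1)],
                (state) => state[Data])
        in
            Table.FromRecords(List.Combine(pages))
in
    FetchAllPagesViaGenerate
```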
Production JSON data rarely follows a single, consistent schema. You'll encounter optional fields, varying data types, and structural differences that can break rigid parsing logic. Building resilient transformations requires anticipating these variations and handling them gracefully.
Consider this pattern for handling schema evolution in a user analytics API:
let
// Sample data showing schema evolution over time
oldFormat = Json.Document("{
""user_id"": 12345,
""event"": ""page_view"",
""timestamp"": ""2023-01-15T10:30:00Z"",
""page"": ""/products""
}"),
newFormat = Json.Document("{
""user_id"": 67890,
""event_type"": ""page_view"",
""occurred_at"": ""2024-01-15T10:30:00Z"",
""properties"": {
""page_path"": ""/products"",
""referrer"": ""https://google.com"",
""session_id"": ""sess_123""
}
}"),
// Adaptive transformation function
NormalizeEvent = (eventRecord as record) as record =>
let
// Determine schema version based on field presence
isNewFormat = Record.HasFields(eventRecord, "event_type"),
// Extract user ID (consistent across versions)
userId = eventRecord[user_id],
// Handle event name field variations
eventName = if isNewFormat then
eventRecord[event_type]
else
Record.FieldOrDefault(eventRecord, "event", "unknown"),
// Handle timestamp field variations
timestamp = if isNewFormat then
Record.FieldOrDefault(eventRecord, "occurred_at", null)
else
Record.FieldOrDefault(eventRecord, "timestamp", null),
// Handle page information
pagePath = if isNewFormat then
try eventRecord[properties][page_path] otherwise null
else
Record.FieldOrDefault(eventRecord, "page", null),
// Extract new format fields with defaults
referrer = if isNewFormat then
try eventRecord[properties][referrer] otherwise null
else
null,
sessionId = if isNewFormat then
try eventRecord[properties][session_id] otherwise null
else
null,
// Construct normalized record
normalizedEvent = [
UserId = userId,
EventName = eventName,
Timestamp = timestamp,
PagePath = pagePath,
Referrer = referrer,
SessionId = sessionId,
SchemaVersion = if isNewFormat then "v2" else "v1"
]
in
normalizedEvent,
// Apply normalization to both formats
normalizedOld = NormalizeEvent(oldFormat),
normalizedNew = NormalizeEvent(newFormat),
// Combine into a table
combinedData = Table.FromRecords({normalizedOld, normalizedNew})
in
combinedData
This adaptive approach allows you to process data with multiple schema versions in a single query, maintaining backward compatibility while handling new fields appropriately.
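When the goal is simply to pad absent fields with nulls rather than rename or remap them, Record.SelectFields with the MissingField.UseNull option is a compact complement to the adaptive function above:

```
let
    partial = Json.Document("{""user_id"": 99, ""event"": ""click""}"),
    // MissingField.UseNull fills the absent field with null instead of raising an error
    padded = Record.SelectFields(partial, {"user_id", "event", "occurred_at"}, MissingField.UseNull)
in
    padded  // [user_id = 99, event = "click", occurred_at = null]
```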
XML processing in M differs significantly from JSON handling due to XML's more complex structure, including namespaces, attributes, and mixed content models. Understanding these differences is crucial for effective XML transformation.
XML namespaces are perhaps the most common source of confusion when processing XML in M. Unlike JSON, where field names are simple strings, XML elements can be qualified with namespace prefixes that affect how M interprets the structure.
Consider this XML response from a financial data service:
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:fin="http://example.com/financial/v2">
<soap:Header>
<fin:RequestId>12345</fin:RequestId>
</soap:Header>
<soap:Body>
<fin:StockQuotes>
<fin:Quote symbol="AAPL">
<fin:Price currency="USD">150.25</fin:Price>
<fin:Volume>1000000</fin:Volume>
<fin:LastUpdated>2024-01-15T15:30:00Z</fin:LastUpdated>
</fin:Quote>
<fin:Quote symbol="MSFT">
<fin:Price currency="USD">250.75</fin:Price>
<fin:Volume>800000</fin:Volume>
<fin:LastUpdated>2024-01-15T15:30:00Z</fin:LastUpdated>
</fin:Quote>
</fin:StockQuotes>
</soap:Body>
</soap:Envelope>
When M parses this XML using Xml.Document(), it preserves namespace information, but accessing elements requires understanding how namespaces translate to M's record structure:
let
// Sample XML content (in practice, this would come from Web.Contents or File.Contents)
xmlContent = "<soap:Envelope xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"" xmlns:fin=""http://example.com/financial/v2"">
<soap:Body>
<fin:StockQuotes>
<fin:Quote symbol=""AAPL"">
<fin:Price currency=""USD"">150.25</fin:Price>
<fin:Volume>1000000</fin:Volume>
</fin:Quote>
</fin:StockQuotes>
</soap:Body>
</soap:Envelope>",
// Parse XML
parsedXml = Xml.Document(Text.ToBinary(xmlContent)),
// Navigate through namespaced elements
// M uses the full namespace URI, not the prefix
soapBody = parsedXml[#"http://schemas.xmlsoap.org/soap/envelope/:Body"],
stockQuotes = soapBody[#"http://example.com/financial/v2:StockQuotes"],
quotes = stockQuotes[#"http://example.com/financial/v2:Quote"],
// Extract data from each quote
ExtractQuoteData = (quote as record) =>
let
// Attributes are accessed through the #"@attribute" syntax
symbol = quote[#"@symbol"],
// Element values are accessed by their full namespace URI
price = quote[#"http://example.com/financial/v2:Price"][#"#text"],
priceCurrency = quote[#"http://example.com/financial/v2:Price"][#"@currency"],
volume = Number.From(quote[#"http://example.com/financial/v2:Volume"][#"#text"]),
result = [
Symbol = symbol,
Price = Number.From(price),
Currency = priceCurrency,
Volume = volume
]
in
result,
// Process all quotes
    quotesData = if quotes is list then
List.Transform(quotes, ExtractQuoteData)
else
{ExtractQuoteData(quotes)}, // Handle single quote case
finalTable = Table.FromRecords(quotesData)
in
finalTable
The verbose namespace syntax makes XML processing more complex than JSON, but it's essential for correctly handling real-world XML documents that use multiple namespaces.
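One way to tame the verbosity is to bind the namespace URI once and look child elements up dynamically with Record.Field. The helper below is a hypothetical convenience for the quote records from the previous example, not a built-in:

```
let
    fin = "http://example.com/financial/v2",
    // FinChild is a hypothetical helper for namespaced child lookup
    FinChild = (element as record, localName as text) =>
        Record.Field(element, fin & ":" & localName),
    ExtractQuote = (quote as record) => [
        Symbol = quote[#"@symbol"],
        Price = Number.From(FinChild(quote, "Price")[#"#text"]),
        Volume = Number.From(FinChild(quote, "Volume")[#"#text"])
    ]
in
    ExtractQuote
```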
Complex XML documents often contain mixed content, nested structures, and varying element patterns that require sophisticated transformation logic. Here's how to handle common XML processing scenarios:
Processing Mixed Content and CDATA Sections:
let
// XML with mixed content and CDATA
xmlWithMixedContent = "<articles>
<article id=""1"">
<title>Data Processing</title>
<content><![CDATA[This article discusses <em>advanced</em> techniques for processing data.]]></content>
<metadata>
<author>John Doe</author>
<tags>
<tag>data</tag>
<tag>processing</tag>
<tag>analytics</tag>
</tags>
</metadata>
</article>
</articles>",
parsedXml = Xml.Document(Text.ToBinary(xmlWithMixedContent)),
articles = parsedXml[articles][article],
ProcessArticle = (article as record) =>
let
articleId = article[#"@id"],
title = article[title][#"#text"],
// CDATA content is accessible through #text
content = article[content][#"#text"],
// Extract author
author = article[metadata][author][#"#text"],
// Handle multiple tags - XML lists can be tricky
tagsElement = article[metadata][tags][tag],
            tags = if tagsElement is list then
List.Transform(tagsElement, (tagElement) => tagElement[#"#text"])
else
{tagsElement[#"#text"]}, // Single tag case
result = [
ID = articleId,
Title = title,
Content = content,
Author = author,
Tags = Text.Combine(tags, "; ")
]
in
result,
    processedArticles = if articles is list then
List.Transform(articles, ProcessArticle)
else
{ProcessArticle(articles)},
finalTable = Table.FromRecords(processedArticles)
in
finalTable
Handling Hierarchical XML with Recursive Processing:
let
// Sample hierarchical XML (product catalog)
catalogXml = "<catalog>
<category name=""Electronics"" id=""1"">
<product id=""101"" name=""Laptop"" price=""999.99"" />
<product id=""102"" name=""Mouse"" price=""29.99"" />
<category name=""Components"" id=""11"">
<product id=""201"" name=""RAM"" price=""89.99"" />
<product id=""202"" name=""SSD"" price=""149.99"" />
</category>
</category>
<category name=""Books"" id=""2"">
<product id=""301"" name=""Data Science Handbook"" price=""49.99"" />
</category>
</catalog>",
parsedXml = Xml.Document(Text.ToBinary(catalogXml)),
rootCategories = parsedXml[catalog][category],
// Recursive function to process categories and their products
ProcessCategory = (categoryElement as record, optional parentPath as text) as list =>
let
categoryName = categoryElement[#"@name"],
categoryId = categoryElement[#"@id"],
currentPath = if parentPath = null then categoryName else parentPath & " > " & categoryName,
// Extract products in this category
products = Record.FieldOrDefault(categoryElement, "product", {}),
            productRecords = if products is list then
List.Transform(products, (product) => [
CategoryPath = currentPath,
CategoryId = categoryId,
ProductId = product[#"@id"],
ProductName = product[#"@name"],
Price = Number.From(product[#"@price"])
])
            else if products is record then
{[
CategoryPath = currentPath,
CategoryId = categoryId,
ProductId = products[#"@id"],
ProductName = products[#"@name"],
Price = Number.From(products[#"@price"])
]}
else
{},
// Process subcategories recursively
subcategories = Record.FieldOrDefault(categoryElement, "category", {}),
            subcategoryRecords = if subcategories is list then
List.Combine(List.Transform(subcategories, (subcat) =>
@ProcessCategory(subcat, currentPath)
))
            else if subcategories is record then
@ProcessCategory(subcategories, currentPath)
else
{},
// Combine products and subcategory results
allRecords = List.Combine({productRecords, subcategoryRecords})
in
allRecords,
// Process all root categories
    allProducts = if rootCategories is list then
List.Combine(List.Transform(rootCategories, (cat) => ProcessCategory(cat)))
else
ProcessCategory(rootCategories),
finalTable = Table.FromRecords(allProducts)
in
finalTable
XML processing can be memory-intensive, especially with large documents. Unlike JSON, which M can often process in streaming fashion, XML's hierarchical nature typically requires loading the entire document structure. However, you can optimize performance through strategic parsing and selective data extraction:
let
// Function for processing large XML files efficiently
ProcessLargeXmlFile = (filePath as text) as table =>
let
// Read file in binary format for better control
binaryContent = File.Contents(filePath),
// For very large XML files, consider extracting specific sections
// This example shows how to process in chunks if the XML has a repeating structure
// Parse the full XML (unavoidable for proper XML processing)
xmlDocument = Xml.Document(binaryContent),
// Extract only the data elements you need - avoid materializing the entire tree
dataElements = xmlDocument[root][data],
// Process elements with minimal intermediate structures
ProcessElement = (element as record) =>
let
// Extract only required fields to minimize memory usage
id = try element[#"@id"] otherwise null,
name = try element[name][#"#text"] otherwise null,
value = try Number.From(element[value][#"#text"]) otherwise null,
// Don't extract complex nested structures unless necessary
result = [ID = id, Name = name, Value = value]
in
result,
// Transform elements efficiently
        processedData = if dataElements is list then
List.Transform(dataElements, ProcessElement)
else
{ProcessElement(dataElements)},
// Convert to table only at the end
finalTable = Table.FromRecords(processedData)
in
finalTable
in
ProcessLargeXmlFile
For extremely large XML files, consider preprocessing them outside of Power Query (using tools like Saxon or xmlstarlet) to extract relevant sections before importing into M.
The true power of JSON and XML processing in M emerges when you integrate these semi-structured data sources with relational data and build comprehensive data transformation pipelines.
A common pattern involves enriching database records with data from JSON APIs. This scenario requires handling potential network failures, rate limiting, and data consistency:
let
// Base customer data from database
customerTable = Table.FromRecords({
[CustomerID = 1001, Name = "Acme Corp", Industry = "Manufacturing"],
[CustomerID = 1002, Name = "TechStart Inc", Industry = "Technology"],
[CustomerID = 1003, Name = "Global Retail", Industry = "Retail"]
}),
// Function to enrich customer data with API information
EnrichWithApiData = (customerId as number) as record =>
let
// Construct API URL
apiUrl = "https://api.example.com/customers/" & Number.ToText(customerId),
// Make API call with error handling
apiResponse = try
Json.Document(Web.Contents(apiUrl))
otherwise
[error = "API_UNAVAILABLE"],
// Extract additional data if API call succeeded
enrichmentData = if not Record.HasFields(apiResponse, "error") then
[
CreditScore = Record.FieldOrDefault(apiResponse, "credit_score", null),
LastOrderDate = Record.FieldOrDefault(apiResponse, "last_order_date", null),
TotalOrders = Record.FieldOrDefault(apiResponse, "total_orders", null),
PreferredContact = try apiResponse[preferences][contact_method] otherwise null
]
else
[
CreditScore = null,
LastOrderDate = null,
TotalOrders = null,
PreferredContact = null
]
in
enrichmentData,
// Add enrichment data to customer table
enrichedTable = Table.AddColumn(
customerTable,
"ApiData",
(row) => EnrichWithApiData(row[CustomerID]),
type record
),
// Expand the enrichment data into separate columns
expandedTable = Table.ExpandRecordColumn(
enrichedTable,
"ApiData",
{"CreditScore", "LastOrderDate", "TotalOrders", "PreferredContact"},
{"CreditScore", "LastOrderDate", "TotalOrders", "PreferredContact"}
)
in
expandedTable
As your JSON and XML processing requirements grow, building libraries of reusable functions becomes essential. These functions should handle common patterns, error scenarios, and performance optimizations:
let
// Library of reusable JSON/XML processing functions
// Generic function for safe JSON field extraction
JsonFieldExtractor = (jsonRecord as record, fieldPath as list, optional defaultValue as any) as any =>
let
ExtractRecursive = (currentRecord as record, remainingPath as list) =>
if List.Count(remainingPath) = 0 then
currentRecord
else
let
fieldName = remainingPath{0},
restOfPath = List.RemoveFirstN(remainingPath, 1),
fieldValue = Record.FieldOrDefault(currentRecord, fieldName, null)
in
if fieldValue = null then
null
else if List.Count(restOfPath) = 0 then
fieldValue
                else if fieldValue is record then
@ExtractRecursive(fieldValue, restOfPath)
else
null,
result = try ExtractRecursive(jsonRecord, fieldPath) otherwise null
in
if result = null and defaultValue <> null then defaultValue else result,
// Function for handling JSON arrays with schema validation
ProcessJsonArray = (jsonArray as list, schemaValidator as function, optional errorHandler as function) as list =>
let
defaultErrorHandler = (item, error) => [OriginalItem = item, Error = error, IsValid = false],
actualErrorHandler = if errorHandler = null then defaultErrorHandler else errorHandler,
processItem = (item) =>
try
let
validationResult = schemaValidator(item),
result = if validationResult then item else actualErrorHandler(item, "Schema validation failed")
in
result
otherwise
actualErrorHandler(item, "Processing error"),
processedItems = List.Transform(jsonArray, processItem)
in
processedItems,
    // XML namespace resolver for cleaner element access
    XmlNamespaceResolver = (namespaces as record) =>
        let
            prefixes = Record.FieldNames(namespaces),
            // Build one accessor function per prefix: (elementName) => "uri:elementName"
            accessors = List.Transform(prefixes, (prefix) =>
                let uri = Record.Field(namespaces, prefix)
                in (elementName as text) => uri & ":" & elementName)
        in
            Record.FromList(accessors, prefixes),
// Example usage of the library functions
sampleJsonData = Json.Document("{
""user"": {
""profile"": {
""name"": ""John Doe"",
""age"": 30
},
""orders"": [
{""id"": 1, ""total"": 99.99},
{""id"": 2, ""total"": 149.50}
]
}
}"),
// Extract nested field safely
userName = JsonFieldExtractor(sampleJsonData, {"user", "profile", "name"}, "Unknown User"),
// Process orders array with validation
orderValidator = (order) => Record.HasFields(order, {"id", "total"}),
validOrders = ProcessJsonArray(sampleJsonData[user][orders], orderValidator),
// Demonstrate namespace resolver for XML
xmlNamespaces = [
soap = "http://schemas.xmlsoap.org/soap/envelope/",
fin = "http://example.com/financial/v2"
],
nsResolver = XmlNamespaceResolver(xmlNamespaces),
soapBodyElement = nsResolver[soap]("Body") // Returns "http://schemas.xmlsoap.org/soap/envelope/:Body"
in
[
UserName = userName,
ValidOrders = validOrders,
SoapBodyElement = soapBodyElement
]
Robust JSON and XML processing requires comprehensive error handling that goes beyond simple try-catch blocks. You need to handle network issues, malformed data, schema changes, and partial failures gracefully:
let
// Comprehensive error handling framework
// Error categorization and reporting
CreateErrorReport = (source as text, operation as text, error as any, optional context as record) as record =>
let
errorInfo = [
Source = source,
Operation = operation,
            ErrorType = if error is text then "String Error" else "System Error",
            ErrorMessage = if error is text then error else "Unknown system error",
Timestamp = DateTime.LocalNow(),
Context = if context = null then [] else context
]
in
errorInfo,
// Retry mechanism for API calls
ApiCallWithRetry = (url as text, optional maxRetries as number, optional delaySeconds as number) as any =>
let
actualMaxRetries = if maxRetries = null then 3 else maxRetries,
actualDelay = if delaySeconds = null then 1 else delaySeconds,
MakeCall = (attempt as number) =>
try
Json.Document(Web.Contents(url))
otherwise
if attempt < actualMaxRetries then
Function.InvokeAfter(() => @MakeCall(attempt + 1), #duration(0, 0, 0, actualDelay))
else
CreateErrorReport("API", "GET", "Max retries exceeded", [URL = url, Attempts = attempt])
in
MakeCall(1),
// Data quality validation for JSON structures
ValidateJsonStructure = (jsonData as any, expectedSchema as record) as record =>
let
ValidateField = (fieldName as text, expectedType as type, isRequired as logical) =>
let
                hasField = if jsonData is record then
Record.HasFields(jsonData, fieldName)
else false,
fieldValue = if hasField then Record.Field(jsonData, fieldName) else null,
isCorrectType = if fieldValue = null then not isRequired
else Value.Is(fieldValue, expectedType),
result = [
FieldName = fieldName,
IsPresent = hasField,
IsRequired = isRequired,
                    // M has no built-in Type.ToText, so keep the type values themselves
                    ExpectedType = expectedType,
                    ActualType = if fieldValue = null then type null else Value.Type(fieldValue),
IsValid = (not isRequired and fieldValue = null) or (hasField and isCorrectType)
]
in
result,
// Validate all expected fields
fieldValidations = List.Transform(
Record.FieldNames(expectedSchema),
(fieldName) =>
let
schemaInfo = Record.Field(expectedSchema, fieldName),
expectedType = schemaInfo[Type],
isRequired = Record.FieldOrDefault(schemaInfo, "Required", true)
in
ValidateField(fieldName, expectedType, isRequired)
),
// Calculate overall validity
allValid = List.AllTrue(List.Transform(fieldValidations, (validation) => validation[IsValid])),
validationReport = [
IsValid = allValid,
FieldValidations = fieldValidations,
ValidationTimestamp = DateTime.LocalNow()
]
in
validationReport,
// Example usage with comprehensive error handling
ProcessApiDataSafely = (apiUrl as text) as record =>
let
// Define expected schema
expectedSchema = [
id = [Type = type number, Required = true],
name = [Type = type text, Required = true],
email = [Type = type text, Required = false]
],
// Attempt API call with retries
apiResult = ApiCallWithRetry(apiUrl, 3, 2),
// Check if API call succeeded
hasError = Record.HasFields(apiResult, "Source"),
processingResult = if hasError then
[
Success = false,
Error = apiResult,
Data = null,
ValidationReport = null
]
else
let
// Validate data structure
validation = ValidateJsonStructure(apiResult, expectedSchema),
processedData = if validation[IsValid] then
[
ID = apiResult[id],
Name = apiResult[name],
Email = Record.FieldOrDefault(apiResult, "email", "not provided")
]
else
null
in
[
Success = validation[IsValid],
Error = if validation[IsValid] then null else validation,
Data = processedData,
ValidationReport = validation
]
in
processingResult
in
ProcessApiDataSafely
Let's build a comprehensive data integration solution that combines JSON and XML sources to create a unified customer analytics dataset. This exercise demonstrates real-world complexity and best practices.
Scenario: You're building a customer analytics dashboard that needs to combine:
- Customer profile data delivered as JSON from a REST API
- Transaction history delivered as namespaced XML
let
// Step 1: Define reusable helper functions
    SafeJsonExtract = (input as record, fieldPath as list, optional defaultValue as any) =>
        let
            ExtractNested = (current as record, path as list) =>
                if List.Count(path) = 0 then current
                else if not Record.HasFields(current, path{0}) then null
                else
                    // Dynamic field names require Record.Field; bracket access only takes literals
                    let nextValue = Record.Field(current, path{0})
                    in if List.Count(path) = 1 then nextValue
                    else if nextValue is record then @ExtractNested(nextValue, List.RemoveFirstN(path, 1))
                    else null
        in
            try ExtractNested(input, fieldPath) otherwise (if defaultValue <> null then defaultValue else null),
// Step 2: Load and process customer data from JSON API
CustomerApiData = () =>
let
// Simulated API response (in real scenario, use Web.Contents)
apiResponse = Json.Document("{
""customers"": [
{
""customer_id"": 1001,
""profile"": {
""first_name"": ""Sarah"",
""last_name"": ""Johnson"",
""email"": ""sarah.johnson@email.com"",
""registration_date"": ""2023-01-15T10:30:00Z""
},
""account_status"": ""active"",
""lifetime_value"": 2500.75
},
{
""customer_id"": 1002,
""profile"": {
""first_name"": ""Michael"",
""last_name"": ""Chen"",
""email"": ""m.chen@company.com"",
""registration_date"": ""2023-03-20T14:15:00Z""
},
""account_status"": ""premium"",
""lifetime_value"": 4200.50
}
]
}"),
customers = apiResponse[customers],
ProcessCustomer = (customer) =>
[
CustomerID = customer[customer_id],
FirstName = SafeJsonExtract(customer, {"profile", "first_name"}, "Unknown"),
LastName = SafeJsonExtract(customer, {"profile", "last_name"}, "Unknown"),
Email = SafeJsonExtract(customer, {"profile", "email"}, ""),
RegistrationDate = SafeJsonExtract(customer, {"profile", "registration_date"}, null),
AccountStatus = customer[account_status],
LifetimeValue = customer[lifetime_value]
],
processedCustomers = List.Transform(customers, ProcessCustomer),
customerTable = Table.FromRecords(processedCustomers)
in
customerTable,
// Step 3: Load and process transaction data from XML
TransactionXmlData = () =>
let
// Simulated XML response
xmlContent = "<transactions xmlns:tx=""http://example.com/transactions"">
<tx:transaction id=""T001"" customer_id=""1001"">
<tx:date>2024-01-10T09:30:00Z</tx:date>
<tx:amount currency=""USD"">125.50</tx:amount>
<tx:products>
<tx:product id=""P001"" quantity=""2"" />
<tx:product id=""P002"" quantity=""1"" />
</tx:products>
</tx:transaction>
<tx:transaction id=""T002"" customer_id=""1002"">
<tx:date>2024-01-12T14:45:00Z</tx:date>
<tx:amount currency=""USD"">89.99</tx:amount>
<tx:products>
<tx:product id=""P003"" quantity=""1"" />
</tx:products>
</tx:transaction>
</transactions>",
xmlDoc = Xml.Document(Text.ToBinary(xmlContent)),
transactions = xmlDoc[#"http://example.com/transactions:transactions"][#"http://example.com/transactions:transaction"],
ProcessTransaction = (transaction) =>
let
transactionId = transaction[#"@id"],
customerId = Number.From(transaction[#"@customer_id"]),
date = transaction[#"http://example.com/transactions:date"][#"#text"],
amount = Number.From(transaction[#"http://example.com/transactions:amount"][#"#text"]),
currency = transaction[#"http://example.com/transactions:amount"][#"@currency"],
// Handle products - could be single or multiple
productsElement = transaction[#"http://example.com/transactions:products"][#"http://example.com/transactions:product"],
products = if productsElement is list then
List.Transform(productsElement, (product) => [
ProductID = product[#"@id"],
Quantity = Number.From(product[#"@quantity"])
])
else
{[
ProductID = productsElement[#"@id"],
Quantity = Number.From(productsElement[#"@quantity"])
]}
in
[
TransactionID = transactionId,
CustomerID = customerId,
TransactionDate = date,
Amount = amount,
Currency = currency,
Products = products
],
processedTransactions = if transactions is list then
List.Transform(transactions, ProcessTransaction)
else
{ProcessTransaction(transactions)},
transactionTable = Table.FromRecords(processedTransactions)
in
transactionTable,
// Step 4: Load customer preferences from JSON
CustomerPreferences = () =>
let
preferencesJson = Json.Document("{
""preferences"": {
""1001"": {
""communication"": {
""email_marketing"": true,
""sms_notifications"": false,
""preferred_language"": ""en""
},
""shopping"": {
""categories"": [""electronics"", ""books""],
""price_alerts"": true
}
},
""1002"": {
""communication"": {
""email_marketing"": false,
""sms_notifications"": true,
""preferred_language"": ""es""
},
""shopping"": {
""categories"": [""clothing"", ""home""],
""price_alerts"": false
}
}
}
}"),
preferencesRecord = preferencesJson[preferences],
customerIds = Record.FieldNames(preferencesRecord),
ProcessPreferences = (customerId) =>
let
customerPrefs = Record.Field(preferencesRecord, customerId),
result = [
CustomerID = Number.From(customerId),
EmailMarketing = SafeJsonExtract(customerPrefs, {"communication", "email_marketing"}, false),
SmsNotifications = SafeJsonExtract(customerPrefs, {"communication", "sms_notifications"}, false),
PreferredLanguage = SafeJsonExtract(customerPrefs, {"communication", "preferred_language"}, "en"),
PreferredCategories = try Text.Combine(customerPrefs[shopping][categories], "; ") otherwise "",
PriceAlerts = SafeJsonExtract(customerPrefs, {"shopping", "price_alerts"}, false)
]
in
result,
preferencesTable = Table.FromRecords(List.Transform(customerIds, ProcessPreferences))
in
preferencesTable,
// Step 5: Combine all data sources
customers = CustomerApiData(),
transactions = TransactionXmlData(),
preferences = CustomerPreferences(),
// Join customers with preferences
customersWithPrefs = Table.NestedJoin(
customers,
{"CustomerID"},
preferences,
{"CustomerID"},
"Preferences",
JoinKind.LeftOuter
),
expandedCustomers = Table.ExpandTableColumn(
customersWithPrefs,
"Preferences",
{"EmailMarketing", "SmsNotifications", "PreferredLanguage", "PreferredCategories", "PriceAlerts"},
{"EmailMarketing", "SmsNotifications", "PreferredLanguage", "PreferredCategories", "PriceAlerts"}
),
// Add transaction summary to each customer
transactionSummary = Table.Group(
transactions,
{"CustomerID"},
{
{"TotalTransactions", each Table.RowCount(_), Int64.Type},
{"TotalSpent", each List.Sum([Amount]), type number},
{"LastTransactionDate", each List.Max([TransactionDate]), type text}
}
),
finalCustomerData = Table.NestedJoin(
expandedCustomers,
{"CustomerID"},
transactionSummary,
{"CustomerID"},
"TransactionSummary",
JoinKind.LeftOuter
),
finalTable = Table.ExpandTableColumn(
finalCustomerData,
"TransactionSummary",
{"TotalTransactions", "TotalSpent", "LastTransactionDate"},
{"TotalTransactions", "TotalSpent", "LastTransactionDate"}
)
in
finalTable
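As a quick sanity check before building visuals, you can verify the shape of the combined output. This sketch assumes the query above has been saved under the hypothetical name CustomerAnalytics:

```powerquery
// Hypothetical smoke test; "CustomerAnalytics" is an assumed query name
let
    rowCount = Table.RowCount(CustomerAnalytics),
    schemaOk = List.ContainsAll(
        Table.ColumnNames(CustomerAnalytics),
        {"CustomerID", "TotalSpent", "PreferredLanguage"}
    )
in
    [Rows = rowCount, SchemaOk = schemaOk]
```

With the sample data above you would expect two rows and SchemaOk = true; a false value usually means an expand step dropped or renamed a column.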
This exercise demonstrates several key concepts: reusable helper functions for safe nested extraction, full-URI namespace handling in XML, normalizing single-versus-multiple element results, defensive defaults for optional fields, and combining heterogeneous sources through joins and grouped aggregates.
Working with JSON and XML in M presents unique challenges. Here are the most common pitfalls and their solutions:
1. Namespace Confusion in XML Processing
Problem: Many developers struggle with XML namespace syntax, leading to queries that work with sample data but fail in production.
Solution: Always use the full namespace URI, not the prefix. Create a namespace mapping function for cleaner code:
// Wrong approach - using prefix
element[soap:Body] // This will fail
// Correct approach - using full URI
element[#"http://schemas.xmlsoap.org/soap/envelope/:Body"]
// Better approach - namespace helper
let
ns = [
soap = (name) => "http://schemas.xmlsoap.org/soap/envelope/:" & name,
custom = (name) => "http://example.com/custom:" & name
]
in
// Bracket syntax needs literal field names, so use Record.Field for computed ones
Record.Field(Record.Field(element, ns[soap]("Body")), ns[custom]("Data"))
2. Assuming Consistent JSON Structure
Problem: Queries break when APIs return slightly different schemas or when optional fields are missing.
Solution: Always use defensive programming patterns:
// Fragile approach
customerName = apiResponse[customer][profile][name]
// Robust approach
customerName = try apiResponse[customer][profile][name] otherwise
try apiResponse[customer][name] otherwise
"Unknown Customer"
// Even better - use a helper function
SafeExtract = (record, path, default) => /* implementation shown earlier */
customerName = SafeExtract(apiResponse, {"customer", "profile", "name"}, "Unknown")
3. Memory Issues with Large Documents
Problem: Processing large JSON or XML files causes memory errors or poor performance.
Solution: Stream processing and selective extraction:
// Avoid materializing entire structures
// Instead of:
allData = Json.Document(largeFile)
processedData = TransformEverything(allData)
// Do this:
streamProcessor = (item) => ExtractOnlyNeeded(item)
processedData = List.Transform(Json.Document(largeFile)[items], streamProcessor)
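As a more concrete sketch of the same idea (the file path and field names here are hypothetical), parse the document once and project each item down to only the fields you need:

```powerquery
// Hypothetical selective extraction; path and field names are assumptions
let
    parsed = Json.Document(File.Contents("C:\large.json")),
    slim = List.Transform(
        parsed[items],
        (item) => [ID = item[id], Total = item[total]]
    ),
    result = Table.FromRecords(slim)
in
    result
```

Because the projection happens inside List.Transform, the full nested structure of each item is never materialized into the result table.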
4. Incorrect Error Handling
Problem: Using simple try-otherwise blocks that mask important error information.
Solution: Implement structured error handling:
ProcessWithErrorInfo = (data) =>
let
    // A bare try returns a record with HasError plus either Value or Error,
    // so the original error details stay available instead of being masked
    attempt = try ComplexTransformation(data)
in
    if not attempt[HasError] then
        [Success = true, Data = attempt[Value], Error = null]
    else
        [Success = false, Data = null, Error = [
            Reason = attempt[Error][Reason],
            Message = attempt[Error][Message],
            Input = data,
            Timestamp = DateTime.LocalNow()
        ]]
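A usage sketch: once each item carries its own Success flag, the results can be split into clean rows and an error log for troubleshooting (sourceItems is a hypothetical input list):

```powerquery
// Hypothetical consumer of ProcessWithErrorInfo results
let
    results = List.Transform(sourceItems, ProcessWithErrorInfo),
    goodRows = Table.FromRecords(
        List.Transform(List.Select(results, each [Success]), each [Data])
    ),
    errorLog = Table.FromRecords(
        List.Transform(List.Select(results, each not [Success]), each [Error])
    )
in
    [Clean = goodRows, Errors = errorLog]
```

Loading the error log as its own query makes failed records visible in the report instead of silently disappearing.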
5. Performance Anti-Patterns
Problem: Inefficient patterns that work for small datasets but don't scale.
Common Anti-Patterns:
- Table.FromRecords called inside loops
- Repeated Table.AddColumn calls instead of a single Table.TransformColumns
Solution:
// Instead of multiple AddColumn calls
step1 = Table.AddColumn(source, "Col1", (row) => Transform1(row[Field1]))
step2 = Table.AddColumn(step1, "Col2", (row) => Transform2(row[Field2]))
// Use TransformColumns or single complex transformation
Table.TransformColumns(source, {
{"Field1", Transform1, type text},
{"Field2", Transform2, type number}
})
6. Ignoring Data Type Conversions
Problem: JSON payloads frequently deliver dates as strings and numbers as quoted text, causing type mismatches downstream.
Solution: Explicit type conversion with validation:
ConvertJsonTypes = (record as record) =>
[
ID = try Number.From(record[id]) otherwise null,
Amount = try Currency.From(record[amount]) otherwise 0,
Date = try DateTime.From(record[timestamp]) otherwise DateTime.LocalNow(),
IsActive = try Logical.From(record[active]) otherwise false
]
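For example, the converter can be applied to every record of a parsed response before the table is assembled (the endpoint URL is an assumption for illustration):

```powerquery
// Hypothetical usage: convert types per record while building the table
let
    raw = Json.Document(Web.Contents("https://api.example.com/items")),
    typed = Table.FromRecords(List.Transform(raw, ConvertJsonTypes))
in
    typed
```

Converting at this boundary means every later step can rely on stable column types instead of re-validating text values.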
You've now explored the sophisticated world of JSON and XML processing in M, moving far beyond basic parsing to handle real-world complexity. The techniques covered—from recursive navigation and namespace handling to performance optimization and error resilience—form the foundation for building robust data integration pipelines.
The key insights to remember: treat JSON as native records and lists rather than text; address XML elements by their full namespace URI; assume schemas will drift and code defensively with safe-extraction helpers and explicit defaults; convert types explicitly at the boundary; and extract selectively instead of materializing entire documents.
Your next steps should focus on applying these concepts to your specific data sources. Start by auditing your current JSON and XML processing queries for the anti-patterns discussed in this lesson. Look for opportunities to implement the helper functions and error handling patterns we've developed.
For advanced practitioners, consider exploring M's streaming capabilities for extremely large datasets, building custom connectors for proprietary APIs, and developing automated testing frameworks for your transformation logic. The intersection of M with Azure Data Factory and Power Platform creates opportunities for enterprise-scale data orchestration that builds on these foundational skills.
The semi-structured data landscape continues evolving, with formats like Apache Avro and Protocol Buffers gaining traction alongside JSON and XML. The principles you've learned—understanding data structures, building flexible transformations, and optimizing for scale—will serve you well as new formats emerge.
Learning Path: Advanced M Language