
You're staring at a REST API endpoint that returns nested JSON with inconsistent schemas, or wrestling with an XML feed where namespaces seem to shift randomly between records. Sound familiar? If you've moved beyond basic CSV imports in Power Query, you've likely encountered the beautiful complexity of semi-structured data formats. JSON and XML dominate modern data interchange, but their flexibility creates unique challenges when transforming them into the tabular structures Power BI and Excel expect.
The M language provides sophisticated tools for parsing, navigating, and transforming both JSON and XML data sources. However, mastering these capabilities requires understanding not just the syntax, but the underlying data structures, performance implications, and the subtle ways that real-world data deviates from theoretical ideals. You'll need to handle deeply nested objects, mixed data types, optional fields, and the inevitable schema variations that plague production systems.
This lesson takes you deep into M's JSON and XML processing capabilities, covering everything from basic parsing to advanced transformation patterns. You'll learn to build resilient queries that handle schema variations gracefully, optimize performance for large datasets, and implement sophisticated extraction patterns that would be impossible in the Power Query UI.
What you'll learn:
- How M represents JSON and XML as native records, lists, and tables
- Techniques for handling schema variations and optional fields gracefully
- Performance patterns for large files and paginated APIs
- Advanced extraction patterns that go beyond the Power Query UI

Prerequisites: This lesson assumes you have solid experience with M fundamentals, including record manipulation, list operations, and custom function creation. You should be comfortable reading intermediate M code and understand concepts like lazy evaluation and query folding. Basic familiarity with JSON and XML structure is helpful but not required.
Before diving into specific parsing techniques, it's crucial to understand how M conceptualizes JSON and XML data. Unlike languages that treat these formats as text to be parsed, M converts them directly into native data structures: records, lists, and primitive values for JSON; structured tables and records for XML.
This approach has profound implications for how you write transformation logic. Consider this JSON response from a customer API:
{
"customer_id": 12345,
"profile": {
"name": "Sarah Chen",
"email": "sarah.chen@example.com",
"preferences": {
"newsletter": true,
"notifications": ["email", "sms"]
}
},
"orders": [
{
"order_id": "ORD-001",
"date": "2024-01-15T10:30:00Z",
"items": [
{"product": "Widget A", "quantity": 2, "price": 29.99},
{"product": "Widget B", "quantity": 1, "price": 45.00}
]
}
]
}
When M parses this JSON using Json.Document(), it doesn't create a flat structure. Instead, it preserves the hierarchical relationships:
let
    // Json.Document accepts the binary response from Web.Contents directly
    parsed = Json.Document(Web.Contents("https://api.example.com/customer/12345")),
// parsed is now a record with nested records and lists
customerId = parsed[customer_id], // Direct field access: 12345
customerName = parsed[profile][name], // Nested access: "Sarah Chen"
firstOrder = parsed[orders]{0}, // List indexing
orderItems = parsed[orders]{0}[items] // Returns a list of records
in
parsed
This native structure preservation is both powerful and challenging. It's powerful because you can navigate complex hierarchies naturally. It's challenging because you must think in terms of M's type system rather than string manipulation.
The key insight is that M treats semi-structured data as first-class data structures, not serialized text. This means your transformation logic should leverage record manipulation, list processing, and type conversion functions rather than string parsing.
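For example, computing an order total from the customer JSON above needs no string handling at all — just list processing and arithmetic over the parsed structure (a minimal sketch using an inline JSON literal):

```
let
    parsed = Json.Document("{""orders"": [{""items"": [
        {""product"": ""Widget A"", ""quantity"": 2, ""price"": 29.99},
        {""product"": ""Widget B"", ""quantity"": 1, ""price"": 45.00}]}]}"),
    items = parsed[orders]{0}[items],
    // List.Transform and List.Sum replace any notion of text parsing
    lineTotals = List.Transform(items, each [quantity] * [price]),
    orderTotal = List.Sum(lineTotals)  // 104.98
in
    orderTotal
```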
Real-world JSON rarely follows perfect schemas. Fields appear and disappear, data types change, and nesting levels vary. M provides several techniques for handling this dynamism gracefully.
The most fundamental is understanding the difference between static field access (record[field]) and dynamic access (Record.Field(record, fieldName)). Static access is faster and more readable, but it fails if the field doesn't exist. Dynamic access allows you to handle optional fields programmatically:
let
// Sample data with inconsistent schema
apiResponse1 = Json.Document("{""id"": 1, ""name"": ""Product A"", ""category"": ""electronics""}"),
apiResponse2 = Json.Document("{""id"": 2, ""name"": ""Product B"", ""dept"": ""electronics""}"), // note: 'dept' instead of 'category'
// Function to safely extract fields with fallback logic
ExtractProductInfo = (response as record) as record =>
let
id = Record.FieldOrDefault(response, "id", null),
name = Record.FieldOrDefault(response, "name", "Unknown"),
// Handle the category/dept field variation
category = if Record.HasFields(response, "category") then
response[category]
else if Record.HasFields(response, "dept") then
response[dept]
else
"Uncategorized",
result = [
ID = id,
Name = name,
Category = category
]
in
result,
// Apply to both responses
product1 = ExtractProductInfo(apiResponse1),
product2 = ExtractProductInfo(apiResponse2)
in
{product1, product2}
This pattern becomes essential when working with APIs that evolve over time or aggregate data from multiple sources with slight schema differences.
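M's optional-access operator offers a terser alternative for simple cases: suffixing a field selection with ? returns null instead of raising an error when the field is absent, and the coalescing operator ?? (available in recent Power Query versions) supplies the fallback:

```
let
    response = Json.Document("{""id"": 2, ""name"": ""Product B"", ""dept"": ""electronics""}"),
    // [category]? yields null rather than an error because the field is missing
    category = response[category]? ?? response[dept]? ?? "Uncategorized"
in
    category  // "electronics"
```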
Complex JSON structures often require recursive processing. Consider a nested organizational chart or a product catalog with unlimited category depth. M's recursive capabilities, combined with proper error handling, can elegantly flatten these structures:
let
// Sample nested organizational data
orgData = Json.Document("{
""name"": ""CEO"",
""employee_id"": ""001"",
""reports"": [
{
""name"": ""VP Sales"",
""employee_id"": ""002"",
""reports"": [
{""name"": ""Sales Rep 1"", ""employee_id"": ""003"", ""reports"": []},
{""name"": ""Sales Rep 2"", ""employee_id"": ""004"", ""reports"": []}
]
},
{
""name"": ""VP Engineering"",
""employee_id"": ""005"",
""reports"": [
{""name"": ""Developer 1"", ""employee_id"": ""006"", ""reports"": []},
{""name"": ""Developer 2"", ""employee_id"": ""007"", ""reports"": []}
]
}
]
}"),
// Recursive function to flatten the hierarchy
FlattenHierarchy = (node as record, optional parentId as text, optional level as number) as list =>
let
currentLevel = if level = null then 0 else level,
currentRecord = [
EmployeeId = node[employee_id],
Name = node[name],
ParentId = parentId,
Level = currentLevel
],
// Process subordinates recursively
reports = Record.FieldOrDefault(node, "reports", {}),
        childRecords = if reports is list and List.Count(reports) > 0 then
List.Transform(reports, (report) =>
FlattenHierarchy(report, node[employee_id], currentLevel + 1)
)
else
{},
        // childRecords is a list of lists (one per report); flatten it and prepend the current record
        allChildRecords = List.Combine(childRecords),
        result = List.Combine({{currentRecord}, allChildRecords})
in
result,
flattenedData = FlattenHierarchy(orgData),
finalTable = Table.FromRecords(flattenedData)
in
finalTable
This recursive approach scales to arbitrary depths and maintains referential integrity through the parent-child relationships.
When processing large JSON datasets, especially from APIs with pagination or bulk exports, performance becomes critical. M's lazy evaluation helps, but you need to structure your queries to minimize memory usage and processing overhead.
The key principle is to avoid materializing large intermediate structures. Instead of loading entire JSON arrays into memory, process them in chunks or streams:
let
// Function to process JSON in streaming fashion
ProcessLargeJsonFile = (filePath as text) as table =>
let
// Read file as binary to control parsing
binaryContent = File.Contents(filePath),
// For very large files, consider reading line by line
// This example assumes a JSON Lines format (one JSON object per line)
        // Lines.FromBinary handles both LF and CRLF line endings
        lines = Lines.FromBinary(binaryContent),
// Process each line as separate JSON
validLines = List.Select(lines, each Text.Length(_) > 0),
// Transform each JSON line - this is where lazy evaluation helps
transformedRecords = List.Transform(validLines, (line) =>
try
let
jsonRecord = Json.Document(line),
// Extract only the fields you need - don't materialize entire objects
extractedData = [
id = Record.FieldOrDefault(jsonRecord, "id", null),
timestamp = Record.FieldOrDefault(jsonRecord, "timestamp", null),
value = Record.FieldOrDefault(jsonRecord, "value", null)
// Avoid extracting nested objects unless necessary
]
in
extractedData
otherwise
null
),
// Filter out parsing errors
cleanRecords = List.Select(transformedRecords, each _ <> null),
// Convert to table only at the end
result = Table.FromRecords(cleanRecords)
in
result
in
ProcessLargeJsonFile
For APIs with pagination, implement a pattern that processes pages sequentially rather than accumulating all data in memory:
let
// Function to fetch paginated API data
FetchAllPages = (baseUrl as text, initialParams as record) as table =>
let
// Helper function to fetch a single page
FetchPage = (url as text, pageNumber as number) =>
let
pageUrl = url & "&page=" & Number.ToText(pageNumber),
response = try Json.Document(Web.Contents(pageUrl)) otherwise null,
hasData = response <> null and Record.HasFields(response, "data") and List.Count(response[data]) > 0
in
[Response = response, HasData = hasData],
// Recursive function to fetch all pages
FetchAllPagesImpl = (pageNum as number, accumulator as list) =>
let
pageResult = FetchPage(baseUrl, pageNum),
currentData = if pageResult[HasData] then pageResult[Response][data] else {},
            newAccumulator = if currentData is list then
List.Combine({accumulator, currentData})
else
accumulator,
// Continue if we got data, stop otherwise
result = if pageResult[HasData] then
@FetchAllPagesImpl(pageNum + 1, newAccumulator)
else
newAccumulator
in
result,
allRecords = FetchAllPagesImpl(1, {}),
finalTable = Table.FromRecords(allRecords)
in
finalTable
in
FetchAllPages("https://api.example.com/data?limit=100", [])
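The same loop can be written without explicit recursion using List.Generate, which some readers find easier to follow. This sketch assumes the same hypothetical endpoint and the same "data" field per page:

```
let
    FetchAllPagesViaGenerate = (baseUrl as text) as table =>
        let
            GetPage = (pageNum as number) as list =>
                try Json.Document(Web.Contents(baseUrl & "&page=" & Number.ToText(pageNum)))[data]
                otherwise {},
            // Each state record carries the page number and that page's rows;
            // generation stops when a page comes back empty
            pages = List.Generate(
                () => [Page = 1, Data = GetPage(1)],
                (state) => List.Count(state[Data]) > 0,
                (state) => [Page = state[Page] + 1, Data = GetPage(state[Page] + 1)],
                (state) => state[Data])
        in
            Table.FromRecords(List.Combine(pages))
in
    FetchAllPagesViaGenerate
```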
Production JSON data rarely follows a single, consistent schema. You'll encounter optional fields, varying data types, and structural differences that can break rigid parsing logic. Building resilient transformations requires anticipating these variations and handling them gracefully.
Consider this pattern for handling schema evolution in a user analytics API:
let
// Sample data showing schema evolution over time
oldFormat = Json.Document("{
""user_id"": 12345,
""event"": ""page_view"",
""timestamp"": ""2023-01-15T10:30:00Z"",
""page"": ""/products""
}"),
newFormat = Json.Document("{
""user_id"": 67890,
""event_type"": ""page_view"",
""occurred_at"": ""2024-01-15T10:30:00Z"",
""properties"": {
""page_path"": ""/products"",
""referrer"": ""https://google.com"",
""session_id"": ""sess_123""
}
}"),
// Adaptive transformation function
NormalizeEvent = (eventRecord as record) as record =>
let
// Determine schema version based on field presence
isNewFormat = Record.HasFields(eventRecord, "event_type"),
// Extract user ID (consistent across versions)
userId = eventRecord[user_id],
// Handle event name field variations
eventName = if isNewFormat then
eventRecord[event_type]
else
Record.FieldOrDefault(eventRecord, "event", "unknown"),
// Handle timestamp field variations
timestamp = if isNewFormat then
Record.FieldOrDefault(eventRecord, "occurred_at", null)
else
Record.FieldOrDefault(eventRecord, "timestamp", null),
// Handle page information
pagePath = if isNewFormat then
try eventRecord[properties][page_path] otherwise null
else
Record.FieldOrDefault(eventRecord, "page", null),
// Extract new format fields with defaults
referrer = if isNewFormat then
try eventRecord[properties][referrer] otherwise null
else
null,
sessionId = if isNewFormat then
try eventRecord[properties][session_id] otherwise null
else
null,
// Construct normalized record
normalizedEvent = [
UserId = userId,
EventName = eventName,
Timestamp = timestamp,
PagePath = pagePath,
Referrer = referrer,
SessionId = sessionId,
SchemaVersion = if isNewFormat then "v2" else "v1"
]
in
normalizedEvent,
// Apply normalization to both formats
normalizedOld = NormalizeEvent(oldFormat),
normalizedNew = NormalizeEvent(newFormat),
// Combine into a table
combinedData = Table.FromRecords({normalizedOld, normalizedNew})
in
combinedData
This adaptive approach allows you to process data with multiple schema versions in a single query, maintaining backward compatibility while handling new fields appropriately.
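When the goal is simply to pad absent fields with nulls rather than rename or remap them, Record.SelectFields with the MissingField.UseNull option is a compact complement to the adaptive function above:

```
let
    partial = Json.Document("{""user_id"": 99, ""event"": ""click""}"),
    // MissingField.UseNull fills the absent field with null instead of raising an error
    padded = Record.SelectFields(partial, {"user_id", "event", "occurred_at"}, MissingField.UseNull)
in
    padded  // [user_id = 99, event = "click", occurred_at = null]
```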
XML processing in M differs significantly from JSON handling due to XML's more complex structure, including namespaces, attributes, and mixed content models. Understanding these differences is crucial for effective XML transformation.
XML namespaces are perhaps the most common source of confusion when processing XML in M. Unlike JSON, where field names are simple strings, XML elements can be qualified with namespace prefixes that affect how M interprets the structure.
Consider this XML response from a financial data service:
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:fin="http://example.com/financial/v2">
<soap:Header>
<fin:RequestId>12345</fin:RequestId>
</soap:Header>
<soap:Body>
<fin:StockQuotes>
<fin:Quote symbol="AAPL">
<fin:Price currency="USD">150.25</fin:Price>
<fin:Volume>1000000</fin:Volume>
<fin:LastUpdated>2024-01-15T15:30:00Z</fin:LastUpdated>
</fin:Quote>
<fin:Quote symbol="MSFT">
<fin:Price currency="USD">250.75</fin:Price>
<fin:Volume>800000</fin:Volume>
<fin:LastUpdated>2024-01-15T15:30:00Z</fin:LastUpdated>
</fin:Quote>
</fin:StockQuotes>
</soap:Body>
</soap:Envelope>
When M parses this XML using Xml.Document(), it preserves namespace information, but accessing elements requires understanding how namespaces translate to M's record structure:
let
// Sample XML content (in practice, this would come from Web.Contents or File.Contents)
xmlContent = "<soap:Envelope xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"" xmlns:fin=""http://example.com/financial/v2"">
<soap:Body>
<fin:StockQuotes>
<fin:Quote symbol=""AAPL"">
<fin:Price currency=""USD"">150.25</fin:Price>
<fin:Volume>1000000</fin:Volume>
</fin:Quote>
</fin:StockQuotes>
</soap:Body>
</soap:Envelope>",
// Parse XML
parsedXml = Xml.Document(Text.ToBinary(xmlContent)),
// Navigate through namespaced elements
// M uses the full namespace URI, not the prefix
soapBody = parsedXml[#"http://schemas.xmlsoap.org/soap/envelope/:Body"],
stockQuotes = soapBody[#"http://example.com/financial/v2:StockQuotes"],
quotes = stockQuotes[#"http://example.com/financial/v2:Quote"],
// Extract data from each quote
ExtractQuoteData = (quote as record) =>
let
// Attributes are accessed through the #"@attribute" syntax
symbol = quote[#"@symbol"],
// Element values are accessed by their full namespace URI
price = quote[#"http://example.com/financial/v2:Price"][#"#text"],
priceCurrency = quote[#"http://example.com/financial/v2:Price"][#"@currency"],
volume = Number.From(quote[#"http://example.com/financial/v2:Volume"][#"#text"]),
result = [
Symbol = symbol,
Price = Number.From(price),
Currency = priceCurrency,
Volume = volume
]
in
result,
// Process all quotes
    quotesData = if quotes is list then
List.Transform(quotes, ExtractQuoteData)
else
{ExtractQuoteData(quotes)}, // Handle single quote case
finalTable = Table.FromRecords(quotesData)
in
finalTable
The verbose namespace syntax makes XML processing more complex than JSON, but it's essential for correctly handling real-world XML documents that use multiple namespaces.
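One way to tame the verbosity is to bind the namespace URI once and look child elements up dynamically with Record.Field. The helper below is a hypothetical convenience for the quote records from the previous example, not a built-in:

```
let
    fin = "http://example.com/financial/v2",
    // FinChild is a hypothetical helper for namespaced child lookup
    FinChild = (element as record, localName as text) =>
        Record.Field(element, fin & ":" & localName),
    ExtractQuote = (quote as record) => [
        Symbol = quote[#"@symbol"],
        Price = Number.From(FinChild(quote, "Price")[#"#text"]),
        Volume = Number.From(FinChild(quote, "Volume")[#"#text"])
    ]
in
    ExtractQuote
```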
Complex XML documents often contain mixed content, nested structures, and varying element patterns that require sophisticated transformation logic. Here's how to handle common XML processing scenarios:
Processing Mixed Content and CDATA Sections:
let
// XML with mixed content and CDATA
xmlWithMixedContent = "<articles>
<article id=""1"">
<title>Data Processing</title>
<content><![CDATA[This article discusses <em>advanced</em> techniques for processing data.]]></content>
<metadata>
<author>John Doe</author>
<tags>
<tag>data</tag>
<tag>processing</tag>
<tag>analytics</tag>
</tags>
</metadata>
</article>
</articles>",
parsedXml = Xml.Document(Text.ToBinary(xmlWithMixedContent)),
articles = parsedXml[articles][article],
ProcessArticle = (article as record) =>
let
articleId = article[#"@id"],
title = article[title][#"#text"],
// CDATA content is accessible through #text
content = article[content][#"#text"],
// Extract author
author = article[metadata][author][#"#text"],
// Handle multiple tags - XML lists can be tricky
tagsElement = article[metadata][tags][tag],
            tags = if tagsElement is list then
List.Transform(tagsElement, (tagElement) => tagElement[#"#text"])
else
{tagsElement[#"#text"]}, // Single tag case
result = [
ID = articleId,
Title = title,
Content = content,
Author = author,
Tags = Text.Combine(tags, "; ")
]
in
result,
    processedArticles = if articles is list then
List.Transform(articles, ProcessArticle)
else
{ProcessArticle(articles)},
finalTable = Table.FromRecords(processedArticles)
in
finalTable
Handling Hierarchical XML with Recursive Processing:
let
// Sample hierarchical XML (product catalog)
catalogXml = "<catalog>
<category name=""Electronics"" id=""1"">
<product id=""101"" name=""Laptop"" price=""999.99"" />
<product id=""102"" name=""Mouse"" price=""29.99"" />
<category name=""Components"" id=""11"">
<product id=""201"" name=""RAM"" price=""89.99"" />
<product id=""202"" name=""SSD"" price=""149.99"" />
</category>
</category>
<category name=""Books"" id=""2"">
<product id=""301"" name=""Data Science Handbook"" price=""49.99"" />
</category>
</catalog>",
parsedXml = Xml.Document(Text.ToBinary(catalogXml)),
rootCategories = parsedXml[catalog][category],
// Recursive function to process categories and their products
ProcessCategory = (categoryElement as record, optional parentPath as text) as list =>
let
categoryName = categoryElement[#"@name"],
categoryId = categoryElement[#"@id"],
currentPath = if parentPath = null then categoryName else parentPath & " > " & categoryName,
// Extract products in this category
products = Record.FieldOrDefault(categoryElement, "product", {}),
            productRecords = if products is list then
List.Transform(products, (product) => [
CategoryPath = currentPath,
CategoryId = categoryId,
ProductId = product[#"@id"],
ProductName = product[#"@name"],
Price = Number.From(product[#"@price"])
])
            else if products is record then
{[
CategoryPath = currentPath,
CategoryId = categoryId,
ProductId = products[#"@id"],
ProductName = products[#"@name"],
Price = Number.From(products[#"@price"])
]}
else
{},
// Process subcategories recursively
subcategories = Record.FieldOrDefault(categoryElement, "category", {}),
            subcategoryRecords = if subcategories is list then
List.Combine(List.Transform(subcategories, (subcat) =>
@ProcessCategory(subcat, currentPath)
))
            else if subcategories is record then
@ProcessCategory(subcategories, currentPath)
else
{},
// Combine products and subcategory results
allRecords = List.Combine({productRecords, subcategoryRecords})
in
allRecords,
// Process all root categories
    allProducts = if rootCategories is list then
List.Combine(List.Transform(rootCategories, (cat) => ProcessCategory(cat)))
else
ProcessCategory(rootCategories),
finalTable = Table.FromRecords(allProducts)
in
finalTable
XML processing can be memory-intensive, especially with large documents. Unlike JSON, which M can often process in streaming fashion, XML's hierarchical nature typically requires loading the entire document structure. However, you can optimize performance through strategic parsing and selective data extraction:
let
// Function for processing large XML files efficiently
ProcessLargeXmlFile = (filePath as text) as table =>
let
// Read file in binary format for better control
binaryContent = File.Contents(filePath),
// For very large XML files, consider extracting specific sections
// This example shows how to process in chunks if the XML has a repeating structure
// Parse the full XML (unavoidable for proper XML processing)
xmlDocument = Xml.Document(binaryContent),
// Extract only the data elements you need - avoid materializing the entire tree
dataElements = xmlDocument[root][data],
// Process elements with minimal intermediate structures
ProcessElement = (element as record) =>
let
// Extract only required fields to minimize memory usage
id = try element[#"@id"] otherwise null,
name = try element[name][#"#text"] otherwise null,
value = try Number.From(element[value][#"#text"]) otherwise null,
// Don't extract complex nested structures unless necessary
result = [ID = id, Name = name, Value = value]
in
result,
// Transform elements efficiently
        processedData = if dataElements is list then
List.Transform(dataElements, ProcessElement)
else
{ProcessElement(dataElements)},
// Convert to table only at the end
finalTable = Table.FromRecords(processedData)
in
finalTable
in
ProcessLargeXmlFile
For extremely large XML files, consider preprocessing them outside of Power Query (using tools like Saxon or xmlstarlet) to extract relevant sections before importing into M.
The true power of JSON and XML processing in M emerges when you integrate these semi-structured data sources with relational data and build comprehensive data transformation pipelines.
A common pattern involves enriching database records with data from JSON APIs. This scenario requires handling potential network failures, rate limiting, and data consistency:
let
// Base customer data from database
customerTable = Table.FromRecords({
[CustomerID = 1001, Name = "Acme Corp", Industry = "Manufacturing"],
[CustomerID = 1002, Name = "TechStart Inc", Industry = "Technology"],
[CustomerID = 1003, Name = "Global Retail", Industry = "Retail"]
}),
// Function to enrich customer data with API information
EnrichWithApiData = (customerId as number) as record =>
let
// Construct API URL
apiUrl = "https://api.example.com/customers/" & Number.ToText(customerId),
// Make API call with error handling
apiResponse = try
Json.Document(Web.Contents(apiUrl))
otherwise
[error = "API_UNAVAILABLE"],
// Extract additional data if API call succeeded
enrichmentData = if not Record.HasFields(apiResponse, "error") then
[
CreditScore = Record.FieldOrDefault(apiResponse, "credit_score", null),
LastOrderDate = Record.FieldOrDefault(apiResponse, "last_order_date", null),
TotalOrders = Record.FieldOrDefault(apiResponse, "total_orders", null),
PreferredContact = try apiResponse[preferences][contact_method] otherwise null
]
else
[
CreditScore = null,
LastOrderDate = null,
TotalOrders = null,
PreferredContact = null
]
in
enrichmentData,
// Add enrichment data to customer table
enrichedTable = Table.AddColumn(
customerTable,
"ApiData",
(row) => EnrichWithApiData(row[CustomerID]),
type record
),
// Expand the enrichment data into separate columns
expandedTable = Table.ExpandRecordColumn(
enrichedTable,
"ApiData",
{"CreditScore", "LastOrderDate", "TotalOrders", "PreferredContact"},
{"CreditScore", "LastOrderDate", "TotalOrders", "PreferredContact"}
)
in
expandedTable
As your JSON and XML processing requirements grow, building libraries of reusable functions becomes essential. These functions should handle common patterns, error scenarios, and performance optimizations:
let
// Library of reusable JSON/XML processing functions
// Generic function for safe JSON field extraction
JsonFieldExtractor = (jsonRecord as record, fieldPath as list, optional defaultValue as any) as any =>
let
ExtractRecursive = (currentRecord as record, remainingPath as list) =>
if List.Count(remainingPath) = 0 then
currentRecord
else
let
fieldName = remainingPath{0},
restOfPath = List.RemoveFirstN(remainingPath, 1),
fieldValue = Record.FieldOrDefault(currentRecord, fieldName, null)
in
if fieldValue = null then
null
else if List.Count(restOfPath) = 0 then
fieldValue
                else if fieldValue is record then
@ExtractRecursive(fieldValue, restOfPath)
else
null,
result = try ExtractRecursive(jsonRecord, fieldPath) otherwise null
in
if result = null and defaultValue <> null then defaultValue else result,
// Function for handling JSON arrays with schema validation
ProcessJsonArray = (jsonArray as list, schemaValidator as function, optional errorHandler as function) as list =>
let
defaultErrorHandler = (item, error) => [OriginalItem = item, Error = error, IsValid = false],
actualErrorHandler = if errorHandler = null then defaultErrorHandler else errorHandler,
processItem = (item) =>
try
let
validationResult = schemaValidator(item),
result = if validationResult then item else actualErrorHandler(item, "Schema validation failed")
in
result
otherwise
actualErrorHandler(item, "Processing error"),
processedItems = List.Transform(jsonArray, processItem)
in
processedItems,
    // XML namespace resolver for cleaner element access
    XmlNamespaceResolver = (namespaces as record) =>
        let
            prefixes = Record.FieldNames(namespaces),
            // Build one accessor function per prefix: (elementName) => "uri:elementName"
            accessors = List.Transform(prefixes, (prefix) =>
                let uri = Record.Field(namespaces, prefix)
                in (elementName as text) => uri & ":" & elementName)
        in
            Record.FromList(accessors, prefixes),
// Example usage of the library functions
sampleJsonData = Json.Document("{
""user"": {
""profile"": {
""name"": ""John Doe"",
""age"": 30
},
""orders"": [
{""id"": 1, ""total"": 99.99},
{""id"": 2, ""total"": 149.50}
]
}
}"),
// Extract nested field safely
userName = JsonFieldExtractor(sampleJsonData, {"user", "profile", "name"}, "Unknown User"),
// Process orders array with validation
orderValidator = (order) => Record.HasFields(order, {"id", "total"}),
validOrders = ProcessJsonArray(sampleJsonData[user][orders], orderValidator),
// Demonstrate namespace resolver for XML
xmlNamespaces = [
soap = "http://schemas.xmlsoap.org/soap/envelope/",
fin = "http://example.com/financial/v2"
],
nsResolver = XmlNamespaceResolver(xmlNamespaces),
soapBodyElement = nsResolver[soap]("Body") // Returns "http://schemas.xmlsoap.org/soap/envelope/:Body"
in
[
UserName = userName,
ValidOrders = validOrders,
SoapBodyElement = soapBodyElement
]
Robust JSON and XML processing requires comprehensive error handling that goes beyond simple try-catch blocks. You need to handle network issues, malformed data, schema changes, and partial failures gracefully:
let
// Comprehensive error handling framework
// Error categorization and reporting
CreateErrorReport = (source as text, operation as text, error as any, optional context as record) as record =>
let
errorInfo = [
Source = source,
Operation = operation,
            ErrorType = if error is text then "String Error" else "System Error",
            ErrorMessage = if error is text then error else "Unknown system error",
Timestamp = DateTime.LocalNow(),
Context = if context = null then [] else context
]
in
errorInfo,
// Retry mechanism for API calls
ApiCallWithRetry = (url as text, optional maxRetries as number, optional delaySeconds as number) as any =>
let
actualMaxRetries = if maxRetries = null then 3 else maxRetries,
actualDelay = if delaySeconds = null then 1 else delaySeconds,
MakeCall = (attempt as number) =>
try
Json.Document(Web.Contents(url))
otherwise
if attempt < actualMaxRetries then
Function.InvokeAfter(() => @MakeCall(attempt + 1), #duration(0, 0, 0, actualDelay))
else
CreateErrorReport("API", "GET", "Max retries exceeded", [URL = url, Attempts = attempt])
in
MakeCall(1),
// Data quality validation for JSON structures
ValidateJsonStructure = (jsonData as any, expectedSchema as record) as record =>
let
ValidateField = (fieldName as text, expectedType as type, isRequired as logical) =>
let
                hasField = if jsonData is record then
Record.HasFields(jsonData, fieldName)
else false,
fieldValue = if hasField then Record.Field(jsonData, fieldName) else null,
isCorrectType = if fieldValue = null then not isRequired
else Value.Is(fieldValue, expectedType),
result = [
FieldName = fieldName,
IsPresent = hasField,
IsRequired = isRequired,
                    // M has no built-in Type.ToText, so keep the type values themselves
                    ExpectedType = expectedType,
                    ActualType = if fieldValue = null then type null else Value.Type(fieldValue),
IsValid = (not isRequired and fieldValue = null) or (hasField and isCorrectType)
]
in
result,
// Validate all expected fields
fieldValidations = List.Transform(
Record.FieldNames(expectedSchema),
(fieldName) =>
let
schemaInfo = Record.Field(expectedSchema, fieldName),
expectedType = schemaInfo[Type],
isRequired = Record.FieldOrDefault(schemaInfo, "Required", true)
in
ValidateField(fieldName, expectedType, isRequired)
),
// Calculate overall validity
allValid = List.AllTrue(List.Transform(fieldValidations, (validation) => validation[IsValid])),
validationReport = [
IsValid = allValid,
FieldValidations = fieldValidations,
ValidationTimestamp = DateTime.LocalNow()
]
in
validationReport,
// Example usage with comprehensive error handling
ProcessApiDataSafely = (apiUrl as text) as record =>
let
// Define expected schema
expectedSchema = [
id = [Type = type number, Required = true],
name = [Type = type text, Required = true],
email = [Type = type text, Required = false]
],
// Attempt API call with retries
apiResult = ApiCallWithRetry(apiUrl, 3, 2),
// Check if API call succeeded
hasError = Record.HasFields(apiResult, "Source"),
processingResult = if hasError then
[
Success = false,
Error = apiResult,
Data = null,
ValidationReport = null
]
else
let
// Validate data structure
validation = ValidateJsonStructure(apiResult, expectedSchema),
processedData = if validation[IsValid] then
[
ID = apiResult[id],
Name = apiResult[name],
Email = Record.FieldOrDefault(apiResult, "email", "not provided")
]
else
null
in
[
Success = validation[IsValid],
Error = if validation[IsValid] then null else validation,
Data = processedData,
ValidationReport = validation
]
in
processingResult
in
ProcessApiDataSafely
Let's build a comprehensive data integration solution that combines JSON and XML sources to create a unified customer analytics dataset. This exercise demonstrates real-world complexity and best practices.
Scenario: You're building a customer analytics dashboard that needs to combine:
- Customer profile data delivered as JSON from a REST API
- Transaction history delivered as namespaced XML
let
// Step 1: Define reusable helper functions
    SafeJsonExtract = (input as record, fieldPath as list, optional defaultValue as any) =>
        let
            ExtractNested = (current as record, path as list) =>
                if List.Count(path) = 0 then current
                else if not Record.HasFields(current, path{0}) then null
                else
                    // Dynamic field names require Record.Field; bracket access only takes literals
                    let nextValue = Record.Field(current, path{0})
                    in if List.Count(path) = 1 then nextValue
                    else if nextValue is record then @ExtractNested(nextValue, List.RemoveFirstN(path, 1))
                    else null
        in
            try ExtractNested(input, fieldPath) otherwise (if defaultValue <> null then defaultValue else null),
// Step 2: Load and process customer data from JSON API
CustomerApiData = () =>
let
// Simulated API response (in real scenario, use Web.Contents)
apiResponse = Json.Document("{
""customers"": [
{
""customer_id"": 1001,
""profile"": {
""first_name"": ""Sarah"",
""last_name"": ""Johnson"",
""email"": ""sarah.johnson@email.com"",
""registration_date"": ""2023-01-15T10:30:00Z""
},
""account_status"": ""active"",
""lifetime_value"": 2500.75
},
{
""customer_id"": 1002,
""profile"": {
""first_name"": ""Michael"",
""last_name"": ""Chen"",
""email"": ""m.chen@company.com"",
""registration_date"": ""2023-03-20T14:15:00Z""
},
""account_status"": ""premium"",
""lifetime_value"": 4200.50
}
]
}"),
customers = apiResponse[customers],
ProcessCustomer = (customer) =>
[
CustomerID = customer[customer_id],
FirstName = SafeJsonExtract(customer, {"profile", "first_name"}, "Unknown"),
LastName = SafeJsonExtract(customer, {"profile", "last_name"}, "Unknown"),
Email = SafeJsonExtract(customer, {"profile", "email"}, ""),
RegistrationDate = SafeJsonExtract(customer, {"profile", "registration_date"}, null),
AccountStatus = customer[account_status],
LifetimeValue = customer[lifetime_value]
],
processedCustomers = List.Transform(customers, ProcessCustomer),
customerTable = Table.FromRecords(processedCustomers)
in
customerTable,
// Step 3: Load and process transaction data from XML
TransactionXmlData = () =>
let
// Simulated XML response
xmlContent = "<transactions xmlns:tx=""http://example.com/transactions"">
<tx:transaction id=""T001"" customer_id=""1001"">
<tx:date>2024-01-10T09:30:00Z</tx:date>
<tx:amount currency=""USD"">125.50</tx:amount>
<tx:products>
<tx:product id=""P001"" quantity=""2"" />
<tx:product id=""P002"" quantity=""1"" />
</tx:products>
</tx:transaction>
<tx:transaction id=""T002"" customer_id=""1002"">
<tx:date>2024-01-12T14:45:00Z</tx:date>
<tx:amount currency=""USD"">89.99</tx:amount>
<tx:products>
<tx:product id=""P003"" quantity=""1"" />
</tx:products>
</tx:transaction>
</transactions>",
xmlDoc = Xml.Document(Text.ToBinary(xmlContent)),
transactions = xmlDoc[#"http://example.com/transactions:transactions"][#"http://example.com/transactions:transaction"],
ProcessTransaction = (transaction) =>
let
transactionId = transaction[#"@id"],
customerId = Number.From(transaction[#"@customer_id"]),
date = transaction[#"http://example.com/transactions:date"][#"#text"],
amount = Number.From(transaction[#"http://example.com/transactions:amount"][#"#text"]),
currency = transaction[#"http://example.com/transactions:amount"][#"@currency"],
// Handle products - could be single or multiple
productsElement = transaction[#"http://example.com/transactions:products"][#"http://example.com/transactions:product"],
products = if productsElement is list then
List.Transform(productsElement, (product) => [
ProductID = product[#"@id"],
Quantity = Number.From(product[#"@quantity"])
])
else
{[
ProductID = productsElement[#"@id"],
Quantity = Number.From(productsElement[#"@quantity"])
]}
in
[
TransactionID = transactionId,
CustomerID = customerId,
TransactionDate = date,
Amount = amount,
Currency = currency,
Products = products
],
processedTransactions = if transactions is list then
List.Transform(transactions, ProcessTransaction)
else
{ProcessTransaction(transactions)},
transactionTable = Table.FromRecords(processedTransactions)
in
transactionTable,
// Step 4: Load customer preferences from JSON
CustomerPreferences = () =>
let
preferencesJson = Json.Document("{
""preferences"": {
""1001"": {
""communication"": {
""email_marketing"": true,
""sms_notifications"": false,
""preferred_language"": ""en""
},
""shopping"": {
""categories"": [""electronics"", ""books""],
""price_alerts"": true
}
},
""1002"": {
""communication"": {
""email_marketing"": false,
""sms_notifications"": true,
""preferred_language"": ""es""
},
""shopping"": {
""categories"": [""clothing"", ""home""],
""price_alerts"": false
}
}
}
}"),
preferencesRecord = preferencesJson[preferences],
customerIds = Record.FieldNames(preferencesRecord),
ProcessPreferences = (customerId) =>
let
customerPrefs = Record.Field(preferencesRecord, customerId),
result = [
CustomerID = Number.From(customerId),
EmailMarketing = SafeJsonExtract(customerPrefs, {"communication", "email_marketing"}, false),
SmsNotifications = SafeJsonExtract(customerPrefs, {"communication", "sms_notifications"}, false),
PreferredLanguage = SafeJsonExtract(customerPrefs, {"communication", "preferred_language"}, "en"),
PreferredCategories = try Text.Combine(customerPrefs[shopping][categories], "; ") otherwise "",
PriceAlerts = SafeJsonExtract(customerPrefs, {"shopping", "price_alerts"}, false)
]
in
result,
preferencesTable = Table.FromRecords(List.Transform(customerIds, ProcessPreferences))
in
preferencesTable,
// Step 5: Combine all data sources
customers = CustomerApiData(),
transactions = TransactionXmlData(),
preferences = CustomerPreferences(),
// Join customers with preferences
customersWithPrefs = Table.NestedJoin(
customers,
{"CustomerID"},
preferences,
{"CustomerID"},
"Preferences",
JoinKind.LeftOuter
),
expandedCustomers = Table.ExpandTableColumn(
customersWithPrefs,
"Preferences",
{"EmailMarketing", "SmsNotifications", "PreferredLanguage", "PreferredCategories", "PriceAlerts"},
{"EmailMarketing", "SmsNotifications", "PreferredLanguage", "PreferredCategories", "PriceAlerts"}
),
// Add transaction summary to each customer
transactionSummary = Table.Group(
transactions,
{"CustomerID"},
{
{"TotalTransactions", each Table.RowCount(_), Int64.Type},
{"TotalSpent", each List.Sum([Amount]), type number},
{"LastTransactionDate", each List.Max([TransactionDate]), type text}
}
),
finalCustomerData = Table.NestedJoin(
expandedCustomers,
{"CustomerID"},
transactionSummary,
{"CustomerID"},
"TransactionSummary",
JoinKind.LeftOuter
),
finalTable = Table.ExpandTableColumn(
finalCustomerData,
"TransactionSummary",
{"TotalTransactions", "TotalSpent", "LastTransactionDate"},
{"TotalTransactions", "TotalSpent", "LastTransactionDate"}
)
in
finalTable
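As a quick sanity check before building visuals, you can verify the shape of the combined output. This sketch assumes the query above has been saved under the hypothetical name CustomerAnalytics:

```powerquery
// Hypothetical smoke test; "CustomerAnalytics" is an assumed query name
let
    rowCount = Table.RowCount(CustomerAnalytics),
    schemaOk = List.ContainsAll(
        Table.ColumnNames(CustomerAnalytics),
        {"CustomerID", "TotalSpent", "PreferredLanguage"}
    )
in
    [Rows = rowCount, SchemaOk = schemaOk]
```

With the sample data above you would expect two rows and SchemaOk = true; a false value usually means an expand step dropped or renamed a column.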
This exercise demonstrates several key concepts: reusable helper functions for safe nested extraction, full-URI namespace handling in XML, normalizing single-versus-multiple element results, defensive defaults for optional fields, and combining heterogeneous sources through joins and grouped aggregates.
Working with JSON and XML in M presents unique challenges. Here are the most common pitfalls and their solutions:
1. Namespace Confusion in XML Processing
Problem: Many developers struggle with XML namespace syntax, leading to queries that work with sample data but fail in production.
Solution: Always use the full namespace URI, not the prefix. Create a namespace mapping function for cleaner code:
// Wrong approach - using prefix
element[soap:Body] // This will fail
// Correct approach - using full URI
element[#"http://schemas.xmlsoap.org/soap/envelope/:Body"]
// Better approach - namespace helper
let
ns = [
soap = (name) => "http://schemas.xmlsoap.org/soap/envelope/:" & name,
custom = (name) => "http://example.com/custom:" & name
]
in
// Bracket syntax needs literal field names, so use Record.Field for computed ones
Record.Field(Record.Field(element, ns[soap]("Body")), ns[custom]("Data"))
2. Assuming Consistent JSON Structure
Problem: Queries break when APIs return slightly different schemas or when optional fields are missing.
Solution: Always use defensive programming patterns:
// Fragile approach
customerName = apiResponse[customer][profile][name]
// Robust approach
customerName = try apiResponse[customer][profile][name] otherwise
try apiResponse[customer][name] otherwise
"Unknown Customer"
// Even better - use a helper function
SafeExtract = (record, path, default) => /* implementation shown earlier */
customerName = SafeExtract(apiResponse, {"customer", "profile", "name"}, "Unknown")
3. Memory Issues with Large Documents
Problem: Processing large JSON or XML files causes memory errors or poor performance.
Solution: Stream processing and selective extraction:
// Avoid materializing entire structures
// Instead of:
allData = Json.Document(largeFile)
processedData = TransformEverything(allData)
// Do this:
streamProcessor = (item) => ExtractOnlyNeeded(item)
processedData = List.Transform(Json.Document(largeFile)[items], streamProcessor)
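As a more concrete sketch of the same idea (the file path and field names here are hypothetical), parse the document once and project each item down to only the fields you need:

```powerquery
// Hypothetical selective extraction; path and field names are assumptions
let
    parsed = Json.Document(File.Contents("C:\large.json")),
    slim = List.Transform(
        parsed[items],
        (item) => [ID = item[id], Total = item[total]]
    ),
    result = Table.FromRecords(slim)
in
    result
```

Because the projection happens inside List.Transform, the full nested structure of each item is never materialized into the result table.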
4. Incorrect Error Handling
Problem: Using simple try-otherwise blocks that mask important error information.
Solution: Implement structured error handling:
ProcessWithErrorInfo = (data) =>
let
    // A bare try returns a record with HasError plus either Value or Error,
    // so the original error details stay available instead of being masked
    attempt = try ComplexTransformation(data)
in
    if not attempt[HasError] then
        [Success = true, Data = attempt[Value], Error = null]
    else
        [Success = false, Data = null, Error = [
            Reason = attempt[Error][Reason],
            Message = attempt[Error][Message],
            Input = data,
            Timestamp = DateTime.LocalNow()
        ]]
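A usage sketch: once each item carries its own Success flag, the results can be split into clean rows and an error log for troubleshooting (sourceItems is a hypothetical input list):

```powerquery
// Hypothetical consumer of ProcessWithErrorInfo results
let
    results = List.Transform(sourceItems, ProcessWithErrorInfo),
    goodRows = Table.FromRecords(
        List.Transform(List.Select(results, each [Success]), each [Data])
    ),
    errorLog = Table.FromRecords(
        List.Transform(List.Select(results, each not [Success]), each [Error])
    )
in
    [Clean = goodRows, Errors = errorLog]
```

Loading the error log as its own query makes failed records visible in the report instead of silently disappearing.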
5. Performance Anti-Patterns
Problem: Inefficient patterns that work for small datasets but don't scale.
Common Anti-Patterns:
- Table.FromRecords called inside loops
- Repeated Table.AddColumn calls instead of a single Table.TransformColumns
Solution:
// Instead of multiple AddColumn calls
step1 = Table.AddColumn(source, "Col1", (row) => Transform1(row[Field1]))
step2 = Table.AddColumn(step1, "Col2", (row) => Transform2(row[Field2]))
// Use TransformColumns or single complex transformation
Table.TransformColumns(source, {
{"Field1", Transform1, type text},
{"Field2", Transform2, type number}
})
6. Ignoring Data Type Conversions
Problem: JSON payloads frequently deliver dates as strings and numbers as quoted text, causing type mismatches downstream.
Solution: Explicit type conversion with validation:
ConvertJsonTypes = (record as record) =>
[
ID = try Number.From(record[id]) otherwise null,
Amount = try Currency.From(record[amount]) otherwise 0,
Date = try DateTime.From(record[timestamp]) otherwise DateTime.LocalNow(),
IsActive = try Logical.From(record[active]) otherwise false
]
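For example, the converter can be applied to every record of a parsed response before the table is assembled (the endpoint URL is an assumption for illustration):

```powerquery
// Hypothetical usage: convert types per record while building the table
let
    raw = Json.Document(Web.Contents("https://api.example.com/items")),
    typed = Table.FromRecords(List.Transform(raw, ConvertJsonTypes))
in
    typed
```

Converting at this boundary means every later step can rely on stable column types instead of re-validating text values.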
You've now explored the sophisticated world of JSON and XML processing in M, moving far beyond basic parsing to handle real-world complexity. The techniques covered—from recursive navigation and namespace handling to performance optimization and error resilience—form the foundation for building robust data integration pipelines.
The key insights to remember: treat JSON as native records and lists rather than text; address XML elements by their full namespace URI; assume schemas will drift and code defensively with safe-extraction helpers and explicit defaults; convert types explicitly at the boundary; and extract selectively instead of materializing entire documents.
Your next steps should focus on applying these concepts to your specific data sources. Start by auditing your current JSON and XML processing queries for the anti-patterns discussed in this lesson. Look for opportunities to implement the helper functions and error handling patterns we've developed.
For advanced practitioners, consider exploring M's streaming capabilities for extremely large datasets, building custom connectors for proprietary APIs, and developing automated testing frameworks for your transformation logic. The intersection of M with Azure Data Factory and Power Platform creates opportunities for enterprise-scale data orchestration that builds on these foundational skills.
The semi-structured data landscape continues evolving, with formats like Apache Avro and Protocol Buffers gaining traction alongside JSON and XML. The principles you've learned—understanding data structures, building flexible transformations, and optimizing for scale—will serve you well as new formats emerge.
Learning Path: Advanced M Language