
You're sitting at your desk when a colleague drops a USB drive in front of you. "The marketing team needs a report from this data," she says, "but it's in some weird format." You plug it in and find files with extensions you've seen before but never worked with: .json and .xml. Your heart sinks a little: these aren't the familiar Excel files or CSV data you're used to.
If this scenario sounds familiar, you're not alone. JSON (JavaScript Object Notation) and XML (eXtensible Markup Language) are everywhere in modern data workflows. APIs return JSON responses, legacy systems export XML files, and web services communicate using both formats. The good news? Power Query's M language has powerful built-in functions that make working with these structured data formats surprisingly straightforward.
By the end of this lesson, you'll confidently transform complex nested JSON and XML data into clean, tabular datasets ready for analysis. You'll understand how these data formats work, why they're structured the way they are, and most importantly, how to extract exactly the information you need using M language functions.
What you'll learn:
- How JSON and XML represent hierarchical data
- Parsing files with Json.Document() and Xml.Document()
- Navigating records, lists, and nested tables in M
- Flattening nested structures into clean, tabular datasets
- Handling missing fields, type conversions, and common errors
This lesson assumes you have basic familiarity with Power Query and the M language. You should be comfortable creating simple queries and understand fundamental M concepts like records, lists, and tables. If you haven't worked with these concepts yet, consider reviewing introductory M language materials first.
Before diving into the M functions, let's understand what we're working with. Both JSON and XML are ways to store structured, hierarchical data—but they approach it differently.
JSON looks like this:
{
    "company": "TechCorp Solutions",
    "employees": [
        {
            "id": 1001,
            "name": "Sarah Chen",
            "department": "Engineering",
            "salary": 95000,
            "skills": ["Python", "SQL", "Machine Learning"]
        },
        {
            "id": 1002,
            "name": "Marcus Rodriguez",
            "department": "Marketing",
            "salary": 72000,
            "skills": ["Digital Marketing", "Analytics", "Content Strategy"]
        }
    ],
    "metadata": {
        "last_updated": "2024-01-15",
        "total_employees": 2
    }
}
JSON uses a simple syntax: objects are wrapped in curly braces {}, arrays in square brackets [], and key-value pairs are separated by commas. It's lightweight and easy for both humans and computers to read.
The same data in XML looks like this:
<company name="TechCorp Solutions">
    <employees>
        <employee id="1001">
            <name>Sarah Chen</name>
            <department>Engineering</department>
            <salary>95000</salary>
            <skills>
                <skill>Python</skill>
                <skill>SQL</skill>
                <skill>Machine Learning</skill>
            </skills>
        </employee>
        <employee id="1002">
            <name>Marcus Rodriguez</name>
            <department>Marketing</department>
            <salary>72000</salary>
            <skills>
                <skill>Digital Marketing</skill>
                <skill>Analytics</skill>
                <skill>Content Strategy</skill>
            </skills>
        </employee>
    </employees>
    <metadata>
        <last_updated>2024-01-15</last_updated>
        <total_employees>2</total_employees>
    </metadata>
</company>
XML is more verbose, using opening and closing tags like HTML. It can store data both in tag content and in attributes (like id="1001"), making it more flexible but also more complex to parse.
Let's start with JSON since it's more common in modern applications. M provides several functions for working with JSON data, with Json.Document() being your primary tool.
When you have a JSON file, you'll typically start by loading it as binary data, then parsing it:
let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source)
in
    JsonData
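The same two-step pattern applies when the JSON comes from a web API rather than a file. A minimal sketch, where the endpoint URL is a hypothetical placeholder:

let
    // Hypothetical API endpoint; substitute your real URL
    Source = Web.Contents("https://api.example.com/employees"),
    JsonData = Json.Document(Source)
in
    JsonData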
The Json.Document() function takes binary data and converts it into M's native data types:
- JSON objects become M records
- JSON arrays become M lists
- JSON strings become text values
- JSON numbers become number values
- JSON true/false become logical values
- JSON null becomes null
Once you've parsed JSON data, you navigate it using standard M syntax for records and lists. Let's work with our employee data:
let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source),
    CompanyName = JsonData[company],
    EmployeeList = JsonData[employees],
    FirstEmployee = EmployeeList{0},
    FirstEmployeeName = FirstEmployee[name]
in
    FirstEmployeeName
This query extracts "Sarah Chen" by navigating through the JSON structure step by step. Each step reveals the hierarchical nature of the data.
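Once the navigation is clear, the same lookup can be collapsed into a single chained expression:

FirstEmployeeName = Json.Document(File.Contents("C:\Data\employee_data.json"))[employees]{0}[name]

The step-by-step version is easier to debug; the chained version is handy once a path is stable.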
The real power comes when you convert JSON arrays into tables for analysis. Here's how to transform our employee array into a proper table:
let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source),
    EmployeeList = JsonData[employees],
    EmployeesTable = Table.FromList(EmployeeList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandedEmployees = Table.ExpandRecordColumn(EmployeesTable, "Column1",
        {"id", "name", "department", "salary", "skills"},
        {"ID", "Name", "Department", "Salary", "Skills"})
in
    ExpandedEmployees
This creates a table where each row represents an employee, but the skills column still contains lists. Let's handle that nested array:
let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source),
    EmployeeList = JsonData[employees],
    EmployeesTable = Table.FromList(EmployeeList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandedEmployees = Table.ExpandRecordColumn(EmployeesTable, "Column1",
        {"id", "name", "department", "salary", "skills"},
        {"ID", "Name", "Department", "Salary", "Skills"}),
    ExpandedSkills = Table.ExpandListColumn(ExpandedEmployees, "Skills")
in
    ExpandedSkills
Now each skill becomes its own row, creating a relationship table between employees and their skills.
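With the sample data above, the fully expanded result looks like this, one row per employee/skill pair:

ID    Name              Department   Salary  Skills
1001  Sarah Chen        Engineering  95000   Python
1001  Sarah Chen        Engineering  95000   SQL
1001  Sarah Chen        Engineering  95000   Machine Learning
1002  Marcus Rodriguez  Marketing    72000   Digital Marketing
1002  Marcus Rodriguez  Marketing    72000   Analytics
1002  Marcus Rodriguez  Marketing    72000   Content Strategy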
Pro Tip: When dealing with deeply nested JSON, work incrementally. Parse the outer structure first, examine what you get, then drill down level by level. This makes debugging much easier.
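In practice, working incrementally means returning an intermediate step name and inspecting it before adding the next step. A sketch of the workflow:

let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source),
    EmployeeList = JsonData[employees]
    // FirstEmployee = EmployeeList{0}   // uncomment once the previous step looks right
in
    EmployeeList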
Real-world JSON data often has missing or optional fields. Here's how to handle that gracefully:
let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source),
    EmployeeList = JsonData[employees],
    EmployeesTable = Table.FromList(EmployeeList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    // Fields that some records lack (such as "manager") expand to null rather than raising an error
    SafeExpandEmployees = Table.ExpandRecordColumn(EmployeesTable, "Column1",
        {"id", "name", "department", "salary", "skills", "manager"},
        {"ID", "Name", "Department", "Salary", "Skills", "Manager"}),
    // Replace null values with defaults
    CleanedData = Table.ReplaceValue(SafeExpandEmployees, null, "No Manager", Replacer.ReplaceValue, {"Manager"})
in
    CleanedData
XML parsing in M follows a similar pattern but uses different functions. The primary function is Xml.Document(), which converts XML into a nested table structure.
let
    Source = File.Contents("C:\Data\employee_data.xml"),
    XmlData = Xml.Document(Source)
in
    XmlData
When you parse XML with Xml.Document(), you get a table where:
- Each row represents one XML element
- The Name column holds the element's tag name, and Namespace holds its namespace (if any)
- The Value column holds the element's text for leaf elements, or a nested table of child elements
- The Attributes column holds a nested table of attribute Name/Value pairs

Note that there is no separate column for child elements: you drill into the Value column to reach them.
XML navigation requires understanding this table-based structure. Let's extract our company data:
let
    Source = File.Contents("C:\Data\employee_data.xml"),
    XmlData = Xml.Document(Source),
    // The root <company> element is the first (and only) row
    CompanyElement = XmlData{0},
    // Attributes is a nested table of Name/Value pairs, so look up the row by Name
    CompanyName = CompanyElement[Attributes]{[Name = "name"]}[Value],
    // Child elements live in the Value column as a nested table
    CompanyChildren = CompanyElement[Value],
    EmployeesElement = Table.SelectRows(CompanyChildren, each [Name] = "employees"){0},
    EmployeesChildren = EmployeesElement[Value],
    IndividualEmployees = Table.SelectRows(EmployeesChildren, each [Name] = "employee")
in
    IndividualEmployees
This gives us a table where each row represents an employee element, but we still need to extract the actual data.
Here's how to transform XML elements into a clean table:
let
    Source = File.Contents("C:\Data\employee_data.xml"),
    XmlData = Xml.Document(Source),
    CompanyElement = XmlData{0},
    CompanyChildren = CompanyElement[Value],
    EmployeesElement = Table.SelectRows(CompanyChildren, each [Name] = "employees"){0},
    EmployeesChildren = EmployeesElement[Value],
    IndividualEmployees = Table.SelectRows(EmployeesChildren, each [Name] = "employee"),
    // Helper: turn one <employee> element row into a flat record
    ExtractEmployeeData = (employee as record) =>
        let
            EmployeeId = employee[Attributes]{[Name = "id"]}[Value],
            EmployeeChildren = employee[Value],
            NameValue = Table.SelectRows(EmployeeChildren, each [Name] = "name"){0}[Value],
            DeptValue = Table.SelectRows(EmployeeChildren, each [Name] = "department"){0}[Value],
            SalaryValue = Table.SelectRows(EmployeeChildren, each [Name] = "salary"){0}[Value],
            Result = [ID = EmployeeId, Name = NameValue, Department = DeptValue, Salary = Number.FromText(SalaryValue)]
        in
            Result,
    // Apply the helper to each employee row, then expand the resulting records
    EmployeeRecords = Table.AddColumn(IndividualEmployees, "Employee", each ExtractEmployeeData(_)),
    ExpandedEmployees = Table.ExpandRecordColumn(
        Table.SelectColumns(EmployeeRecords, {"Employee"}),
        "Employee", {"ID", "Name", "Department", "Salary"})
in
    ExpandedEmployees
XML can store data in both attributes and element content. Here's a pattern for handling both:
let
    Source = File.Contents("C:\Data\employee_data.xml"),
    XmlData = Xml.Document(Source),
    ExtractEmployeeData = (employeeElement as record) =>
        let
            // Get data from attributes: Attributes is a nested table, so look up rows by Name
            Attributes = employeeElement[Attributes],
            IdRows = Table.SelectRows(Attributes, each [Name] = "id"),
            EmployeeId = if Table.RowCount(IdRows) > 0 then IdRows{0}[Value] else null,
            // Get data from child elements, which live in the Value column
            Children = employeeElement[Value],
            GetElementValue = (elementName as text) =>
                let
                    Elements = Table.SelectRows(Children, each [Name] = elementName),
                    Value = if Table.RowCount(Elements) > 0 then Elements{0}[Value] else null
                in
                    Value,
            NameValue = GetElementValue("name"),
            DeptValue = GetElementValue("department"),
            SalaryText = GetElementValue("salary")
        in
            [
                ID = EmployeeId,
                Name = NameValue,
                Department = DeptValue,
                Salary = if SalaryText <> null then Number.FromText(SalaryText) else null
            ]
in
    // This query returns the helper function itself, ready to invoke from other queries
    ExtractEmployeeData
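To apply the function, combine it with the employee rows from the earlier navigation query. A sketch, where IndividualEmployees stands for that earlier result:

let
    // IndividualEmployees is the table of <employee> element rows from the navigation query
    WithRecords = Table.AddColumn(IndividualEmployees, "Employee", each ExtractEmployeeData(_)),
    Result = Table.ExpandRecordColumn(WithRecords, "Employee", {"ID", "Name", "Department", "Salary"})
in
    Result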
Both JSON and XML can contain deeply nested structures. Here are strategies for handling complex hierarchies.
When you have arrays containing objects with different structures, you need flexible extraction methods:
let
    Source = Json.Document(File.Contents("C:\Data\mixed_data.json")),
    Items = Source[items],
    // Create a function to safely extract fields
    SafeExtract = (record as record, fieldName as text, defaultValue as any) =>
        if Record.HasFields(record, fieldName) then Record.Field(record, fieldName) else defaultValue,
    ItemsTable = Table.FromList(Items, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExtractedData = Table.TransformColumns(ItemsTable, {
        {"Column1", each [
            ID = SafeExtract(_, "id", null),
            Type = SafeExtract(_, "type", "unknown"),
            Value = SafeExtract(_, "value", 0),
            Metadata = SafeExtract(_, "metadata", [])
        ]}
    }),
    ExpandedData = Table.ExpandRecordColumn(ExtractedData, "Column1", {"ID", "Type", "Value", "Metadata"})
in
    ExpandedData
Some data contains recursive structures, like organizational hierarchies or nested categories:
let
    Source = Json.Document(File.Contents("C:\Data\org_chart.json")),
    // Recursive function to flatten the hierarchy; @ lets the function refer to itself
    FlattenHierarchy = (node as record, level as number) as list =>
        let
            CurrentRecord = [
                ID = node[id],
                Name = node[name],
                Title = node[title],
                Level = level
            ],
            Children = if Record.HasFields(node, "reports") then node[reports] else {},
            ChildRecords = if List.Count(Children) > 0
                then List.Transform(Children, each @FlattenHierarchy(_, level + 1))
                else {},
            FlatChildren = List.Combine(ChildRecords),
            Result = {CurrentRecord} & FlatChildren
        in
            Result,
    FlatList = FlattenHierarchy(Source, 0),
    ResultTable = Table.FromList(FlatList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandedTable = Table.ExpandRecordColumn(ResultTable, "Column1", {"ID", "Name", "Title", "Level"})
in
    ExpandedTable
Warning: Be careful with recursive functions in M. They can cause performance issues or stack overflow errors with deeply nested data. Consider iterative approaches for very deep hierarchies.
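If recursion depth is a concern, the same flattening can be done iteratively with List.Generate and an explicit work queue. This is a sketch under the same assumptions as above (id/name/title/reports fields); note it produces breadth-first rather than depth-first order:

let
    Source = Json.Document(File.Contents("C:\Data\org_chart.json")),
    // State = [Queue = nodes still to process, Done = flattened records so far]
    States = List.Generate(
        () => [Queue = {[Node = Source, Level = 0]}, Done = {}],
        each [Queue] <> null,
        (state) =>
            if List.Count(state[Queue]) = 0
            // Emit one final state carrying the finished list; Queue = null then stops the generator
            then [Queue = null, Done = state[Done]]
            else
                let
                    Current = state[Queue]{0},
                    Node = Current[Node],
                    Kids = if Record.HasFields(Node, "reports") then Node[reports] else {},
                    NewWork = List.Transform(Kids, (k) => [Node = k, Level = Current[Level] + 1])
                in
                    [
                        Queue = List.Skip(state[Queue], 1) & NewWork,
                        Done = state[Done] & {[ID = Node[id], Name = Node[name], Title = Node[title], Level = Current[Level]]}
                    ],
        each [Done]
    ),
    FlatList = List.Last(States),
    ResultTable = Table.FromList(FlatList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandedTable = Table.ExpandRecordColumn(ResultTable, "Column1", {"ID", "Name", "Title", "Level"})
in
    ExpandedTable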
Let's work through a realistic scenario: processing sales data from an e-commerce API that returns JSON with nested order information.
Here's your sample JSON data (save this as sales_data.json):
{
    "orders": [
        {
            "order_id": "ORD-2024-001",
            "customer": {
                "id": "CUST-001",
                "name": "Alice Johnson",
                "email": "alice@email.com"
            },
            "order_date": "2024-01-15T10:30:00Z",
            "items": [
                {
                    "product_id": "PROD-100",
                    "name": "Wireless Headphones",
                    "quantity": 2,
                    "unit_price": 89.99
                },
                {
                    "product_id": "PROD-200",
                    "name": "Phone Case",
                    "quantity": 1,
                    "unit_price": 24.99
                }
            ],
            "shipping": {
                "method": "Express",
                "cost": 12.99,
                "address": {
                    "street": "123 Main St",
                    "city": "Seattle",
                    "state": "WA",
                    "zip": "98101"
                }
            }
        }
    ]
}
Your task: Create three tables:
1. Orders Summary: one row per order, with order ID, customer name, order date, total item count, and shipping cost
2. Order Items Detail: one row per line item, with order ID, product details, quantity, and line total
3. Customer Information: one row per unique customer, with contact details and shipping city/state
Solution:
let
    Source = Json.Document(File.Contents("C:\Data\sales_data.json")),
    Orders = Source[orders],

    // 1. Orders Summary
    OrdersSummary =
        let
            OrdersTable = Table.FromList(Orders, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
            // An explicit parameter (o) keeps field lookups from colliding with the new record's own fields
            ExpandOrders = Table.TransformColumns(OrdersTable, {
                {"Column1", (o) => [
                    OrderID = o[order_id],
                    CustomerName = o[customer][name],
                    // The trailing "Z" marks UTC, so parse as a datetimezone
                    OrderDate = DateTimeZone.FromText(o[order_date]),
                    TotalItems = List.Sum(List.Transform(o[items], each [quantity])),
                    ShippingCost = o[shipping][cost]
                ]}
            }),
            FinalOrders = Table.ExpandRecordColumn(ExpandOrders, "Column1",
                {"OrderID", "CustomerName", "OrderDate", "TotalItems", "ShippingCost"})
        in
            FinalOrders,

    // 2. Order Items Detail
    OrderItems =
        let
            OrdersTable = Table.FromList(Orders, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
            AddOrderID = Table.TransformColumns(OrdersTable, {
                {"Column1", (o) => [OrderID = o[order_id], Items = o[items]]}
            }),
            ExpandOrderInfo = Table.ExpandRecordColumn(AddOrderID, "Column1", {"OrderID", "Items"}),
            ExpandItems = Table.ExpandListColumn(ExpandOrderInfo, "Items"),
            ExpandItemDetails = Table.ExpandRecordColumn(ExpandItems, "Items",
                {"product_id", "name", "quantity", "unit_price"},
                {"ProductID", "ProductName", "Quantity", "UnitPrice"}),
            AddLineTotal = Table.AddColumn(ExpandItemDetails, "LineTotal",
                each [Quantity] * [UnitPrice])
        in
            AddLineTotal,

    // 3. Customer Information
    CustomerInfo =
        let
            OrdersTable = Table.FromList(Orders, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
            ExtractCustomers = Table.TransformColumns(OrdersTable, {
                {"Column1", (o) => [
                    CustomerID = o[customer][id],
                    Name = o[customer][name],
                    Email = o[customer][email],
                    City = o[shipping][address][city],
                    State = o[shipping][address][state]
                ]}
            }),
            ExpandCustomers = Table.ExpandRecordColumn(ExtractCustomers, "Column1",
                {"CustomerID", "Name", "Email", "City", "State"}),
            RemoveDuplicates = Table.Distinct(ExpandCustomers)
        in
            RemoveDuplicates
in
    // Return the table you want to see (change this to see different results)
    OrderItems
Try each part of this solution step by step. Start with just loading and parsing the JSON, then build up each table one column at a time.
Problem: "Expression.Error: The column 'fieldname' of the table wasn't found." Solution: Check if the field exists before accessing it:
// Instead of this:
CustomerName = JsonRecord[customer][name]

// Use this:
CustomerName = if Record.HasFields(JsonRecord, "customer")
    then JsonRecord[customer][name]
    else null
Problem: "DataFormat.Error: Invalid JSON primitive" Solution: Your JSON file might have formatting issues. Check for:
- Trailing commas after the last item in an object or array
- Single quotes instead of double quotes around keys and strings
- Unquoted keys or unescaped special characters
- A byte-order mark or stray text before the opening brace
Problem: Can't find XML elements that you know exist. Solution: XML parsing is case-sensitive and namespace-aware. Check:
- Element names match exactly, including case (employee vs. Employee)
- Whether elements carry a namespace prefix (e.g., ns:employee)

// For namespaced XML, you might need:
Table.SelectRows(Children, each Text.StartsWith([Name], "employee"))
Problem: Queries take too long with large JSON/XML files. Solution:
- Filter rows and select only the fields you need as early as possible
- Parse the file once in its own query and reference it, rather than re-parsing in multiple steps
- Consider Table.Buffer() for intermediate results that are reused several times
- Where the source allows it, process very large files in smaller batches
Problem: Numbers come through as text, dates aren't recognized. Solution: Be explicit about type conversions:
// Convert text to numbers
Salary = Number.FromText([salary_text])

// Handle dates properly
OrderDate = DateTime.FromText([order_date])

// Handle currency
Price = Currency.FromText(Text.Replace([price_text], "$", ""))
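When a value might not parse at all, wrap the conversion in try ... otherwise so a bad cell becomes null instead of breaking the whole column:

// A malformed value yields null rather than an error
SafeSalary = try Number.FromText([salary_text]) otherwise null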
You now have the fundamental skills to work with JSON and XML data sources in M. Here's what you've mastered:
- Json.Document() and Xml.Document() are your primary tools for converting raw data into M structures
- Records, lists, and nested tables are how M represents hierarchical data
- Table.ExpandRecordColumn() and Table.ExpandListColumn() flatten nested structures into tabular form
- Defensive patterns like Record.HasFields() and explicit type conversions keep queries robust

These skills open up vast amounts of data that were previously inaccessible. Many organizations have years of data trapped in JSON exports, XML backups, or API responses that you can now process confidently.
Next steps to advance your skills:
- Pull JSON directly from live APIs with Web.Contents()
- Explore Xml.Tables() as a higher-level alternative for simple XML documents
- Package your extraction logic as reusable custom functions
- Practice on real exports from systems you already use
The combination of M language and these data formats gives you access to virtually any structured data source you'll encounter in modern business environments. Start small, build incrementally, and soon you'll be the person others come to when they have "data in a weird format."