
You're sitting at your desk when a colleague drops a USB drive in front of you. "The marketing team needs a report from this data," she says, "but it's in some weird format." You plug it in and find files with extensions you've seen before but never worked with: .json and .xml. Your heart sinks a little: these aren't the familiar Excel files or CSV data you're used to.
If this scenario sounds familiar, you're not alone. JSON (JavaScript Object Notation) and XML (eXtensible Markup Language) are everywhere in modern data workflows. APIs return JSON responses, legacy systems export XML files, and web services communicate using both formats. The good news? Power Query's M language has powerful built-in functions that make working with these structured data formats surprisingly straightforward.
By the end of this lesson, you'll confidently transform complex nested JSON and XML data into clean, tabular datasets ready for analysis. You'll understand how these data formats work, why they're structured the way they are, and most importantly, how to extract exactly the information you need using M language functions.
What you'll learn:
- How JSON and XML represent hierarchical data
- Parsing files with Json.Document() and Xml.Document()
- Navigating records, lists, and nested tables in M
- Flattening nested structures into clean, tabular datasets
- Handling missing fields, type conversions, and common errors
This lesson assumes you have basic familiarity with Power Query and the M language. You should be comfortable creating simple queries and understand fundamental M concepts like records, lists, and tables. If you haven't worked with these concepts yet, consider reviewing introductory M language materials first.
Before diving into the M functions, let's understand what we're working with. Both JSON and XML are ways to store structured, hierarchical data—but they approach it differently.
JSON looks like this:
{
    "company": "TechCorp Solutions",
    "employees": [
        {
            "id": 1001,
            "name": "Sarah Chen",
            "department": "Engineering",
            "salary": 95000,
            "skills": ["Python", "SQL", "Machine Learning"]
        },
        {
            "id": 1002,
            "name": "Marcus Rodriguez",
            "department": "Marketing",
            "salary": 72000,
            "skills": ["Digital Marketing", "Analytics", "Content Strategy"]
        }
    ],
    "metadata": {
        "last_updated": "2024-01-15",
        "total_employees": 2
    }
}
JSON uses a simple syntax: objects are wrapped in curly braces {}, arrays in square brackets [], and key-value pairs are separated by commas. It's lightweight and easy for both humans and computers to read.
The same data in XML looks like this:
<company name="TechCorp Solutions">
    <employees>
        <employee id="1001">
            <name>Sarah Chen</name>
            <department>Engineering</department>
            <salary>95000</salary>
            <skills>
                <skill>Python</skill>
                <skill>SQL</skill>
                <skill>Machine Learning</skill>
            </skills>
        </employee>
        <employee id="1002">
            <name>Marcus Rodriguez</name>
            <department>Marketing</department>
            <salary>72000</salary>
            <skills>
                <skill>Digital Marketing</skill>
                <skill>Analytics</skill>
                <skill>Content Strategy</skill>
            </skills>
        </employee>
    </employees>
    <metadata>
        <last_updated>2024-01-15</last_updated>
        <total_employees>2</total_employees>
    </metadata>
</company>
XML is more verbose, using opening and closing tags like HTML. It can store data both in tag content and in attributes (like id="1001"), making it more flexible but also more complex to parse.
Let's start with JSON since it's more common in modern applications. M provides several functions for working with JSON data, with Json.Document() being your primary tool.
When you have a JSON file, you'll typically start by loading it as binary data, then parsing it:
let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source)
in
    JsonData
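The same two-step pattern applies when the JSON comes from a web API rather than a file. A minimal sketch, where the endpoint URL is a hypothetical placeholder:

let
    // Hypothetical API endpoint; substitute your real URL
    Source = Web.Contents("https://api.example.com/employees"),
    JsonData = Json.Document(Source)
in
    JsonData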
The Json.Document() function takes binary data and converts it into M's native data types:
- JSON objects become M records
- JSON arrays become M lists
- JSON strings become text values
- JSON numbers become number values
- JSON true/false become logical values
- JSON null becomes null
Once you've parsed JSON data, you navigate it using standard M syntax for records and lists. Let's work with our employee data:
let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source),
    CompanyName = JsonData[company],
    EmployeeList = JsonData[employees],
    FirstEmployee = EmployeeList{0},
    FirstEmployeeName = FirstEmployee[name]
in
    FirstEmployeeName
This query extracts "Sarah Chen" by navigating through the JSON structure step by step. Each step reveals the hierarchical nature of the data.
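Once the navigation is clear, the same lookup can be collapsed into a single chained expression:

FirstEmployeeName = Json.Document(File.Contents("C:\Data\employee_data.json"))[employees]{0}[name]

The step-by-step version is easier to debug; the chained version is handy once a path is stable.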
The real power comes when you convert JSON arrays into tables for analysis. Here's how to transform our employee array into a proper table:
let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source),
    EmployeeList = JsonData[employees],
    EmployeesTable = Table.FromList(EmployeeList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandedEmployees = Table.ExpandRecordColumn(EmployeesTable, "Column1",
        {"id", "name", "department", "salary", "skills"},
        {"ID", "Name", "Department", "Salary", "Skills"})
in
    ExpandedEmployees
This creates a table where each row represents an employee, but the skills column still contains lists. Let's handle that nested array:
let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source),
    EmployeeList = JsonData[employees],
    EmployeesTable = Table.FromList(EmployeeList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandedEmployees = Table.ExpandRecordColumn(EmployeesTable, "Column1",
        {"id", "name", "department", "salary", "skills"},
        {"ID", "Name", "Department", "Salary", "Skills"}),
    ExpandedSkills = Table.ExpandListColumn(ExpandedEmployees, "Skills")
in
    ExpandedSkills
Now each skill becomes its own row, creating a relationship table between employees and their skills.
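With the sample data above, the fully expanded result looks like this, one row per employee/skill pair:

ID    Name              Department   Salary  Skills
1001  Sarah Chen        Engineering  95000   Python
1001  Sarah Chen        Engineering  95000   SQL
1001  Sarah Chen        Engineering  95000   Machine Learning
1002  Marcus Rodriguez  Marketing    72000   Digital Marketing
1002  Marcus Rodriguez  Marketing    72000   Analytics
1002  Marcus Rodriguez  Marketing    72000   Content Strategy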
Pro Tip: When dealing with deeply nested JSON, work incrementally. Parse the outer structure first, examine what you get, then drill down level by level. This makes debugging much easier.
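In practice, working incrementally means returning an intermediate step name and inspecting it before adding the next step. A sketch of the workflow:

let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source),
    EmployeeList = JsonData[employees]
    // FirstEmployee = EmployeeList{0}   // uncomment once the previous step looks right
in
    EmployeeList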
Real-world JSON data often has missing or optional fields. Here's how to handle that gracefully:
let
    Source = File.Contents("C:\Data\employee_data.json"),
    JsonData = Json.Document(Source),
    EmployeeList = JsonData[employees],
    EmployeesTable = Table.FromList(EmployeeList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    // Fields that some records lack (such as "manager") expand to null rather than raising an error
    SafeExpandEmployees = Table.ExpandRecordColumn(EmployeesTable, "Column1",
        {"id", "name", "department", "salary", "skills", "manager"},
        {"ID", "Name", "Department", "Salary", "Skills", "Manager"}),
    // Replace null values with defaults
    CleanedData = Table.ReplaceValue(SafeExpandEmployees, null, "No Manager", Replacer.ReplaceValue, {"Manager"})
in
    CleanedData
XML parsing in M follows a similar pattern but uses different functions. The primary function is Xml.Document(), which converts XML into a nested table structure.
let
    Source = File.Contents("C:\Data\employee_data.xml"),
    XmlData = Xml.Document(Source)
in
    XmlData
When you parse XML with Xml.Document(), you get a table where:
- Each row represents one XML element
- The Name column holds the element's tag name, and Namespace holds its namespace (if any)
- The Value column holds the element's text for leaf elements, or a nested table of child elements
- The Attributes column holds a nested table of attribute Name/Value pairs

Note that there is no separate column for child elements: you drill into the Value column to reach them.
XML navigation requires understanding this table-based structure. Let's extract our company data:
let
    Source = File.Contents("C:\Data\employee_data.xml"),
    XmlData = Xml.Document(Source),
    // The root <company> element is the first (and only) row
    CompanyElement = XmlData{0},
    // Attributes is a nested table of Name/Value pairs, so look up the row by Name
    CompanyName = CompanyElement[Attributes]{[Name = "name"]}[Value],
    // Child elements live in the Value column as a nested table
    CompanyChildren = CompanyElement[Value],
    EmployeesElement = Table.SelectRows(CompanyChildren, each [Name] = "employees"){0},
    EmployeesChildren = EmployeesElement[Value],
    IndividualEmployees = Table.SelectRows(EmployeesChildren, each [Name] = "employee")
in
    IndividualEmployees
This gives us a table where each row represents an employee element, but we still need to extract the actual data.
Here's how to transform XML elements into a clean table:
let
    Source = File.Contents("C:\Data\employee_data.xml"),
    XmlData = Xml.Document(Source),
    CompanyElement = XmlData{0},
    CompanyChildren = CompanyElement[Value],
    EmployeesElement = Table.SelectRows(CompanyChildren, each [Name] = "employees"){0},
    EmployeesChildren = EmployeesElement[Value],
    IndividualEmployees = Table.SelectRows(EmployeesChildren, each [Name] = "employee"),
    // Helper: turn one <employee> element row into a flat record
    ExtractEmployeeData = (employee as record) =>
        let
            EmployeeId = employee[Attributes]{[Name = "id"]}[Value],
            EmployeeChildren = employee[Value],
            NameValue = Table.SelectRows(EmployeeChildren, each [Name] = "name"){0}[Value],
            DeptValue = Table.SelectRows(EmployeeChildren, each [Name] = "department"){0}[Value],
            SalaryValue = Table.SelectRows(EmployeeChildren, each [Name] = "salary"){0}[Value],
            Result = [ID = EmployeeId, Name = NameValue, Department = DeptValue, Salary = Number.FromText(SalaryValue)]
        in
            Result,
    // Apply the helper to each employee row, then expand the resulting records
    EmployeeRecords = Table.AddColumn(IndividualEmployees, "Employee", each ExtractEmployeeData(_)),
    ExpandedEmployees = Table.ExpandRecordColumn(
        Table.SelectColumns(EmployeeRecords, {"Employee"}),
        "Employee", {"ID", "Name", "Department", "Salary"})
in
    ExpandedEmployees
XML can store data in both attributes and element content. Here's a pattern for handling both:
let
    Source = File.Contents("C:\Data\employee_data.xml"),
    XmlData = Xml.Document(Source),
    ExtractEmployeeData = (employeeElement as record) =>
        let
            // Get data from attributes: Attributes is a nested table, so look up rows by Name
            Attributes = employeeElement[Attributes],
            IdRows = Table.SelectRows(Attributes, each [Name] = "id"),
            EmployeeId = if Table.RowCount(IdRows) > 0 then IdRows{0}[Value] else null,
            // Get data from child elements, which live in the Value column
            Children = employeeElement[Value],
            GetElementValue = (elementName as text) =>
                let
                    Elements = Table.SelectRows(Children, each [Name] = elementName),
                    Value = if Table.RowCount(Elements) > 0 then Elements{0}[Value] else null
                in
                    Value,
            NameValue = GetElementValue("name"),
            DeptValue = GetElementValue("department"),
            SalaryText = GetElementValue("salary")
        in
            [
                ID = EmployeeId,
                Name = NameValue,
                Department = DeptValue,
                Salary = if SalaryText <> null then Number.FromText(SalaryText) else null
            ]
in
    // This query returns the helper function itself, ready to invoke from other queries
    ExtractEmployeeData
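To apply the function, combine it with the employee rows from the earlier navigation query. A sketch, where IndividualEmployees stands for that earlier result:

let
    // IndividualEmployees is the table of <employee> element rows from the navigation query
    WithRecords = Table.AddColumn(IndividualEmployees, "Employee", each ExtractEmployeeData(_)),
    Result = Table.ExpandRecordColumn(WithRecords, "Employee", {"ID", "Name", "Department", "Salary"})
in
    Result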
Both JSON and XML can contain deeply nested structures. Here are strategies for handling complex hierarchies.
When you have arrays containing objects with different structures, you need flexible extraction methods:
let
    Source = Json.Document(File.Contents("C:\Data\mixed_data.json")),
    Items = Source[items],
    // Create a function to safely extract fields
    SafeExtract = (record as record, fieldName as text, defaultValue as any) =>
        if Record.HasFields(record, fieldName) then Record.Field(record, fieldName) else defaultValue,
    ItemsTable = Table.FromList(Items, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExtractedData = Table.TransformColumns(ItemsTable, {
        {"Column1", each [
            ID = SafeExtract(_, "id", null),
            Type = SafeExtract(_, "type", "unknown"),
            Value = SafeExtract(_, "value", 0),
            Metadata = SafeExtract(_, "metadata", [])
        ]}
    }),
    ExpandedData = Table.ExpandRecordColumn(ExtractedData, "Column1", {"ID", "Type", "Value", "Metadata"})
in
    ExpandedData
Some data contains recursive structures, like organizational hierarchies or nested categories:
let
    Source = Json.Document(File.Contents("C:\Data\org_chart.json")),
    // Recursive function to flatten the hierarchy; @ lets the function refer to itself
    FlattenHierarchy = (node as record, level as number) as list =>
        let
            CurrentRecord = [
                ID = node[id],
                Name = node[name],
                Title = node[title],
                Level = level
            ],
            Children = if Record.HasFields(node, "reports") then node[reports] else {},
            ChildRecords = if List.Count(Children) > 0
                then List.Transform(Children, each @FlattenHierarchy(_, level + 1))
                else {},
            FlatChildren = List.Combine(ChildRecords),
            Result = {CurrentRecord} & FlatChildren
        in
            Result,
    FlatList = FlattenHierarchy(Source, 0),
    ResultTable = Table.FromList(FlatList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandedTable = Table.ExpandRecordColumn(ResultTable, "Column1", {"ID", "Name", "Title", "Level"})
in
    ExpandedTable
Warning: Be careful with recursive functions in M. They can cause performance issues or stack overflow errors with deeply nested data. Consider iterative approaches for very deep hierarchies.
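If recursion depth is a concern, the same flattening can be done iteratively with List.Generate and an explicit work queue. This is a sketch under the same assumptions as above (id/name/title/reports fields); note it produces breadth-first rather than depth-first order:

let
    Source = Json.Document(File.Contents("C:\Data\org_chart.json")),
    // State = [Queue = nodes still to process, Done = flattened records so far]
    States = List.Generate(
        () => [Queue = {[Node = Source, Level = 0]}, Done = {}],
        each [Queue] <> null,
        (state) =>
            if List.Count(state[Queue]) = 0
            // Emit one final state carrying the finished list; Queue = null then stops the generator
            then [Queue = null, Done = state[Done]]
            else
                let
                    Current = state[Queue]{0},
                    Node = Current[Node],
                    Kids = if Record.HasFields(Node, "reports") then Node[reports] else {},
                    NewWork = List.Transform(Kids, (k) => [Node = k, Level = Current[Level] + 1])
                in
                    [
                        Queue = List.Skip(state[Queue], 1) & NewWork,
                        Done = state[Done] & {[ID = Node[id], Name = Node[name], Title = Node[title], Level = Current[Level]]}
                    ],
        each [Done]
    ),
    FlatList = List.Last(States),
    ResultTable = Table.FromList(FlatList, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    ExpandedTable = Table.ExpandRecordColumn(ResultTable, "Column1", {"ID", "Name", "Title", "Level"})
in
    ExpandedTable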
Let's work through a realistic scenario: processing sales data from an e-commerce API that returns JSON with nested order information.
Here's your sample JSON data (save this as sales_data.json):
{
    "orders": [
        {
            "order_id": "ORD-2024-001",
            "customer": {
                "id": "CUST-001",
                "name": "Alice Johnson",
                "email": "alice@email.com"
            },
            "order_date": "2024-01-15T10:30:00Z",
            "items": [
                {
                    "product_id": "PROD-100",
                    "name": "Wireless Headphones",
                    "quantity": 2,
                    "unit_price": 89.99
                },
                {
                    "product_id": "PROD-200",
                    "name": "Phone Case",
                    "quantity": 1,
                    "unit_price": 24.99
                }
            ],
            "shipping": {
                "method": "Express",
                "cost": 12.99,
                "address": {
                    "street": "123 Main St",
                    "city": "Seattle",
                    "state": "WA",
                    "zip": "98101"
                }
            }
        }
    ]
}
Your task: Create three tables:
1. Orders Summary: one row per order, with order ID, customer name, order date, total item count, and shipping cost
2. Order Items Detail: one row per line item, with order ID, product details, quantity, and line total
3. Customer Information: one row per unique customer, with contact details and shipping city/state
Solution:
let
    Source = Json.Document(File.Contents("C:\Data\sales_data.json")),
    Orders = Source[orders],

    // 1. Orders Summary
    OrdersSummary =
        let
            OrdersTable = Table.FromList(Orders, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
            // An explicit parameter (o) keeps field lookups from colliding with the new record's own fields
            ExpandOrders = Table.TransformColumns(OrdersTable, {
                {"Column1", (o) => [
                    OrderID = o[order_id],
                    CustomerName = o[customer][name],
                    // The trailing "Z" marks UTC, so parse as a datetimezone
                    OrderDate = DateTimeZone.FromText(o[order_date]),
                    TotalItems = List.Sum(List.Transform(o[items], each [quantity])),
                    ShippingCost = o[shipping][cost]
                ]}
            }),
            FinalOrders = Table.ExpandRecordColumn(ExpandOrders, "Column1",
                {"OrderID", "CustomerName", "OrderDate", "TotalItems", "ShippingCost"})
        in
            FinalOrders,

    // 2. Order Items Detail
    OrderItems =
        let
            OrdersTable = Table.FromList(Orders, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
            AddOrderID = Table.TransformColumns(OrdersTable, {
                {"Column1", (o) => [OrderID = o[order_id], Items = o[items]]}
            }),
            ExpandOrderInfo = Table.ExpandRecordColumn(AddOrderID, "Column1", {"OrderID", "Items"}),
            ExpandItems = Table.ExpandListColumn(ExpandOrderInfo, "Items"),
            ExpandItemDetails = Table.ExpandRecordColumn(ExpandItems, "Items",
                {"product_id", "name", "quantity", "unit_price"},
                {"ProductID", "ProductName", "Quantity", "UnitPrice"}),
            AddLineTotal = Table.AddColumn(ExpandItemDetails, "LineTotal",
                each [Quantity] * [UnitPrice])
        in
            AddLineTotal,

    // 3. Customer Information
    CustomerInfo =
        let
            OrdersTable = Table.FromList(Orders, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
            ExtractCustomers = Table.TransformColumns(OrdersTable, {
                {"Column1", (o) => [
                    CustomerID = o[customer][id],
                    Name = o[customer][name],
                    Email = o[customer][email],
                    City = o[shipping][address][city],
                    State = o[shipping][address][state]
                ]}
            }),
            ExpandCustomers = Table.ExpandRecordColumn(ExtractCustomers, "Column1",
                {"CustomerID", "Name", "Email", "City", "State"}),
            RemoveDuplicates = Table.Distinct(ExpandCustomers)
        in
            RemoveDuplicates
in
    // Return the table you want to see (change this to see different results)
    OrderItems
Try each part of this solution step by step. Start with just loading and parsing the JSON, then build up each table one column at a time.
Problem: "Expression.Error: The column 'fieldname' of the table wasn't found." Solution: Check if the field exists before accessing it:
// Instead of this:
CustomerName = JsonRecord[customer][name]

// Use this:
CustomerName = if Record.HasFields(JsonRecord, "customer")
    then JsonRecord[customer][name]
    else null
Problem: "DataFormat.Error: Invalid JSON primitive" Solution: Your JSON file might have formatting issues. Check for:
- Trailing commas after the last item in an object or array
- Single quotes instead of double quotes around keys and strings
- Unquoted keys or unescaped special characters
- A byte-order mark or stray text before the opening brace
Problem: Can't find XML elements that you know exist. Solution: XML parsing is case-sensitive and namespace-aware. Check:
- Element names match exactly, including case (employee vs. Employee)
- Whether elements carry a namespace prefix (e.g., ns:employee)

// For namespaced XML, you might need:
Table.SelectRows(Children, each Text.StartsWith([Name], "employee"))
Problem: Queries take too long with large JSON/XML files. Solution:
- Filter rows and select only the fields you need as early as possible
- Parse the file once in its own query and reference it, rather than re-parsing in multiple steps
- Consider Table.Buffer() for intermediate results that are reused several times
- Where the source allows it, process very large files in smaller batches
Problem: Numbers come through as text, dates aren't recognized. Solution: Be explicit about type conversions:
// Convert text to numbers
Salary = Number.FromText([salary_text])

// Handle dates properly
OrderDate = DateTime.FromText([order_date])

// Handle currency
Price = Currency.FromText(Text.Replace([price_text], "$", ""))
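When a value might not parse at all, wrap the conversion in try ... otherwise so a bad cell becomes null instead of breaking the whole column:

// A malformed value yields null rather than an error
SafeSalary = try Number.FromText([salary_text]) otherwise null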
You now have the fundamental skills to work with JSON and XML data sources in M. Here's what you've mastered:
- Json.Document() and Xml.Document() are your primary tools for converting raw data into M structures
- Records, lists, and nested tables are how M represents hierarchical data
- Table.ExpandRecordColumn() and Table.ExpandListColumn() flatten nested structures into tabular form
- Defensive patterns like Record.HasFields() and explicit type conversions keep queries robust

These skills open up vast amounts of data that were previously inaccessible. Many organizations have years of data trapped in JSON exports, XML backups, or API responses that you can now process confidently.
Next steps to advance your skills:
- Pull JSON directly from live APIs with Web.Contents()
- Explore Xml.Tables() as a higher-level alternative for simple XML documents
- Package your extraction logic as reusable custom functions
- Practice on real exports from systems you already use
The combination of M language and these data formats gives you access to virtually any structured data source you'll encounter in modern business environments. Start small, build incrementally, and soon you'll be the person others come to when they have "data in a weird format."