
You're working with a 500,000-row sales dataset from your company's SQL Server database. Your Power Query transformation loads quickly during development with a small sample, but when you refresh the full dataset, it takes 45 minutes and sometimes fails entirely. Your manager needs this report updated daily, and you're spending more time waiting for data refreshes than actually analyzing results.
This performance problem isn't uncommon—it's the difference between Power Query operations that can be "folded" back to the data source versus those that must be processed locally in memory. Understanding query folding, buffering behavior, and optimization techniques is essential for building efficient data pipelines that scale from prototype to production.
What you'll learn:
- How query folding pushes transformation steps to the data source, and how to verify it with "View Native Query"
- Which operations break folding and how to restructure queries around them
- When Power Query buffers data in memory and how to control it with Table.Buffer()
- Optimization and diagnostic techniques for building queries that refresh reliably at production data volumes
This lesson assumes you can create basic Power Query transformations—connecting to data sources, filtering rows, and adding calculated columns. You should also understand SQL basics (SELECT, WHERE, JOIN) since we'll discuss how Power Query translates to database operations.
Query folding is Power Query's ability to translate your transformation steps into native queries that run on the data source itself, rather than bringing all the data into memory first. Think of it like ordering food: instead of asking the restaurant to bring you every ingredient so you can cook at home (no folding), you give them your recipe and they prepare the dish in their kitchen (folding).
When folding works, your database server does the heavy lifting—filtering millions of rows down to thousands before sending data over the network. When folding breaks, Power Query downloads everything and processes it locally, creating massive performance and memory bottlenecks.
Let's trace through a simple example. You connect to a SQL Server table with customer orders and apply these transformations:
1. Filter: OrderDate >= January 1, 2024
2. Select columns: CustomerID, OrderDate, OrderAmount
3. Group by: CustomerID, sum OrderAmount
With successful folding, Power Query generates this SQL:
SELECT CustomerID, SUM(OrderAmount) as OrderAmount
FROM Orders
WHERE OrderDate >= '2024-01-01'
GROUP BY CustomerID
The database processes this query and returns only the summarized results—maybe 5,000 customer totals instead of 500,000 individual orders.
Without folding, Power Query would execute:
SELECT * FROM Orders
Then download all 500,000 rows to process the filter, selection, and grouping in memory. The performance difference is dramatic—seconds versus minutes or hours.
Power Query provides a simple way to verify folding. In the Power Query Editor, right-click any step and look for "View Native Query." If this option appears and shows SQL code, that step folds successfully. If the option is grayed out, folding has broken at that step.
Let's build a query and check folding at each step:
Source = Sql.Database("localhost", "SalesDB")
OrdersTable = Source{[Schema="dbo",Item="Orders"]}[Data]
FilteredRows = Table.SelectRows(OrdersTable, each [OrderDate] >= #date(2024, 1, 1))
SelectedColumns = Table.SelectColumns(FilteredRows, {"CustomerID", "OrderDate", "OrderAmount"})
Right-click the FilteredRows step and select "View Native Query." You should see SQL with a WHERE clause. Do the same for SelectedColumns—the query should include only the three specified columns in the SELECT statement.
Certain operations cannot be translated to the source database, causing folding to break. Once broken, folding doesn't resume in later steps—everything downstream processes locally.
Custom Functions and Complex Logic
// This breaks folding
AddTaxColumn = Table.AddColumn(SelectedColumns, "TaxAmount",
each if [OrderAmount] > 1000 then [OrderAmount] * 0.08 else [OrderAmount] * 0.05)
While a simple if expression sometimes folds into a SQL CASE expression, complex conditional logic typically doesn't. The database doesn't understand your custom business rules.
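If the tax rule must fold, one option is to move it into the database so Power Query receives TaxAmount as an ordinary column. A sketch for SQL Server, using a hypothetical view name:

```sql
CREATE VIEW OrdersWithTax AS
SELECT *,
       CASE WHEN OrderAmount > 1000
            THEN OrderAmount * 0.08
            ELSE OrderAmount * 0.05
       END AS TaxAmount
FROM Orders;
```

Connecting to OrdersWithTax instead of Orders keeps all downstream filters and aggregations foldable.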
Text Operations with Power Query Functions
// This breaks folding
ExtractDomain = Table.AddColumn(Customers, "EmailDomain",
each Text.AfterDelimiter([Email], "@"))
Power Query's text functions don't map directly to SQL string functions across different database systems.
Date Operations Beyond Basic Comparisons
// This might break folding
AddQuarter = Table.AddColumn(FilteredRows, "Quarter",
each Date.QuarterOfYear([OrderDate]))
While basic date filtering often folds, extracting specific date parts depends on your database system's capabilities.
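When date-part extraction breaks folding for your source, the same move-to-the-source workaround applies. A sketch for SQL Server (the view name is illustrative):

```sql
CREATE VIEW OrdersWithQuarter AS
SELECT *, DATEPART(QUARTER, OrderDate) AS OrderQuarter
FROM Orders;
```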
When you encounter folding breaks, you have several strategies:
Strategy 1: Move the operation to the source
Instead of using Power Query to extract email domains, create a calculated column or view in your database:
CREATE VIEW CustomersWithDomain AS
SELECT *, SUBSTRING(Email, CHARINDEX('@', Email) + 1, LEN(Email)) as EmailDomain
FROM Customers
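Once the view exists, the Power Query side stays simple and foldable. A sketch assuming the same database connection (the gmail.com filter is just an illustration):

```
Source = Sql.Database("localhost", "SalesDB")
CustomersWithDomain = Source{[Schema="dbo",Item="CustomersWithDomain"]}[Data]
// EmailDomain arrives pre-computed, so filtering on it can fold
GmailCustomers = Table.SelectRows(CustomersWithDomain, each [EmailDomain] = "gmail.com")
```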
Strategy 2: Minimize data before the break
Structure your query to reduce data volume before operations that break folding:
// Good: Filter and select columns first (folds)
FilteredData = Table.SelectRows(Source, each [OrderDate] >= #date(2024, 1, 1))
SelectedColumns = Table.SelectColumns(FilteredData, {"CustomerID", "OrderAmount", "OrderDate"})
// Then apply operations that break folding
AddTaxColumn = Table.AddColumn(SelectedColumns, "TaxAmount",
each if [OrderAmount] > 1000 then [OrderAmount] * 0.08 else [OrderAmount] * 0.05)
Buffering controls when Power Query loads data into memory for processing. Some operations automatically buffer data, while others stream through it. Understanding buffering helps you manage memory usage and optimize performance.
Buffering comes into play in two common situations:
Multiple steps reference the same table
Source = Sql.Database("localhost", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
CurrentYear = Table.SelectRows(Orders, each Date.Year([OrderDate]) = 2024)
PreviousYear = Table.SelectRows(Orders, each Date.Year([OrderDate]) = 2023)
Combined = Table.Combine({CurrentYear, PreviousYear})
Because both CurrentYear and PreviousYear reference Orders, Power Query may evaluate Orders once per reference, querying the database twice. Wrapping Orders in Table.Buffer() forces a single read and serves both steps from the in-memory copy.
Operations require multiple passes through data
When these transformations don't fold to the source, they inherently require buffering, because they must read every row before producing any output:
// Grouping requires seeing all data to calculate aggregates
GroupedData = Table.Group(FilteredRows, {"CustomerID"},
{{"TotalOrders", each Table.RowCount(_), type number}})
// Sorting requires comparing all values
SortedData = Table.Sort(GroupedData, {{"TotalOrders", Order.Descending}})
You can explicitly control buffering with Table.Buffer():
// Force buffering - loads entire table into memory
BufferedOrders = Table.Buffer(Orders)
// Subsequent operations on BufferedOrders work from memory
FilteredBuffered = Table.SelectRows(BufferedOrders, each [OrderAmount] > 1000)
GroupedBuffered = Table.Group(FilteredBuffered, {"CustomerID"},
{{"AvgAmount", each List.Average([OrderAmount]), type number}})
Use explicit buffering when you'll perform multiple operations on the same dataset and want to avoid repeated source queries. Be aware that Table.Buffer() breaks query folding for every subsequent step, so buffer only after folding-friendly filters and column selections have been applied.
Buffering consumes memory proportional to your data size. For large datasets, unnecessary buffering can cause:
- Out-of-memory errors and failed refreshes
- Slow refreshes, because entire tables are downloaded before any processing begins
- Loss of query folding for every downstream step
Avoid buffering when:
- Query folding is still active — buffering forces a full download and disables it
- A table is referenced only once, so there are no repeated source queries to avoid
- The dataset is too large to fit comfortably in memory
Apply filters as early as possible
// Good: Filter first
Source = Sql.Database("localhost", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
RecentOrders = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1))
LargeOrders = Table.SelectRows(RecentOrders, each [OrderAmount] > 500)
SelectedColumns = Table.SelectColumns(LargeOrders, {"CustomerID", "OrderAmount"})
// Bad: Filter after column selection and other operations
Source = Sql.Database("localhost", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
SelectedColumns = Table.SelectColumns(Orders, {"CustomerID", "OrderDate", "OrderAmount"})
AddedCustomColumn = Table.AddColumn(SelectedColumns, "Category", each "Large")
FilteredRows = Table.SelectRows(AddedCustomColumn, each [OrderDate] >= #date(2024, 1, 1))
Remove unnecessary columns early
Including columns you don't need increases network traffic and memory usage:
// Select only required columns after initial filtering
FilteredOrders = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1))
RequiredColumns = Table.SelectColumns(FilteredOrders,
{"OrderID", "CustomerID", "OrderAmount", "OrderDate"})
Use merge operations that can fold
Simple equi-joins between database tables often fold:
// This typically folds if both tables are from the same database
MergedData = Table.NestedJoin(Orders, {"CustomerID"}, Customers, {"CustomerID"},
"Customer", JoinKind.Inner)
ExpandedData = Table.ExpandTableColumn(MergedData, "Customer",
{"CustomerName", "Region"})
Avoid complex join conditions
// This breaks folding - a non-equi join can't be expressed as a simple merge
// (Table.NestedJoin has no join-condition parameter, so the lookup must be
// written as a per-row selection instead)
ComplexMerge = Table.AddColumn(Orders, "Customer", (order) =>
    Table.SelectRows(Customers, (customer) =>
        order[CustomerID] = customer[CustomerID]
        and order[OrderDate] >= customer[FirstOrderDate]))
Instead, create the complex join logic in your database as a view or calculated column.
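For example, the date-dependent join above could live in a database view (sketched for SQL Server; the view name is illustrative):

```sql
CREATE VIEW OrdersSinceFirstOrder AS
SELECT o.*, c.CustomerName
FROM Orders o
JOIN Customers c
  ON o.CustomerID = c.CustomerID
 AND o.OrderDate >= c.FirstOrderDate;
```

Power Query then reads the view like any other table, and subsequent filters can fold.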
Use appropriate data types
Power Query often infers inefficient data types. Explicitly set types that match your source:
// Specify efficient types
TypedColumns = Table.TransformColumnTypes(SelectedColumns, {
{"CustomerID", Int32.Type},
{"OrderAmount", Currency.Type},
{"OrderDate", type date}
})
Avoid unnecessary type conversions
Each type conversion can break folding:
// Bad: Converting back and forth
TextCustomerID = Table.TransformColumns(Orders, {{"CustomerID", Text.From}})
NumberCustomerID = Table.TransformColumns(TextCustomerID, {{"CustomerID", Number.From}})
// Good: Use the source data type directly
FilteredOrders = Table.SelectRows(Orders, each [CustomerID] = 12345)
Power Query includes built-in diagnostics to identify performance bottlenecks. In the Power Query Editor, go to Tools → Query Diagnostics → Start Diagnostics, then refresh your query and select Stop Diagnostics.
The diagnostics show:
- How long each step and each data source query takes
- The queries actually sent to the data source
- How each transformation step was evaluated
Look for:
- Steps with unexpectedly long durations
- SELECT * queries, which indicate folding has broken
- The same source query being issued multiple times
For SQL Server sources, you can analyze the actual SQL generated:
// Add this step to see the final SQL query
NativeQuery = Value.NativeQuery(Source, "
SELECT CustomerID, SUM(OrderAmount) as TotalAmount
FROM Orders
WHERE OrderDate >= '2024-01-01'
GROUP BY CustomerID
")
This bypasses Power Query transformations entirely and sends SQL directly to the database. Use this approach for complex operations that don't fold well.
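By default, steps added after Value.NativeQuery won't fold. For some connectors (including SQL Server), you can opt back in with the EnableFolding option — a sketch, since connector support varies:

```
NativeQuery = Value.NativeQuery(Source,
    "SELECT CustomerID, OrderAmount FROM Orders WHERE OrderDate >= '2024-01-01'",
    null,
    [EnableFolding = true])
// With folding enabled, this filter can still be pushed to the database
LargeOrders = Table.SelectRows(NativeQuery, each [OrderAmount] > 1000)
```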
Test with realistic data volumes
Don't just test with sample data—use representative volumes:
// Create a test query with row limits during development
TestQuery = Table.FirstN(Orders, 10000)
// Then remove the limit for production
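To avoid editing the query by hand for each environment, the limit can hang off a single flag — a sketch using a hypothetical DevMode value:

```
DevMode = true  // set to false before publishing
LimitedOrders = if DevMode then Table.FirstN(Orders, 10000) else Orders
// Table.FirstN typically folds to a TOP clause on SQL Server,
// so even the limited test query stays cheap
```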
Monitor refresh times
Track query refresh times over different periods to identify performance degradation:
// Add a timestamp column to track refresh performance
RefreshTime = Table.AddColumn(FinalTable, "RefreshTimestamp", each DateTime.LocalNow())
Let's build an optimized query for a realistic scenario. You have a sales database with these tables:
- Orders (CustomerID, ProductID, OrderDate, Quantity, UnitPrice)
- Customers (CustomerID, CustomerName, Region)
- Products (ProductID, ProductName, Category)
Goal: Create a monthly sales report for Q1 2024 showing customer name, region, product category, and total sales.
Step 1: Connect to your data source and establish the base query with optimal filtering:
Source = Sql.Database("your-server", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
// Filter for Q1 2024 first - this should fold
Q1Orders = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1)
and [OrderDate] < #date(2024, 4, 1))
// Select only columns needed for the analysis
OrdersFiltered = Table.SelectColumns(Q1Orders,
{"CustomerID", "ProductID", "OrderDate", "Quantity", "UnitPrice"})
Right-click the Q1Orders step and verify "View Native Query" shows a WHERE clause with date filters.
Step 2: Add calculated columns that can fold:
// Calculate line total - simple arithmetic often folds
OrdersWithTotal = Table.AddColumn(OrdersFiltered, "LineTotal",
each [Quantity] * [UnitPrice], Currency.Type)
Step 3: Join with dimension tables:
// Join with Customers - should fold if both tables are in same database
WithCustomers = Table.NestedJoin(OrdersWithTotal, {"CustomerID"},
Source{[Schema="dbo",Item="Customers"]}[Data], {"CustomerID"},
"Customer", JoinKind.Inner)
ExpandedCustomers = Table.ExpandTableColumn(WithCustomers, "Customer",
{"CustomerName", "Region"})
// Join with Products
WithProducts = Table.NestedJoin(ExpandedCustomers, {"ProductID"},
Source{[Schema="dbo",Item="Products"]}[Data], {"ProductID"},
"Product", JoinKind.Inner)
ExpandedProducts = Table.ExpandTableColumn(WithProducts, "Product",
{"ProductName", "Category"})
Step 4: Create the monthly summary:
// Add month column - this might break folding
WithMonth = Table.AddColumn(ExpandedProducts, "OrderMonth",
each Date.StartOfMonth([OrderDate]), type date)
// Group by month, customer, and category
GroupedData = Table.Group(WithMonth,
{"OrderMonth", "CustomerName", "Region", "Category"},
{{"TotalSales", each List.Sum([LineTotal]), Currency.Type},
{"OrderCount", each Table.RowCount(_), type number}})
// Sort by month and total sales
FinalReport = Table.Sort(GroupedData,
{{"OrderMonth", Order.Ascending}, {"TotalSales", Order.Descending}})
Test this query and check folding at each step. The joins should fold, but the month extraction and final grouping likely won't—which is fine since we've reduced the data volume significantly before these operations.
Problem: Adding calculated columns before filtering breaks folding unnecessarily.
Wrong approach:
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
WithCategory = Table.AddColumn(Orders, "SizeCategory",
each if [OrderAmount] > 1000 then "Large" else "Small")
FilteredOrders = Table.SelectRows(WithCategory, each [OrderDate] >= #date(2024, 1, 1))
Correct approach:
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
FilteredOrders = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1))
WithCategory = Table.AddColumn(FilteredOrders, "SizeCategory",
each if [OrderAmount] > 1000 then "Large" else "Small")
Problem: Buffering large tables when folding would work better.
Wrong approach:
Source = Sql.Database("localhost", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
BufferedOrders = Table.Buffer(Orders) // Loads 2M rows into memory
FilteredOrders = Table.SelectRows(BufferedOrders, each [OrderDate] >= #date(2024, 1, 1))
Correct approach:
Source = Sql.Database("localhost", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
FilteredOrders = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1)) // Folds to database
Problem: Using Power Query functions instead of database-native operations. The text and date examples earlier in this lesson fall into this category, and the fix is the same — move the logic into a database view or computed column.
Troubleshooting tip: When folding breaks unexpectedly, simplify expressions step by step to identify the problematic operation.
If your query runs slowly:
1. Check "View Native Query" on each step to find where folding breaks
2. Run Query Diagnostics to locate the slowest operations
3. Move filters and column selection ahead of any step that breaks folding
4. Push unfoldable logic into a database view, or send SQL directly with Value.NativeQuery
Warning: Be cautious when testing with production data sources. Large queries can impact database performance for other users. Consider using database snapshots or replicas for development.
You now understand the three pillars of Power Query performance optimization: query folding pushes operations to the data source, buffering manages memory usage, and optimization techniques maximize efficiency. The key insight is that performance isn't just about writing faster code—it's about structuring queries so the database does the heavy lifting.
Key takeaways:
- Verify folding with "View Native Query" rather than assuming it works
- Filter rows and remove columns as early as possible, while folding is still active
- Place operations that break folding after the data volume has been reduced
- Buffer explicitly only when repeated source queries are the real bottleneck
- Test with realistic data volumes, not just development samples
Next steps:
The performance principles you've learned apply beyond Power Query—they're fundamental to any data pipeline that moves and transforms large volumes of information. Master these concepts, and you'll build data solutions that scale from prototype to production without performance surprises.