
You're working with a 500,000-row sales dataset from your company's SQL Server database. Your Power Query transformation loads quickly during development with a small sample, but when you refresh the full dataset, it takes 45 minutes and sometimes fails entirely. Your manager needs this report updated daily, and you're spending more time waiting for data refreshes than actually analyzing results.
This performance problem isn't uncommon—it's the difference between Power Query operations that can be "folded" back to the data source versus those that must be processed locally in memory. Understanding query folding, buffering behavior, and optimization techniques is essential for building efficient data pipelines that scale from prototype to production.
What you'll learn:
- How query folding pushes transformation steps to the data source, and how to verify it with "View Native Query"
- Which operations break folding and how to restructure queries around them
- When Power Query buffers data in memory and how to control it with Table.Buffer()
- Optimization and diagnostic techniques for building queries that refresh reliably at production data volumes
This lesson assumes you can create basic Power Query transformations—connecting to data sources, filtering rows, and adding calculated columns. You should also understand SQL basics (SELECT, WHERE, JOIN) since we'll discuss how Power Query translates to database operations.
Query folding is Power Query's ability to translate your transformation steps into native queries that run on the data source itself, rather than bringing all the data into memory first. Think of it like ordering food: instead of asking the restaurant to bring you every ingredient so you can cook at home (no folding), you give them your recipe and they prepare the dish in their kitchen (folding).
When folding works, your database server does the heavy lifting—filtering millions of rows down to thousands before sending data over the network. When folding breaks, Power Query downloads everything and processes it locally, creating massive performance and memory bottlenecks.
Let's trace through a simple example. You connect to a SQL Server table with customer orders and apply these transformations:
1. Filter: OrderDate >= January 1, 2024
2. Select columns: CustomerID, OrderDate, OrderAmount
3. Group by: CustomerID, sum OrderAmount
With successful folding, Power Query generates this SQL:
SELECT CustomerID, SUM(OrderAmount) as OrderAmount
FROM Orders
WHERE OrderDate >= '2024-01-01'
GROUP BY CustomerID
The database processes this query and returns only the summarized results—maybe 5,000 customer totals instead of 500,000 individual orders.
Without folding, Power Query would execute:
SELECT * FROM Orders
Then download all 500,000 rows to process the filter, selection, and grouping in memory. The performance difference is dramatic—seconds versus minutes or hours.
Power Query provides a simple way to verify folding. In the Power Query Editor, right-click any step and look for "View Native Query." If this option appears and shows SQL code, that step folds successfully. If the option is grayed out, folding has broken at that step.
Let's build a query and check folding at each step:
Source = Sql.Database("localhost", "SalesDB")
OrdersTable = Source{[Schema="dbo",Item="Orders"]}[Data]
FilteredRows = Table.SelectRows(OrdersTable, each [OrderDate] >= #date(2024, 1, 1))
SelectedColumns = Table.SelectColumns(FilteredRows, {"CustomerID", "OrderDate", "OrderAmount"})
Right-click the FilteredRows step and select "View Native Query." You should see SQL with a WHERE clause. Do the same for SelectedColumns—the query should include only the three specified columns in the SELECT statement.
Certain operations cannot be translated to the source database, causing folding to break. Once broken, folding doesn't resume in later steps—everything downstream processes locally.
Custom Functions and Complex Logic
// This breaks folding
AddTaxColumn = Table.AddColumn(SelectedColumns, "TaxAmount",
each if [OrderAmount] > 1000 then [OrderAmount] * 0.08 else [OrderAmount] * 0.05)
While a simple if expression sometimes folds into a SQL CASE expression, complex conditional logic typically doesn't. The database doesn't understand your custom business rules.
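If the tax rule must fold, one option is to move it into the database so Power Query receives TaxAmount as an ordinary column. A sketch for SQL Server, using a hypothetical view name:

```sql
CREATE VIEW OrdersWithTax AS
SELECT *,
       CASE WHEN OrderAmount > 1000
            THEN OrderAmount * 0.08
            ELSE OrderAmount * 0.05
       END AS TaxAmount
FROM Orders;
```

Connecting to OrdersWithTax instead of Orders keeps all downstream filters and aggregations foldable.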
Text Operations with Power Query Functions
// This breaks folding
ExtractDomain = Table.AddColumn(Customers, "EmailDomain",
each Text.AfterDelimiter([Email], "@"))
Power Query's text functions don't map directly to SQL string functions across different database systems.
Date Operations Beyond Basic Comparisons
// This might break folding
AddQuarter = Table.AddColumn(FilteredRows, "Quarter",
each Date.QuarterOfYear([OrderDate]))
While basic date filtering often folds, extracting specific date parts depends on your database system's capabilities.
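When date-part extraction breaks folding for your source, the same move-to-the-source workaround applies. A sketch for SQL Server (the view name is illustrative):

```sql
CREATE VIEW OrdersWithQuarter AS
SELECT *, DATEPART(QUARTER, OrderDate) AS OrderQuarter
FROM Orders;
```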
When you encounter folding breaks, you have several strategies:
Strategy 1: Move the operation to the source
Instead of using Power Query to extract email domains, create a calculated column or view in your database:
CREATE VIEW CustomersWithDomain AS
SELECT *, SUBSTRING(Email, CHARINDEX('@', Email) + 1, LEN(Email)) as EmailDomain
FROM Customers
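Once the view exists, the Power Query side stays simple and foldable. A sketch assuming the same database connection (the gmail.com filter is just an illustration):

```
Source = Sql.Database("localhost", "SalesDB")
CustomersWithDomain = Source{[Schema="dbo",Item="CustomersWithDomain"]}[Data]
// EmailDomain arrives pre-computed, so filtering on it can fold
GmailCustomers = Table.SelectRows(CustomersWithDomain, each [EmailDomain] = "gmail.com")
```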
Strategy 2: Minimize data before the break
Structure your query to reduce data volume before operations that break folding:
// Good: Filter and select columns first (folds)
FilteredData = Table.SelectRows(Source, each [OrderDate] >= #date(2024, 1, 1))
SelectedColumns = Table.SelectColumns(FilteredData, {"CustomerID", "OrderAmount", "OrderDate"})
// Then apply operations that break folding
AddTaxColumn = Table.AddColumn(SelectedColumns, "TaxAmount",
each if [OrderAmount] > 1000 then [OrderAmount] * 0.08 else [OrderAmount] * 0.05)
Buffering controls when Power Query loads data into memory for processing. Some operations automatically buffer data, while others stream through it. Understanding buffering helps you manage memory usage and optimize performance.
Buffering comes into play in two common situations:
Multiple steps reference the same table
Source = Sql.Database("localhost", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
CurrentYear = Table.SelectRows(Orders, each Date.Year([OrderDate]) = 2024)
PreviousYear = Table.SelectRows(Orders, each Date.Year([OrderDate]) = 2023)
Combined = Table.Combine({CurrentYear, PreviousYear})
Because both CurrentYear and PreviousYear reference Orders, Power Query may evaluate Orders once per reference, querying the database twice. Wrapping Orders in Table.Buffer() forces a single read and serves both steps from the in-memory copy.
Operations require multiple passes through data
When these transformations don't fold to the source, they inherently require buffering, because they must read every row before producing any output:
// Grouping requires seeing all data to calculate aggregates
GroupedData = Table.Group(FilteredRows, {"CustomerID"},
{{"TotalOrders", each Table.RowCount(_), type number}})
// Sorting requires comparing all values
SortedData = Table.Sort(GroupedData, {{"TotalOrders", Order.Descending}})
You can explicitly control buffering with Table.Buffer():
// Force buffering - loads entire table into memory
BufferedOrders = Table.Buffer(Orders)
// Subsequent operations on BufferedOrders work from memory
FilteredBuffered = Table.SelectRows(BufferedOrders, each [OrderAmount] > 1000)
GroupedBuffered = Table.Group(FilteredBuffered, {"CustomerID"},
{{"AvgAmount", each List.Average([OrderAmount]), type number}})
Use explicit buffering when you'll perform multiple operations on the same dataset and want to avoid repeated source queries. Be aware that Table.Buffer() breaks query folding for every subsequent step, so buffer only after folding-friendly filters and column selections have been applied.
Buffering consumes memory proportional to your data size. For large datasets, unnecessary buffering can cause:
- Out-of-memory errors and failed refreshes
- Slow refreshes, because entire tables are downloaded before any processing begins
- Loss of query folding for every downstream step
Avoid buffering when:
- Query folding is still active — buffering forces a full download and disables it
- A table is referenced only once, so there are no repeated source queries to avoid
- The dataset is too large to fit comfortably in memory
Apply filters as early as possible
// Good: Filter first
Source = Sql.Database("localhost", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
RecentOrders = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1))
LargeOrders = Table.SelectRows(RecentOrders, each [OrderAmount] > 500)
SelectedColumns = Table.SelectColumns(LargeOrders, {"CustomerID", "OrderAmount"})
// Bad: Filter after column selection and other operations
Source = Sql.Database("localhost", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
SelectedColumns = Table.SelectColumns(Orders, {"CustomerID", "OrderDate", "OrderAmount"})
AddedCustomColumn = Table.AddColumn(SelectedColumns, "Category", each "Large")
FilteredRows = Table.SelectRows(AddedCustomColumn, each [OrderDate] >= #date(2024, 1, 1))
Remove unnecessary columns early
Including columns you don't need increases network traffic and memory usage:
// Select only required columns after initial filtering
FilteredOrders = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1))
RequiredColumns = Table.SelectColumns(FilteredOrders,
{"OrderID", "CustomerID", "OrderAmount", "OrderDate"})
Use merge operations that can fold
Simple equi-joins between database tables often fold:
// This typically folds if both tables are from the same database
MergedData = Table.NestedJoin(Orders, {"CustomerID"}, Customers, {"CustomerID"},
"Customer", JoinKind.Inner)
ExpandedData = Table.ExpandTableColumn(MergedData, "Customer",
{"CustomerName", "Region"})
Avoid complex join conditions
// This breaks folding - a non-equi join can't be expressed as a simple merge
// (Table.NestedJoin has no join-condition parameter, so the lookup must be
// written as a per-row selection instead)
ComplexMerge = Table.AddColumn(Orders, "Customer", (order) =>
    Table.SelectRows(Customers, (customer) =>
        order[CustomerID] = customer[CustomerID]
        and order[OrderDate] >= customer[FirstOrderDate]))
Instead, create the complex join logic in your database as a view or calculated column.
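For example, the date-dependent join above could live in a database view (sketched for SQL Server; the view name is illustrative):

```sql
CREATE VIEW OrdersSinceFirstOrder AS
SELECT o.*, c.CustomerName
FROM Orders o
JOIN Customers c
  ON o.CustomerID = c.CustomerID
 AND o.OrderDate >= c.FirstOrderDate;
```

Power Query then reads the view like any other table, and subsequent filters can fold.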
Use appropriate data types
Power Query often infers inefficient data types. Explicitly set types that match your source:
// Specify efficient types
TypedColumns = Table.TransformColumnTypes(SelectedColumns, {
{"CustomerID", Int32.Type},
{"OrderAmount", Currency.Type},
{"OrderDate", type date}
})
Avoid unnecessary type conversions
Each type conversion can break folding:
// Bad: Converting back and forth
TextCustomerID = Table.TransformColumns(Orders, {{"CustomerID", Text.From}})
NumberCustomerID = Table.TransformColumns(TextCustomerID, {{"CustomerID", Number.From}})
// Good: Use the source data type directly
FilteredOrders = Table.SelectRows(Orders, each [CustomerID] = 12345)
Power Query includes built-in diagnostics to identify performance bottlenecks. In the Power Query Editor, go to Tools → Query Diagnostics → Start Diagnostics, then refresh your query and select Stop Diagnostics.
The diagnostics show:
- How long each step and each data source query takes
- The queries actually sent to the data source
- How each transformation step was evaluated
Look for:
- Steps with unexpectedly long durations
- SELECT * queries, which indicate folding has broken
- The same source query being issued multiple times
For SQL Server sources, you can analyze the actual SQL generated:
// Add this step to see the final SQL query
NativeQuery = Value.NativeQuery(Source, "
SELECT CustomerID, SUM(OrderAmount) as TotalAmount
FROM Orders
WHERE OrderDate >= '2024-01-01'
GROUP BY CustomerID
")
This bypasses Power Query transformations entirely and sends SQL directly to the database. Use this approach for complex operations that don't fold well.
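By default, steps added after Value.NativeQuery won't fold. For some connectors (including SQL Server), you can opt back in with the EnableFolding option — a sketch, since connector support varies:

```
NativeQuery = Value.NativeQuery(Source,
    "SELECT CustomerID, OrderAmount FROM Orders WHERE OrderDate >= '2024-01-01'",
    null,
    [EnableFolding = true])
// With folding enabled, this filter can still be pushed to the database
LargeOrders = Table.SelectRows(NativeQuery, each [OrderAmount] > 1000)
```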
Test with realistic data volumes
Don't just test with sample data—use representative volumes:
// Create a test query with row limits during development
TestQuery = Table.FirstN(Orders, 10000)
// Then remove the limit for production
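To avoid editing the query by hand for each environment, the limit can hang off a single flag — a sketch using a hypothetical DevMode value:

```
DevMode = true  // set to false before publishing
LimitedOrders = if DevMode then Table.FirstN(Orders, 10000) else Orders
// Table.FirstN typically folds to a TOP clause on SQL Server,
// so even the limited test query stays cheap
```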
Monitor refresh times
Track query refresh times over different periods to identify performance degradation:
// Add a timestamp column to track refresh performance
RefreshTime = Table.AddColumn(FinalTable, "RefreshTimestamp", each DateTime.LocalNow())
Let's build an optimized query for a realistic scenario. You have a sales database with these tables:
- Orders (CustomerID, ProductID, OrderDate, Quantity, UnitPrice)
- Customers (CustomerID, CustomerName, Region)
- Products (ProductID, ProductName, Category)
Goal: Create a monthly sales report for Q1 2024 showing customer name, region, product category, and total sales.
Step 1: Connect to your data source and establish the base query with optimal filtering:
Source = Sql.Database("your-server", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
// Filter for Q1 2024 first - this should fold
Q1Orders = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1)
and [OrderDate] < #date(2024, 4, 1))
// Select only columns needed for the analysis
OrdersFiltered = Table.SelectColumns(Q1Orders,
{"CustomerID", "ProductID", "OrderDate", "Quantity", "UnitPrice"})
Right-click the Q1Orders step and verify "View Native Query" shows a WHERE clause with date filters.
Step 2: Add calculated columns that can fold:
// Calculate line total - simple arithmetic often folds
OrdersWithTotal = Table.AddColumn(OrdersFiltered, "LineTotal",
each [Quantity] * [UnitPrice], Currency.Type)
Step 3: Join with dimension tables:
// Join with Customers - should fold if both tables are in same database
WithCustomers = Table.NestedJoin(OrdersWithTotal, {"CustomerID"},
Source{[Schema="dbo",Item="Customers"]}[Data], {"CustomerID"},
"Customer", JoinKind.Inner)
ExpandedCustomers = Table.ExpandTableColumn(WithCustomers, "Customer",
{"CustomerName", "Region"})
// Join with Products
WithProducts = Table.NestedJoin(ExpandedCustomers, {"ProductID"},
Source{[Schema="dbo",Item="Products"]}[Data], {"ProductID"},
"Product", JoinKind.Inner)
ExpandedProducts = Table.ExpandTableColumn(WithProducts, "Product",
{"ProductName", "Category"})
Step 4: Create the monthly summary:
// Add month column - this might break folding
WithMonth = Table.AddColumn(ExpandedProducts, "OrderMonth",
each Date.StartOfMonth([OrderDate]), type date)
// Group by month, customer, and category
GroupedData = Table.Group(WithMonth,
{"OrderMonth", "CustomerName", "Region", "Category"},
{{"TotalSales", each List.Sum([LineTotal]), Currency.Type},
{"OrderCount", each Table.RowCount(_), type number}})
// Sort by month and total sales
FinalReport = Table.Sort(GroupedData,
{{"OrderMonth", Order.Ascending}, {"TotalSales", Order.Descending}})
Test this query and check folding at each step. The joins should fold, but the month extraction and final grouping likely won't—which is fine since we've reduced the data volume significantly before these operations.
Problem: Adding calculated columns before filtering breaks folding unnecessarily.
Wrong approach:
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
WithCategory = Table.AddColumn(Orders, "SizeCategory",
each if [OrderAmount] > 1000 then "Large" else "Small")
FilteredOrders = Table.SelectRows(WithCategory, each [OrderDate] >= #date(2024, 1, 1))
Correct approach:
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
FilteredOrders = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1))
WithCategory = Table.AddColumn(FilteredOrders, "SizeCategory",
each if [OrderAmount] > 1000 then "Large" else "Small")
Problem: Buffering large tables when folding would work better.
Wrong approach:
Source = Sql.Database("localhost", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
BufferedOrders = Table.Buffer(Orders) // Loads 2M rows into memory
FilteredOrders = Table.SelectRows(BufferedOrders, each [OrderDate] >= #date(2024, 1, 1))
Correct approach:
Source = Sql.Database("localhost", "SalesDB")
Orders = Source{[Schema="dbo",Item="Orders"]}[Data]
FilteredOrders = Table.SelectRows(Orders, each [OrderDate] >= #date(2024, 1, 1)) // Folds to database
Problem: Using Power Query functions instead of database-native operations. The text and date examples earlier in this lesson fall into this category, and the fix is the same — move the logic into a database view or computed column.
Troubleshooting tip: When folding breaks unexpectedly, simplify expressions step by step to identify the problematic operation.
If your query runs slowly:
1. Check "View Native Query" on each step to find where folding breaks
2. Run Query Diagnostics to locate the slowest operations
3. Move filters and column selection ahead of any step that breaks folding
4. Push unfoldable logic into a database view, or send SQL directly with Value.NativeQuery
Warning: Be cautious when testing with production data sources. Large queries can impact database performance for other users. Consider using database snapshots or replicas for development.
You now understand the three pillars of Power Query performance optimization: query folding pushes operations to the data source, buffering manages memory usage, and optimization techniques maximize efficiency. The key insight is that performance isn't just about writing faster code—it's about structuring queries so the database does the heavy lifting.
Key takeaways:
- Verify folding with "View Native Query" rather than assuming it works
- Filter rows and remove columns as early as possible, while folding is still active
- Place operations that break folding after the data volume has been reduced
- Buffer explicitly only when repeated source queries are the real bottleneck
- Test with realistic data volumes, not just development samples
Next steps:
The performance principles you've learned apply beyond Power Query—they're fundamental to any data pipeline that moves and transforms large volumes of information. Master these concepts, and you'll build data solutions that scale from prototype to production without performance surprises.