Wicked Smart Data
LearnArticlesAbout
Sign InSign Up
LearnArticlesAboutContact
Sign InSign Up
Wicked Smart Data

The go-to platform for professionals who want to master data, automation, and AI — from Excel fundamentals to cutting-edge machine learning.

Platform

  • Learning Paths
  • Articles
  • About
  • Contact

Connect

  • Contact Us
  • RSS Feed

© 2026 Wicked Smart Data. All rights reserved.

Privacy PolicyTerms of Service
All Articles
Automating Incremental Data Refreshes in Power Query with Persistent State and Change Tracking

Automating Incremental Data Refreshes in Power Query with Persistent State and Change Tracking

Power Query⚡ Practitioner22 min readJun 26, 2026Updated Jun 26, 2026
Table of Contents
  • Introduction
  • Prerequisites
  • Understanding the Core Problem: Why Full Refreshes Break at Scale
  • The Watermark Pattern: Storing State Between Refreshes
  • Building the Incremental Query in Power Query
  • Step 1: Read the Watermark
  • Step 2: Build the Incremental Extract
  • Step 3: Verify the Folded Query
  • Implementing the Merge Strategy
  • Pattern A: Append-Only (New Rows Only)
  • Pattern B: Upsert (New + Updated Rows, No Deletes)
  • Pattern C: Full Change Tracking (New + Updated + Deleted)

Automating Incremental Data Refreshes in Power Query with Persistent State and Change Tracking

Introduction

Picture this: you're maintaining a Power BI report that pulls from a transactional database with 50 million rows. Every morning at 6 AM, your scheduled refresh kicks off and proceeds to re-download all 50 million rows, apply the same transformations it applied yesterday, and rebuild every aggregation from scratch. By 6:45 AM, your gateway is exhausted, your source system's DBA is sending you passive-aggressive emails, and your report still isn't ready for the 7 AM standup. Meanwhile, only 12,000 rows actually changed overnight.

This is the incremental refresh problem, and it's one of the most common sources of wasted compute, refresh timeouts, and general misery in production Power BI environments. Power Query gives you more control over this than most practitioners realize — and when you combine smart watermark logic, persistent state management, and change tracking patterns, you can build refresh pipelines that only process what's actually new or changed. By the end of this lesson, you'll understand how to design and implement those patterns yourself.

What you'll learn:

  • How to design a watermark-based incremental refresh strategy using parameters and named queries in Power Query
  • How to implement persistent state using an external reference table (in Excel, SharePoint, or a database) so your watermarks survive between refreshes
  • How to write change detection logic that identifies new, updated, and deleted records
  • How to merge incremental results into a full dataset without full re-extraction
  • How to handle edge cases like late-arriving data, timezone shifts, and failed refresh recovery

Prerequisites

You should be comfortable with:

  • Writing M code directly in the Advanced Editor
  • Working with parameters in Power Query
  • Query folding basics (you don't need to master it, but you should know what it is)
  • Connecting to at least one external data source (SQL Server, SharePoint, Excel, or similar)
  • Basic understanding of how Power BI scheduled refresh works

Understanding the Core Problem: Why Full Refreshes Break at Scale

Before writing a single line of M, let's be precise about what we're solving. A full refresh has three failure modes at scale:

Volume failure is the most obvious. If your source has 100 million rows, pulling all of them every refresh cycle is simply not sustainable. Network bandwidth, gateway memory, and source system load all become constraints.

Incremental failure is subtler. Even if volume isn't the issue today, growth makes it one eventually. A table that's manageable at 10 million rows becomes a problem at 100 million. Pipelines that don't account for growth fail gradually, then suddenly.

Change complexity failure is the sneakiest. Some datasets don't just grow — they mutate. Rows get updated, deleted, or backdated. A naive "give me everything after the last ID I saw" approach misses updates entirely and may never detect deletes.

The solution architecture has three components working together:

  1. A watermark — a stored value representing the boundary between "already processed" and "needs processing"
  2. An incremental extraction — a query that only fetches data beyond the watermark
  3. A merge strategy — logic that integrates new/changed records into the existing dataset

Power Query handles all three, but it needs help with persistence — and that's where the design gets interesting.


The Watermark Pattern: Storing State Between Refreshes

Power Query itself is stateless. Every time a refresh runs, M code executes from scratch with no memory of previous executions. This is by design — it keeps queries deterministic and reproducible. But incremental refresh requires state: specifically, the answer to "what was the last thing I processed?"

The standard solution is to externalize your state. You store your watermark somewhere Power Query can read it at the start of a refresh and write it at the end. Common storage options are:

Storage Location Read Write Best For
Excel file (SharePoint/OneDrive) Easy Manual or Flow Small teams, low frequency
SharePoint List Easy Power Automate Medium complexity
SQL table Easy Dataflow or stored proc Production pipelines
Azure Blob / Data Lake Easy Power Automate / ADF Enterprise scale

For this lesson, we'll use a SQL table as the persistent store because it's the most production-realistic and gives us transactional safety. The pattern translates directly to the other options.

Here's the watermark table we'll maintain in SQL:

CREATE TABLE dbo.RefreshWatermarks (
    TableName        NVARCHAR(100) PRIMARY KEY,
    LastWatermark    DATETIME2,
    LastRunStatus    NVARCHAR(20),
    LastRunStart     DATETIME2,
    LastRunEnd       DATETIME2
);

-- Seed the initial record
INSERT INTO dbo.RefreshWatermarks VALUES (
    'SalesTransactions',
    '2024-01-01 00:00:00',
    'Success',
    NULL,
    NULL
);

This table does double duty: it stores the watermark value and tracks whether the last run succeeded. That second capability becomes important when we build failure recovery.


Building the Incremental Query in Power Query

Let's set up the actual source data scenario. We're working with a SalesTransactions table that receives roughly 15,000 new rows per day and has updates to existing rows (price corrections, status changes) on maybe 500 rows per day. Here's the source schema:

CREATE TABLE dbo.SalesTransactions (
    TransactionID    INT PRIMARY KEY,
    CustomerID       INT,
    ProductID        INT,
    SaleAmount       DECIMAL(10,2),
    SaleStatus       NVARCHAR(20),
    CreatedAt        DATETIME2,
    ModifiedAt       DATETIME2  -- Updated whenever a row changes
);

The ModifiedAt column is our friend. Any system worth its salt tracks when rows were last modified — if yours doesn't, now is the time to advocate loudly for adding that column.

Step 1: Read the Watermark

Create a new query called WatermarkValue. This query connects to the watermark table and extracts the current boundary:

let
    Source = Sql.Database("your-server.database.windows.net", "YourDatabase"),
    WatermarkTable = Source{[Schema="dbo", Item="RefreshWatermarks"]}[Data],
    FilteredRow = Table.SelectRows(
        WatermarkTable,
        each [TableName] = "SalesTransactions"
    ),
    WatermarkDatetime = FilteredRow{0}[LastWatermark]
in
    WatermarkDatetime

This query returns a single datetime2 value. Disable load for this query (right-click in the Queries pane, uncheck "Enable Load") — it's a helper, not a destination table.

Tip: Always test this query in isolation first. Open the Advanced Editor, confirm you're getting the right scalar value, and check the data type. A type mismatch between your watermark and your filter column is the #1 cause of silent full-table scans.

Step 2: Build the Incremental Extract

Now create the main SalesTransactions_Incremental query. Notice how we reference WatermarkValue directly:

let
    Source = Sql.Database("your-server.database.windows.net", "YourDatabase"),
    TransactionsTable = Source{[Schema="dbo", Item="SalesTransactions"]}[Data],
    
    // Filter to only rows modified after the watermark
    // This captures BOTH new rows and updated rows
    FilteredRows = Table.SelectRows(
        TransactionsTable,
        each [ModifiedAt] > WatermarkValue
    ),
    
    // Apply your standard transformations
    TypedColumns = Table.TransformColumnTypes(
        FilteredRows,
        {
            {"TransactionID", Int64.Type},
            {"CustomerID", Int64.Type},
            {"ProductID", Int64.Type},
            {"SaleAmount", type number},
            {"SaleStatus", type text},
            {"CreatedAt", type datetime},
            {"ModifiedAt", type datetime}
        }
    ),
    
    // Add a processing metadata column
    WithProcessingTimestamp = Table.AddColumn(
        TypedColumns,
        "ProcessedAt",
        each DateTime.LocalNow(),
        type datetime
    )

in
    WithProcessingTimestamp

This is where query folding matters enormously. The Table.SelectRows filter against ModifiedAt needs to fold back to the SQL source as a WHERE clause, not execute in Power Query's engine. If it folds correctly, you're asking the database to give you only the changed rows. If it doesn't fold, Power Query downloads the entire table and filters in memory — exactly what you were trying to avoid.

Warning: Check query folding by right-clicking the last applied step and selecting "View Native Query." If the option is grayed out, folding has broken somewhere in your step chain. Common culprits are custom functions, List.Generate calls, and referencing other queries mid-chain. Keep your filter steps as early as possible.

Step 3: Verify the Folded Query

When folding works correctly, the native query for the filter step should look something like:

SELECT [TransactionID], [CustomerID], [ProductID], [SaleAmount], 
       [SaleStatus], [CreatedAt], [ModifiedAt]
FROM [dbo].[SalesTransactions]
WHERE [ModifiedAt] > '2024-09-15 06:00:00'

That's what efficiency looks like. The database handles the filter, returns only the relevant rows, and Power Query just needs to transform the results.


Implementing the Merge Strategy

An incremental extract is only half the picture. You need to merge those new/changed rows into your existing dataset. There are three main patterns, and choosing between them depends on your data characteristics.

Pattern A: Append-Only (New Rows Only)

If your data is genuinely immutable — events are recorded once and never changed — the merge is just an append. This is the simplest case and the one Power BI's built-in Incremental Refresh feature handles for you. We're going further than that here.

Pattern B: Upsert (New + Updated Rows, No Deletes)

This is the most common real-world case. New rows come in, existing rows sometimes get updated, but nothing is ever deleted. The merge logic:

  1. Take the existing full dataset
  2. Remove any rows whose TransactionID appears in the incremental batch (these are the updated rows)
  3. Append the incremental batch (which contains both the updated versions and the new rows)

In M, assuming you have SalesTransactions_Full as your existing dataset and SalesTransactions_Incremental as the batch:

let
    // Load the existing full dataset
    Existing = SalesTransactions_Full,
    
    // Get the list of IDs that appear in the incremental batch
    IncrementalIDs = List.Buffer(
        Table.Column(SalesTransactions_Incremental, "TransactionID")
    ),
    
    // Remove stale versions of updated rows from existing data
    ExistingWithoutUpdated = Table.SelectRows(
        Existing,
        each not List.Contains(IncrementalIDs, [TransactionID])
    ),
    
    // Append the incremental batch (new rows + updated row versions)
    Merged = Table.Combine({ExistingWithoutUpdated, SalesTransactions_Incremental}),
    
    // Sort to maintain consistent ordering
    Sorted = Table.Sort(Merged, {{"TransactionID", Order.Ascending}})

in
    Sorted

List.Buffer is doing important work here — it materializes the ID list into memory once so the subsequent List.Contains call doesn't re-evaluate the source query for every row in Existing. Without it, you can hit severe performance degradation on large datasets.

Pattern C: Full Change Tracking (New + Updated + Deleted)

Deletes are the hard problem. If a row disappears from the source, your watermark-based query will never see it — you only see what changed, not what vanished. There are two approaches:

Approach 1: Soft deletes. The source system never physically deletes rows; instead it marks them with a IsDeleted flag and updates ModifiedAt. Your incremental query naturally picks these up, and your report-level filters handle excluding them. This is the cleanest solution and worth negotiating with your source system owners.

Approach 2: Periodic full reconciliation. On a schedule (weekly, monthly), you perform a full key comparison between your local dataset and the source. This is expensive but necessary if soft deletes aren't available:

let
    // Pull just the IDs currently in the source (no full data transfer)
    SourceIDs = Table.SelectColumns(
        Sql.Database("your-server", "YourDatabase"){
            [Schema="dbo", Item="SalesTransactions"]
        }[Data],
        {"TransactionID"}
    ),
    
    SourceIDList = List.Buffer(Table.Column(SourceIDs, "TransactionID")),
    
    // Remove rows from local dataset that no longer exist in source
    ExistingReconciled = Table.SelectRows(
        SalesTransactions_Full,
        each List.Contains(SourceIDList, [TransactionID])
    )
    
in
    ExistingReconciled

Warning: The reconciliation query above will not fold — you're comparing across a live source and an in-memory table. This is expensive. Run it on a separate, less-frequent schedule, not every incremental refresh.


Persistent State with External Storage: The Full Implementation

Now let's put persistent state properly in place. The challenge is that Power Query can read state but can't write it — M is a query language, not a scripting language. The write-back step needs to happen outside Power Query.

Here's the production architecture:

[Source DB] --read--> [Power Query: read watermark] 
                   -> [Power Query: extract incremental]
                   -> [Power Query: merge to full dataset]
                   -> [Power BI: load data model]
                   -> [Power Automate: update watermark table]

The Power Automate flow triggers on successful Power BI refresh completion and executes a SQL stored procedure:

CREATE PROCEDURE dbo.UpdateRefreshWatermark
    @TableName       NVARCHAR(100),
    @NewWatermark    DATETIME2,
    @RunStatus       NVARCHAR(20)
AS
BEGIN
    UPDATE dbo.RefreshWatermarks
    SET 
        LastWatermark = @NewWatermark,
        LastRunStatus = @RunStatus,
        LastRunEnd    = SYSUTCDATETIME()
    WHERE TableName = @TableName;
END;

What value do we pass as @NewWatermark? The safest choice is MAX(ModifiedAt) from the incremental batch that was just loaded. You can expose this as a separate query in Power Query:

// Query: NewWatermarkValue
// Disable Load - used by Power Automate via dataset REST API or measure
let
    MaxModified = List.Max(
        Table.Column(SalesTransactions_Incremental, "ModifiedAt")
    ),
    // Fall back to current time if incremental batch was empty
    Result = if MaxModified = null 
             then DateTime.UtcNow() 
             else MaxModified
in
    Result

Power Automate can read this value via the Power BI REST API (Get Dataset In Group / Get Tables) or you can surface it as a card visual that Power Automate reads with the "Get a row" action on a SharePoint list that the report writes to.

Tip: There's a simpler option for teams that don't want a Power Automate dependency: use a Power BI dataflow instead of a Power BI Desktop dataset. Dataflows support output to Azure Data Lake and have direct integration with Azure Data Factory for writeback. If you're already in the Azure ecosystem, this pipeline is cleaner end-to-end.


Handling Edge Cases in Production

Late-Arriving Data

In distributed systems, transactions don't always arrive in order. A sale that happened yesterday might not appear in your source database until today, due to replication lag, ETL pipelines feeding the source, or timezone issues. If your watermark moved past that row's ModifiedAt, you'll never pick it up.

The fix is a lookback window: instead of filtering ModifiedAt > WatermarkValue, filter ModifiedAt > WatermarkValue - #duration(0, 4, 0, 0) (i.e., four hours before the watermark). This overlaps slightly with the previous refresh window, ensuring late arrivals are captured.

// In your SalesTransactions_Incremental query:
LookbackWatermark = WatermarkValue - #duration(0, 4, 0, 0),

FilteredRows = Table.SelectRows(
    TransactionsTable,
    each [ModifiedAt] > LookbackWatermark
)

The lookback means you'll always pull some duplicate rows. That's fine — your upsert merge logic handles duplicates correctly by replacing older versions with newer ones. The only cost is slightly higher row counts in the incremental batch.

Timezone Complications

DATETIME2 in SQL Server is timezone-naive. If your Power Query session runs in UTC but your database stores times in US Eastern, your watermark comparison will be off by 4-5 hours depending on DST. Standardize on UTC everywhere and be explicit about conversions:

// Force UTC comparison by converting local Power Query time to UTC
CurrentUTC = DateTimeZone.UtcNow(),
CurrentUTCNaive = DateTime.From(DateTimeZone.RemoveZone(CurrentUTC)),

// Apply as upper bound if you want to avoid partial-hour data
FilteredRows = Table.SelectRows(
    TransactionsTable,
    each [ModifiedAt] > WatermarkValue 
         and [ModifiedAt] <= CurrentUTCNaive - #duration(0, 1, 0, 0)
)

The upper bound filter (<= now - 1 hour) prevents you from capturing rows that are in-flight — partially committed transactions that might still be changing. It's a safety buffer that the watermark update will then advance past on the next successful run.

Failed Refresh Recovery

If a refresh fails halfway through — say, the connection drops after extracting data but before Power Automate updates the watermark — your next refresh will re-process the same batch. This is called at-least-once processing, and it's the correct failure mode for this architecture. Because your merge logic is an upsert, re-processing the same rows is safe — you'll just end up with the same result.

What you want to avoid is advancing the watermark before confirming success. Never update the watermark at the start of a refresh run. Always update it at the end, after successful load.

The LastRunStatus column in our watermark table helps you build alerting. If Power Automate sees LastRunStatus = 'Failed' persisted from a previous run, it can trigger a notification or retry logic before attempting an update.

-- Power Automate can check this before updating
SELECT LastRunStatus, LastRunStart, LastRunEnd
FROM dbo.RefreshWatermarks
WHERE TableName = 'SalesTransactions'
  AND LastRunStatus = 'Failed'
  AND LastRunEnd < DATEADD(HOUR, -2, SYSUTCDATETIME());
-- Alert if this returns any rows

Real-World Project: Building an Incremental Sales Pipeline

Let's bring everything together in a realistic end-to-end setup. You're building a daily refresh pipeline for a retail analytics report. The source is SQL Server. You want incremental extraction with upsert merging, late-arrival protection, and watermark persistence.

Query 1: `WatermarkValue` (Disable Load)

let
    Source = Sql.Database("sqlprod01.yourcompany.com", "RetailDW"),
    WatermarkTable = Source{[Schema="dbo", Item="RefreshWatermarks"]}[Data],
    SalesRow = Table.SelectRows(
        WatermarkTable, 
        each [TableName] = "SalesTransactions"
    ),
    Watermark = SalesRow{0}[LastWatermark],
    // Apply lookback window for late-arriving data
    LookbackWatermark = Watermark - #duration(0, 4, 0, 0)
in
    LookbackWatermark

Query 2: `SalesTransactions_Delta` (Disable Load)

let
    Source = Sql.Database("sqlprod01.yourcompany.com", "RetailDW"),
    RawTable = Source{[Schema="dbo", Item="SalesTransactions"]}[Data],
    
    // Filter for changed rows - this MUST fold for performance
    ChangedRows = Table.SelectRows(
        RawTable,
        each [ModifiedAt] > WatermarkValue
    ),
    
    // Standard type enforcement
    TypedTable = Table.TransformColumnTypes(ChangedRows, {
        {"TransactionID",   Int64.Type},
        {"CustomerID",      Int64.Type},
        {"ProductID",       Int64.Type},
        {"SaleAmount",      Currency.Type},
        {"SaleStatus",      type text},
        {"CreatedAt",       type datetime},
        {"ModifiedAt",      type datetime},
        {"StoreID",         Int64.Type},
        {"RegionCode",      type text}
    }),
    
    // Enrich with processing metadata
    WithBatchID = Table.AddColumn(
        TypedTable, 
        "BatchID", 
        each Number.From(DateTime.FixedLocalNow()), 
        type number
    )

in
    WithBatchID

Note: DateTime.FixedLocalNow() evaluates once per query execution (unlike DateTime.LocalNow() which can evaluate per row). Using it as a BatchID stamps all rows in this delta with the same value, which is useful for debugging and auditing.

Query 3: `SalesTransactions_Full` (Enable Load → destination table)

let
    // Load the persisted full dataset
    // In production, this would be your Power BI dataflow output 
    // or a staging SQL table maintained by your merge logic
    ExistingDataSource = Sql.Database(
        "sqlprod01.yourcompany.com", "RetailDW"
    ),
    ExistingTable = ExistingDataSource{
        [Schema="dbo", Item="SalesTransactions_Processed"]
    }[Data],
    
    // Buffer the delta IDs to avoid repeated evaluation
    DeltaIDs = List.Buffer(
        Table.Column(SalesTransactions_Delta, "TransactionID")
    ),
    
    // Remove stale versions of rows appearing in this delta
    PrunedExisting = Table.SelectRows(
        ExistingTable,
        each not List.Contains(DeltaIDs, [TransactionID])
    ),
    
    // Combine pruned existing with fresh delta
    MergedTable = Table.Combine({PrunedExisting, SalesTransactions_Delta}),
    
    // Final sort for presentation consistency
    SortedTable = Table.Sort(
        MergedTable, 
        {{"TransactionID", Order.Ascending}}
    )

in
    SortedTable

Query 4: `NewWatermarkValue` (Disable Load — surfaced for Power Automate)

let
    MaxFromDelta = List.Max(
        Table.Column(SalesTransactions_Delta, "ModifiedAt")
    ),
    // If no new data, keep the existing watermark (don't advance into future)
    FinalWatermark = if MaxFromDelta = null 
                     then WatermarkValue 
                     else MaxFromDelta
in
    FinalWatermark

Performance Considerations and When to Use This Approach

This pattern isn't universally appropriate. Here's an honest assessment:

Use this when:

  • Your source table has 5M+ rows and grows by at least 0.5% daily
  • Refresh SLAs are tight (under 30 minutes)
  • Your source system's DBA has asked you nicely (or not so nicely) to stop hammering the server
  • You have a reliable ModifiedAt or equivalent change tracking column

Don't use this when:

  • Your dataset is small (under 500K rows) — the complexity isn't worth it
  • Your source doesn't have a reliable change tracking column — you'll get incorrect results
  • You need real-time data — this is a batch pattern, not a streaming one
  • Your report is used ad-hoc without scheduled refresh — the overhead of state management adds no value

Performance tips:

  • Index the ModifiedAt column on the source table. Without an index, even a perfectly-folded query will result in a full table scan at the database level.
  • Store your persistent state as close to your Power Query gateway as possible. Reading a watermark from a SQL Server on the same network segment as your gateway is fast; reading from an Excel file on a user's OneDrive introduces latency and reliability risk.
  • Monitor delta batch sizes over time. If your delta is consistently pulling 90%+ of the full table, something is wrong — either your watermark isn't advancing, updates are happening on old data en masse, or your lookback window is too wide.

Hands-On Exercise

Build a complete incremental refresh pipeline for a fictional CustomerOrders table. Here's your setup:

Source table definition:

CREATE TABLE dbo.CustomerOrders (
    OrderID      INT PRIMARY KEY,
    CustomerID   INT,
    OrderDate    DATE,
    TotalAmount  DECIMAL(10,2),
    OrderStatus  NVARCHAR(30),
    CreatedAt    DATETIME2 DEFAULT SYSUTCDATETIME(),
    ModifiedAt   DATETIME2 DEFAULT SYSUTCDATETIME()
);

Your tasks:

  1. Create the RefreshWatermarks table and seed it with an initial watermark of 2024-01-01 00:00:00 for CustomerOrders.

  2. Write a WatermarkValue query in Power Query that reads the watermark and applies a 2-hour lookback window.

  3. Write a CustomerOrders_Delta query that:

    • Filters to rows where ModifiedAt > WatermarkValue
    • Types all columns correctly
    • Adds a RefreshTimestamp column using DateTime.FixedLocalNow()
    • Confirms that the filter step folds (use "View Native Query")
  4. Write a CustomerOrders_Full query that merges the delta into an existing base dataset using an upsert pattern. Use List.Buffer on the delta key list.

  5. Write a NewWatermarkValue query that returns MAX(ModifiedAt) from the delta, with a null-safe fallback to the current watermark.

  6. Bonus: Add a record count comparison step that raises an error using error keyword if the delta contains more than 100,000 rows — this acts as a safety check against watermark corruption pulling an unexpectedly large batch:

// Add this check in your delta query before the final output
SafetyCheck = if Table.RowCount(WithBatchID) > 100000 
              then error Error.Record(
                  "ExcessiveDeltaSize",
                  "Delta row count exceeds safety threshold. Check watermark.",
                  [RowCount = Table.RowCount(WithBatchID)]
              )
              else WithBatchID

Common Mistakes and Troubleshooting

Mistake 1: The watermark never advances

Symptom: Every refresh pulls a massive batch because WatermarkValue returns the same old value every time.

Cause: The Power Automate flow isn't triggering, or the stored procedure is failing silently, or you're updating the wrong TableName in the watermark table.

Fix: Query the watermark table directly after a few refresh cycles. If the LastWatermark value isn't changing, add logging to your Power Automate flow and test the stored procedure call manually.

Mistake 2: Query folding breaks on the filter step

Symptom: Refreshes are slow even though the delta should be small. The gateway shows high memory usage.

Cause: Something before your filter step broke the foldable chain. Common culprits: referencing the WatermarkValue query directly causes folding to break because Power Query can't express a cross-query reference as SQL.

Fix: Convert WatermarkValue to a parameter instead of a query reference. Right-click the WatermarkValue query, select "Convert to Parameter." Power Query can fold parameterized filters because it substitutes the literal value directly into the SQL predicate.

Mistake 3: List.Contains performance on large datasets

Symptom: The merge query is extremely slow despite the source data being manageable.

Cause: List.Contains on an unbuffered list re-evaluates the source query for every row check.

Fix: Always wrap the ID list in List.Buffer(). This materializes the list into memory once:

// Slow:
each not List.Contains(Table.Column(Delta, "TransactionID"), [TransactionID])

// Fast:
BufferedIDs = List.Buffer(Table.Column(Delta, "TransactionID")),
// then:
each not List.Contains(BufferedIDs, [TransactionID])

Mistake 4: Watermark moves backward after a failed refresh

Symptom: Data appears to disappear or duplicate after a refresh failure and recovery.

Cause: Watermark update logic ran before data was fully loaded, or the update logic doesn't distinguish success from failure.

Fix: Always update the watermark after a confirmed successful load. Use Power Automate's "Run after" configuration to trigger the update flow only on success of the Power BI refresh action, never on failure or timeout.

Mistake 5: Timezone offset causes phantom data gaps

Symptom: Data from certain hours of the day is always missing or always duplicated.

Cause: A timezone mismatch between the Power Query execution environment (UTC) and the source database (local time).

Fix: Standardize all timestamp comparisons to UTC. Use DateTimeZone.UtcNow() in Power Query and SYSUTCDATETIME() in SQL Server. If the source can't be changed, apply a fixed offset in your watermark comparison:

// Adjust for UTC+5 source if you can't change the source
AdjustedWatermark = WatermarkValue + #duration(0, 5, 0, 0)

Summary and Next Steps

You've built a production-grade incremental refresh pipeline from first principles. The core ideas to carry forward:

  • Power Query is stateless by design. Persistence requires externalizing your watermark to a database table, SharePoint list, or file store. The read happens in M; the write happens in an orchestration layer like Power Automate or ADF.
  • Query folding is not optional for this pattern. If your filter doesn't fold, you're doing a full extraction with in-memory filtering — worse than you started. Always verify with "View Native Query."
  • The upsert merge pattern handles most real-world cases. Filter out stale versions of changed rows, then append the delta. List.Buffer keeps this performant.
  • Defensive design matters. Lookback windows handle late arrivals. Watermark updates-on-success handle failed refresh recovery. Safety checks on delta size prevent watermark corruption from becoming a silent disaster.

Where to go next:

  • Explore Power BI Dataflows as an alternative to Desktop-based incremental refresh. Dataflows have native incremental refresh support and better integration with Azure services for writeback.
  • Learn SQL Server Change Data Capture (CDC) as a more robust alternative to ModifiedAt-based tracking. CDC tracks row-level changes at the database engine level, giving you precise new/updated/deleted signals without relying on application-maintained timestamps.
  • Investigate Power BI's built-in Incremental Refresh and Real-Time Data feature (Premium/PPU), which automates the partition management side of this problem for append-only scenarios, but still benefits from the change detection patterns you've learned here.
  • Study query folding diagnostics in Power Query — the Diagnostics feature (Diagnostics.ActivityIdentifier, Table.View) gives you programmatic access to folding behavior that goes beyond what "View Native Query" shows.

The incremental refresh problem is ultimately a systems design problem dressed up in M code. The patterns you've learned here will serve you not just in Power Query, but in any data pipeline where you're trading compute efficiency against data completeness.

Learning Path: Power Query Essentials

Previous

Power Query Performance: Master Folding, Buffering & Optimization Techniques

Related Articles

Power Query🌱 Foundation

Advanced M: Iterators, Accumulators, and Recursive Patterns

13 min
Power Query🔥 Expert

Building a Reusable Function Library in Power Query

30 min
Power Query⚡ Practitioner

M Language Performance Patterns and Anti-Patterns: Optimize Power Query for Speed

15 min

On this page

  • Introduction
  • Prerequisites
  • Understanding the Core Problem: Why Full Refreshes Break at Scale
  • The Watermark Pattern: Storing State Between Refreshes
  • Building the Incremental Query in Power Query
  • Step 1: Read the Watermark
  • Step 2: Build the Incremental Extract
  • Step 3: Verify the Folded Query
  • Implementing the Merge Strategy
  • Pattern A: Append-Only (New Rows Only)
  • Persistent State with External Storage: The Full Implementation
  • Handling Edge Cases in Production
  • Late-Arriving Data
  • Timezone Complications
  • Failed Refresh Recovery
  • Real-World Project: Building an Incremental Sales Pipeline
  • Query 1: `WatermarkValue` (Disable Load)
  • Query 2: `SalesTransactions_Delta` (Disable Load)
  • Query 3: `SalesTransactions_Full` (Enable Load → destination table)
  • Query 4: `NewWatermarkValue` (Disable Load — surfaced for Power Automate)
  • Performance Considerations and When to Use This Approach
  • Hands-On Exercise
  • Common Mistakes and Troubleshooting
  • Summary and Next Steps
  • Pattern B: Upsert (New + Updated Rows, No Deletes)
  • Pattern C: Full Change Tracking (New + Updated + Deleted)
  • Persistent State with External Storage: The Full Implementation
  • Handling Edge Cases in Production
  • Late-Arriving Data
  • Timezone Complications
  • Failed Refresh Recovery
  • Real-World Project: Building an Incremental Sales Pipeline
  • Query 1: `WatermarkValue` (Disable Load)
  • Query 2: `SalesTransactions_Delta` (Disable Load)
  • Query 3: `SalesTransactions_Full` (Enable Load → destination table)
  • Query 4: `NewWatermarkValue` (Disable Load — surfaced for Power Automate)
  • Performance Considerations and When to Use This Approach
  • Hands-On Exercise
  • Common Mistakes and Troubleshooting
  • Summary and Next Steps