Temporal Data Mastery: Writing Queries for Time-Series, Date Ranges, and Slowly Changing Dimensions

Introduction

You're three hours into an incident review. The product team is asking why revenue dropped 18% last Tuesday, and you're staring at a query that keeps returning the wrong numbers. The data is there — you can see the events in the log table — but every aggregation you write either double-counts records or misses a cohort entirely. Sound familiar? Temporal data is where SQL skills separate the practitioners from the experts. Dates look simple until they're not, and "not" happens constantly in production systems.

Time is the hidden dimension in almost every business dataset. Customer records change. Prices get updated. Users churn and return. Metrics need to be compared week-over-week, and someone always wants to know what the data looked like as of a specific point in the past. The queries that answer these questions require a fluency with date arithmetic, window functions, and dimensional modeling that most intermediate SQL practitioners have never fully developed. This lesson closes that gap.

By the end of this lesson, you'll be writing production-quality queries that handle time-series aggregations, gap-filling across irregular intervals, point-in-time lookups, and slowly changing dimensions — the four categories of temporal problems you'll encounter on any serious data team.

What you'll learn:

How to aggregate and analyze time-series data accurately, including gap-filling for missing periods
How to write precise date range queries that avoid off-by-one errors and timezone traps
How to implement and query Slowly Changing Dimensions (SCD) Types 1, 2, and 3
How to perform point-in-time analysis to reconstruct historical state
How to use window functions for period-over-period comparisons and running calculations

Prerequisites

You should already be comfortable with:

Standard SQL aggregations (GROUP BY, HAVING, common aggregate functions)
Basic window functions (ROW_NUMBER, RANK, simple OVER clauses)
JOINs including LEFT JOIN and self-joins
Working with date/timestamp data types in at least one SQL dialect

The examples in this lesson use PostgreSQL syntax. Where behavior differs significantly in BigQuery, Snowflake, or SQL Server, I'll call it out explicitly.

Part 1: Time-Series Aggregations Done Right

The Calendar Spine Pattern

The most common mistake in time-series work is letting your data define your time axis. If no orders came in on a Sunday, your query returns no row for that Sunday — and your line chart shows a false spike the following Monday because the gap is invisible to the downstream tool.

The solution is a calendar spine: a complete, continuous sequence of time periods that you generate independently, then left-join your data onto it.

Here's how to build one inline using a recursive CTE:

WITH RECURSIVE calendar AS (
    SELECT DATE '2024-01-01' AS day
    UNION ALL
    SELECT day + INTERVAL '1 day'
    FROM calendar
    WHERE day < DATE '2024-03-31'
),
daily_revenue AS (
    SELECT
        DATE(created_at) AS order_date,
        SUM(order_total)  AS revenue,
        COUNT(*)          AS order_count
    FROM orders
    WHERE created_at >= '2024-01-01'
      AND created_at <  '2024-04-01'
    GROUP BY DATE(created_at)
)
SELECT
    c.day,
    COALESCE(dr.revenue, 0)      AS revenue,
    COALESCE(dr.order_count, 0)  AS order_count
FROM calendar c
LEFT JOIN daily_revenue dr ON c.day = dr.order_date
ORDER BY c.day;

Notice a few things worth understanding here, not just copying:

Why COALESCE matters: When no orders exist for a day, the LEFT JOIN produces a NULL. COALESCE(dr.revenue, 0) converts that to zero, which is the correct business value — zero revenue, not unknown revenue. Never skip this step.

Why the date range filter in daily_revenue is written with >= and <: This is the half-open interval pattern. Using BETWEEN '2024-01-01' AND '2024-03-31' on a timestamp column will miss everything from 2024-03-31 00:00:01 through 2024-03-31 23:59:59. The half-open pattern >= start AND < end_exclusive is precise regardless of time component.

Tip: In BigQuery, use GENERATE_DATE_ARRAY instead of a recursive CTE: UNNEST(GENERATE_DATE_ARRAY('2024-01-01', '2024-03-31', INTERVAL 1 DAY)) AS day. In Snowflake, use GENERATOR(ROWCOUNT => n) with DATEADD.

Weekly and Monthly Aggregations

Once you have the spine pattern down, aggregating to different granularities is about truncation, not grouping by formatted strings.

-- Monthly revenue with correct truncation
SELECT
    DATE_TRUNC('month', created_at) AS month_start,
    SUM(order_total)                AS revenue
FROM orders
WHERE created_at >= '2023-01-01'
  AND created_at <  '2025-01-01'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month_start;

Why not TO_CHAR(created_at, 'YYYY-MM')? Because you'll lose natural sort order, you'll produce a string instead of a date (breaking downstream tools), and you'll pay a function-call cost on every row. DATE_TRUNC returns the first moment of the period as a proper timestamp, which sorts correctly and integrates with date arithmetic.

For fiscal calendars where months don't align with calendar months, you'll need a date dimension table — a pre-built reference table with columns like fiscal_week, fiscal_quarter, and fiscal_year keyed on calendar date. Generating this inline every time is a trap; build it once and store it.

Part 2: Period-Over-Period Comparisons with Window Functions

The LAG/LEAD Pattern

Comparing a metric to the same period in the prior week or month is one of the most common analytical queries, and it's cleaner with window functions than with self-joins.

WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', created_at) AS month_start,
        SUM(order_total)                AS revenue
    FROM orders
    WHERE created_at >= '2023-01-01'
      AND created_at <  '2025-01-01'
    GROUP BY DATE_TRUNC('month', created_at)
)
SELECT
    month_start,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month_start)  AS prev_month_revenue,
    revenue - LAG(revenue, 1) OVER (ORDER BY month_start) AS mom_delta,
    ROUND(
        100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY month_start))
              / NULLIF(LAG(revenue, 1) OVER (ORDER BY month_start), 0),
        2
    ) AS mom_pct_change
FROM monthly_revenue
ORDER BY month_start;

The NULLIF(..., 0) wrapper on the denominator prevents division-by-zero errors when a prior month had exactly zero revenue. Don't skip it — it will eventually save you from a query that breaks on real data.

Year-Over-Year with the Same-Period Offset

Same-period year-over-year is trickier because LAG(revenue, 12) only works if you have exactly 12 consecutive monthly rows. If you have gaps (maybe your company launched mid-year), the offset is wrong.

A more robust approach uses a self-join on the truncated date:

WITH monthly AS (
    SELECT
        DATE_TRUNC('month', created_at) AS month_start,
        SUM(order_total)                AS revenue
    FROM orders
    GROUP BY 1
)
SELECT
    curr.month_start,
    curr.revenue                             AS current_revenue,
    prev.revenue                             AS prior_year_revenue,
    ROUND(
        100.0 * (curr.revenue - prev.revenue)
              / NULLIF(prev.revenue, 0),
        2
    ) AS yoy_pct
FROM monthly curr
LEFT JOIN monthly prev
    ON curr.month_start = prev.month_start + INTERVAL '1 year'
ORDER BY curr.month_start;

Running Totals and Moving Averages

A 7-day moving average smooths out day-of-week noise and is invaluable for spotting trends. Here's a production-ready version:

WITH daily AS (
    SELECT
        DATE(created_at) AS day,
        SUM(order_total) AS revenue
    FROM orders
    GROUP BY 1
)
SELECT
    day,
    revenue,
    ROUND(
        AVG(revenue) OVER (
            ORDER BY day
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ),
        2
    ) AS revenue_7d_moving_avg,
    SUM(revenue) OVER (
        ORDER BY day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_revenue
FROM daily
ORDER BY day;

The ROWS BETWEEN 6 PRECEDING AND CURRENT ROW frame specification gives you a 7-row window (6 prior + current). RANGE BETWEEN uses value-based comparisons and can produce unexpected results when your data has gaps or duplicate values — ROWS BETWEEN is almost always what you actually want for time-series frames.

Warning: If you're computing a moving average but your daily table has gaps (some days missing), your 7-row window will span more than 7 calendar days. Combine with the calendar spine pattern from Part 1 to get accurate rolling windows.

Part 3: Date Range Queries and Interval Arithmetic

The Overlap Problem

Finding records where two date ranges overlap is surprisingly common: scheduling conflicts, subscription period validation, detecting price changes that overlap. Two ranges [A_start, A_end) and [B_start, B_end) overlap if and only if A_start < B_end AND B_start < A_end. This is the Allen interval algebra overlap condition.

-- Find subscription periods that overlap with Q1 2024
SELECT
    customer_id,
    subscription_start,
    subscription_end,
    plan_name
FROM subscriptions
WHERE subscription_start < '2024-04-01'   -- starts before the range ends
  AND subscription_end   > '2024-01-01'   -- ends after the range starts
ORDER BY customer_id, subscription_start;

This query returns any subscription that was active at any point during Q1 2024 — which is what you want for billing calculations, cohort analysis, or churn reporting.

Common mistake: Using BETWEEN for date range overlap. BETWEEN checks containment, not overlap. A subscription that started on Dec 15 and ended Feb 1 overlaps Q1 2024 but would not be returned by WHERE subscription_start BETWEEN '2024-01-01' AND '2024-03-31'.

Counting Active Records at Any Point in Time

A classic dashboard metric is "active subscribers as of date X." Here's how to write it for any arbitrary point in time:

-- Parameterized: active subscriptions as of a specific date
-- Replace :as_of_date with your target date
SELECT
    COUNT(*) AS active_subscribers,
    plan_name,
    COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS pct_of_total
FROM subscriptions
WHERE subscription_start <= :as_of_date
  AND (subscription_end IS NULL OR subscription_end > :as_of_date)
GROUP BY plan_name
ORDER BY active_subscribers DESC;

The subscription_end IS NULL condition handles currently-active subscriptions that haven't ended yet — always account for the open-ended interval.

Sessionization: Grouping Events into Sessions

Given a stream of user events, grouping them into "sessions" (a gap of more than 30 minutes ends a session) is a temporal data problem that requires window functions and a bit of creativity.

WITH event_gaps AS (
    SELECT
        user_id,
        event_timestamp,
        LAG(event_timestamp) OVER (
            PARTITION BY user_id
            ORDER BY event_timestamp
        ) AS prev_event_timestamp
    FROM user_events
),
session_boundaries AS (
    SELECT
        user_id,
        event_timestamp,
        -- Flag the start of a new session
        CASE
            WHEN prev_event_timestamp IS NULL THEN 1
            WHEN event_timestamp - prev_event_timestamp > INTERVAL '30 minutes' THEN 1
            ELSE 0
        END AS new_session_flag
    FROM event_gaps
),
sessions_numbered AS (
    SELECT
        user_id,
        event_timestamp,
        SUM(new_session_flag) OVER (
            PARTITION BY user_id
            ORDER BY event_timestamp
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS session_id
    FROM session_boundaries
)
SELECT
    user_id,
    session_id,
    MIN(event_timestamp) AS session_start,
    MAX(event_timestamp) AS session_end,
    COUNT(*)             AS event_count,
    MAX(event_timestamp) - MIN(event_timestamp) AS session_duration
FROM sessions_numbered
GROUP BY user_id, session_id
ORDER BY user_id, session_start;

The trick here is using SUM(new_session_flag) OVER (...) as a running total of session boundaries. Each time a new session starts, the flag is 1, and the running sum increments — giving every event in the same session the same session_id. This is a general pattern worth memorizing: a running sum of flags creates groupings without subqueries or cursors.

Part 4: Slowly Changing Dimensions

What SCDs Actually Are (and Why They're Hard)

A Slowly Changing Dimension is a dimension table — a reference table describing an entity like a customer, product, or employee — where attribute values change over time, but not on every transaction. A customer changes their email address. A product moves to a different price tier. An employee changes departments.

The hard question is: what do you do with historical facts that were recorded when the old value was current? If a customer was in the "Southeast" region when they placed an order in January, but moved to "Northeast" in March, which region gets credit for that January order? The answer depends entirely on which SCD type you've implemented.

SCD Type 1: Overwrite

Type 1 is the simplest: just overwrite the old value. There is no history. Use Type 1 when historical accuracy doesn't matter for the attribute — email addresses, phone numbers, corrected spelling errors.

-- Type 1 update: overwrite customer email
UPDATE customers
SET email        = 'new.email@company.com',
    updated_at   = NOW()
WHERE customer_id = 10047;

Type 1 is operationally simple but analytically dangerous. If your reporting joins orders to customers and a customer changed regions, every historical order will suddenly be attributed to the new region. This is invisible and silent. Know which of your dimensions are Type 1 and document it explicitly.

SCD Type 2: Add a New Row

Type 2 is the gold standard for historical accuracy. When an attribute changes, you:

Close the existing row by setting its valid_to date
Insert a new row with the new value, valid_from = today, valid_to = NULL (or a sentinel date like 9999-12-31)

Here's a typical Type 2 dimension schema:

CREATE TABLE dim_customer (
    customer_key     SERIAL PRIMARY KEY,   -- surrogate key
    customer_id      INT NOT NULL,         -- natural/business key
    full_name        VARCHAR(200),
    region           VARCHAR(100),
    plan_tier        VARCHAR(50),
    valid_from       DATE NOT NULL,
    valid_to         DATE,                 -- NULL means currently active
    is_current       BOOLEAN NOT NULL DEFAULT TRUE
);

And here's the upsert logic to apply a Type 2 change:

BEGIN;

-- Step 1: Close the current record
UPDATE dim_customer
SET valid_to   = CURRENT_DATE - INTERVAL '1 day',
    is_current = FALSE
WHERE customer_id = 10047
  AND is_current = TRUE;

-- Step 2: Insert the new version
INSERT INTO dim_customer (
    customer_id, full_name, region, plan_tier,
    valid_from, valid_to, is_current
)
SELECT
    customer_id,
    full_name,
    'Northeast',   -- the new region value
    plan_tier,
    CURRENT_DATE,
    NULL,
    TRUE
FROM dim_customer
WHERE customer_id = 10047
  AND valid_to = CURRENT_DATE - INTERVAL '1 day';  -- grab the just-closed record for unchanged fields

COMMIT;

Notice the transaction wrapping both statements. You never want to close a record without inserting the replacement — that would leave a gap in history with no current version.

Querying SCD Type 2: The Point-in-Time Join

Here's where most practitioners struggle. Joining a fact table to a Type 2 dimension requires matching the fact's timestamp to the correct historical version of the dimension row.

-- Orders with the region that was active when each order was placed
SELECT
    o.order_id,
    o.created_at,
    o.order_total,
    c.region,
    c.plan_tier,
    c.full_name
FROM orders o
JOIN dim_customer c
    ON o.customer_id     = c.customer_id
    AND o.created_at     >= c.valid_from
    AND (c.valid_to IS NULL OR o.created_at < c.valid_to + INTERVAL '1 day')
ORDER BY o.created_at DESC;

The join condition matches each order to exactly the one dimension version that was active when the order was created. This is why the surrogate key (customer_key) exists — it lets your fact table store a direct FK to a specific version without this complex join. In a well-built warehouse, the ETL process resolves the point-in-time join at load time, and your analytical queries just join on customer_key.

Tip: Using 9999-12-31 as a sentinel valid_to value instead of NULL simplifies your join condition to o.created_at BETWEEN c.valid_from AND c.valid_to and avoids NULL handling. This is especially useful in columnar databases like Redshift and BigQuery where NULL comparisons can interfere with partition pruning.

SCD Type 3: Add a Column

Type 3 stores a limited history by adding columns to the same row:

CREATE TABLE dim_customer_type3 (
    customer_id          INT PRIMARY KEY,
    full_name            VARCHAR(200),
    current_region       VARCHAR(100),
    previous_region      VARCHAR(100),
    region_changed_date  DATE
);

Type 3 is rarely the right choice in a modern warehouse. It only tracks one prior value, it's brittle when the same attribute changes twice, and it doesn't generalize. You'll encounter it in legacy systems. When you do, treat it like a limited version of Type 2 and understand its blindspots.

Part 5: Reconstructing Historical State

Sometimes your production database only keeps current state, and you need to answer historical questions from an audit log or event sourcing table. This is point-in-time reconstruction.

The "As-Of" Query

Assume you have a price_history table tracking every product price change:

CREATE TABLE price_history (
    product_id    INT NOT NULL,
    price         NUMERIC(10,2) NOT NULL,
    effective_at  TIMESTAMP NOT NULL,
    changed_by    VARCHAR(100)
);

To get the active price for every product as of a specific point in time:

-- Price of each product as of 2024-06-15 12:00:00
WITH ranked_prices AS (
    SELECT
        product_id,
        price,
        effective_at,
        ROW_NUMBER() OVER (
            PARTITION BY product_id
            ORDER BY effective_at DESC
        ) AS rn
    FROM price_history
    WHERE effective_at <= '2024-06-15 12:00:00'
)
SELECT
    product_id,
    price,
    effective_at AS price_effective_since
FROM ranked_prices
WHERE rn = 1;

The ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY effective_at DESC) pattern — often called a latest-record-per-group query — is one of the most useful patterns in temporal SQL. For each product, it ranks all historical prices that existed at or before the target timestamp, and the first rank is the most recent one.

Reconstructing Historical Revenue with Point-in-Time Pricing

This is where everything comes together. Imagine you need to recalculate historical revenue at the prices that were in effect when each order was placed — perhaps for an audit or a pricing strategy review.

WITH order_prices AS (
    SELECT
        o.order_id,
        o.ordered_at,
        oi.product_id,
        oi.quantity,
        -- Get the price that was active when the order was placed
        (
            SELECT ph.price
            FROM price_history ph
            WHERE ph.product_id  = oi.product_id
              AND ph.effective_at <= o.ordered_at
            ORDER BY ph.effective_at DESC
            LIMIT 1
        ) AS historical_price
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE o.ordered_at >= '2024-01-01'
      AND o.ordered_at <  '2025-01-01'
)
SELECT
    DATE_TRUNC('month', ordered_at)         AS month,
    SUM(quantity * historical_price)        AS historical_revenue,
    COUNT(DISTINCT order_id)                AS order_count
FROM order_prices
GROUP BY 1
ORDER BY 1;

This uses a correlated subquery to look up historical price. It's conceptually clean but can be slow on large datasets. In production, replace the correlated subquery with a lateral join (PostgreSQL/BigQuery) or pre-resolve it in a staging model:

-- PostgreSQL: lateral join version (often faster)
SELECT
    o.order_id,
    o.ordered_at,
    oi.product_id,
    oi.quantity,
    ph_latest.price AS historical_price
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN LATERAL (
    SELECT price
    FROM price_history ph
    WHERE ph.product_id   = oi.product_id
      AND ph.effective_at <= o.ordered_at
    ORDER BY ph.effective_at DESC
    LIMIT 1
) ph_latest ON TRUE
WHERE o.ordered_at >= '2024-01-01'
  AND o.ordered_at <  '2025-01-01';

Hands-On Exercise

You have the following three tables in a PostgreSQL database. Build the queries described below.

Schema:

-- Subscriptions table (Type 2 SCD)
CREATE TABLE dim_subscription (
    sub_key         SERIAL PRIMARY KEY,
    customer_id     INT NOT NULL,
    plan_name       VARCHAR(50),
    monthly_price   NUMERIC(8,2),
    valid_from      DATE NOT NULL,
    valid_to        DATE,
    is_current      BOOLEAN NOT NULL DEFAULT TRUE
);

-- Monthly usage events
CREATE TABLE usage_events (
    event_id        SERIAL PRIMARY KEY,
    customer_id     INT NOT NULL,
    event_type      VARCHAR(100),
    occurred_at     TIMESTAMP NOT NULL
);

-- Seed data
INSERT INTO dim_subscription VALUES
(1, 101, 'Starter',  29.00, '2023-01-15', '2023-08-31', FALSE),
(2, 101, 'Growth',   79.00, '2023-09-01', NULL,          TRUE),
(3, 102, 'Starter',  29.00, '2023-03-01', '2024-02-29',  FALSE),
(4, 102, 'Enterprise', 299.00, '2024-03-01', NULL,        TRUE),
(5, 103, 'Growth',   79.00, '2023-06-01', NULL,           TRUE);

INSERT INTO usage_events (customer_id, event_type, occurred_at)
SELECT
    (ARRAY[101, 102, 103])[FLOOR(RANDOM()*3+1)],
    (ARRAY['login','export','api_call','report_view'])[FLOOR(RANDOM()*4+1)],
    '2023-01-01'::TIMESTAMP + (RANDOM() * 700) * INTERVAL '1 day'
FROM GENERATE_SERIES(1, 500);

Your tasks:

Monthly active users with the plan tier they were on — For each month in 2023–2024, count distinct customers who had at least one usage event and show which plan they were on at the time of each event. (Hint: point-in-time join.)
Revenue retention calendar — Build a calendar spine for 2023 and 2024 by month. For each month, calculate total expected MRR based on which subscriptions were active. Show months with zero revenue as 0, not as missing rows.
User session analysis — Using the usage_events table, define a session as a group of events for the same customer with no gap longer than 2 hours. Calculate the average session duration and average events per session for each customer.

Work through each task independently before checking your approach. The sessionization query from Part 3 is directly applicable to task 3.

Common Mistakes & Troubleshooting

Mistake 1: Timezone Confusion Silently Shifting Aggregations

If your created_at timestamps are stored in UTC but your users are in US/Eastern time, DATE_TRUNC('day', created_at) will produce daily buckets in UTC. A sale at 11:30 PM Eastern on Dec 31st shows up in January's data. Always convert to the reporting timezone before truncating:

-- PostgreSQL timezone conversion before aggregation
DATE_TRUNC('day', created_at AT TIME ZONE 'UTC' AT TIME ZONE 'America/New_York')

In BigQuery: DATE(TIMESTAMP_TRUNC(created_at, DAY, 'America/New_York')).

Mistake 2: BETWEEN on Timestamps

BETWEEN '2024-01-01' AND '2024-01-31' on a timestamp column is equivalent to >= '2024-01-01 00:00:00' AND <= '2024-01-31 00:00:00'. Every order placed after midnight on Jan 31st is excluded. Use the half-open interval instead: >= '2024-01-01' AND < '2024-02-01'.

Mistake 3: Assuming No Gaps in LAG/LEAD Offset

If you use LAG(revenue, 12) to get "same month last year" but you have a gap in August because the company was new, every month after that gap will reference the wrong month. Validate your time series is contiguous before using numeric offsets, or use the date-arithmetic self-join approach from Part 2.

Mistake 4: Multiple Active Rows in a Type 2 Dimension

If your ETL has a bug, you may end up with two rows where is_current = TRUE for the same customer_id. This causes fan-out in joins and produces inflated metrics — one of the hardest bugs to diagnose because the row counts look almost right.

Run this data quality check regularly:

-- Detect Type 2 integrity violations
SELECT customer_id, COUNT(*) AS current_row_count
FROM dim_customer
WHERE is_current = TRUE
GROUP BY customer_id
HAVING COUNT(*) > 1;

Mistake 5: NULL valid_to Breaking Range Comparisons

If valid_to IS NULL for active records, a condition like o.created_at BETWEEN c.valid_from AND c.valid_to will never match active records because x BETWEEN a AND NULL evaluates to NULL (unknown), not TRUE. Always handle the NULL case explicitly with OR c.valid_to IS NULL, or use the sentinel date approach.

Debugging a Wrong Count in a Time-Series Query

When your numbers look off, work backwards:

Remove the GROUP BY and inspect raw rows — are there duplicates from a fan-out join?
Add a COUNT(*) alongside COUNT(DISTINCT customer_id) — a mismatch signals fan-out.
Check whether your calendar spine has the right bounds — are the first and last days included?
Verify your timezone before and after truncation with a sample of known records.
For SCD joins, run a sample customer ID manually and inspect all dimension versions side by side.

Summary & Next Steps

You've now worked through the complete landscape of temporal SQL. Let's recap the key mental models:

The calendar spine is your safety net for time-series work. Your data defines what happened; the spine defines when it could have happened. Always left-join data onto the spine, never the reverse.

The half-open interval (>= start AND < end_exclusive) is the right way to express date ranges. It composes cleanly, avoids off-by-one errors with timestamps, and works consistently across granularities.

SCD Type 2 + point-in-time joins are the correct solution when you need historical accuracy in dimensional reporting. The complexity in the ETL is an investment that pays off every time you run a historical analysis and get the right answer instead of a silently wrong one.

The latest-record-per-group pattern (ROW_NUMBER() OVER (PARTITION BY ... ORDER BY timestamp DESC) = 1) generalizes to any "as-of" question across any table with a temporal change log.

The running-sum-of-flags trick from the sessionization example is a general pattern for creating contiguous groupings from event streams without recursive loops.

Next Steps

Practice the SCD Type 2 upsert: Build a small pipeline that reads from a staging table and applies Type 2 logic using a transaction. Handling the "no change" case (don't close a row if nothing actually changed) is an important optimization.
Explore temporal tables in SQL Server 2016+: SQL Server has native SYSTEM_TIME syntax that automates the SCD Type 2 pattern at the database level — it's worth understanding even if you don't use it, because it clarifies the semantics.
Dig into dbt's snapshot feature: If you're using dbt, its snapshot materialization implements SCD Type 2 automatically. Understanding the SQL behind it (which this lesson covers) makes you much more effective at debugging snapshot behavior.
Study window function frame specifications: RANGE BETWEEN, ROWS BETWEEN, and GROUPS BETWEEN have subtly different behaviors. Full mastery of frame specs unlocks a class of analytical queries that otherwise require multiple CTEs.

The goal isn't to memorize syntax — it's to build the mental habit of asking "when was this true?" for every entity in your data model. Once that question becomes automatic, temporal data stops being the hard part.

Temporal Data Mastery: Writing Queries for Time-Series, Date Ranges, and Slowly Changing Dimensions

Temporal Data Mastery: Writing Queries for Time-Series, Date Ranges, and Slowly Changing Dimensions

Introduction

Prerequisites

Part 1: Time-Series Aggregations Done Right

The Calendar Spine Pattern

Weekly and Monthly Aggregations

Part 2: Period-Over-Period Comparisons with Window Functions

The LAG/LEAD Pattern

Year-Over-Year with the Same-Period Offset

Running Totals and Moving Averages

Part 3: Date Range Queries and Interval Arithmetic

The Overlap Problem

Counting Active Records at Any Point in Time

Sessionization: Grouping Events into Sessions

Part 4: Slowly Changing Dimensions

What SCDs Actually Are (and Why They're Hard)

SCD Type 1: Overwrite

SCD Type 2: Add a New Row

Querying SCD Type 2: The Point-in-Time Join

SCD Type 3: Add a Column

Part 5: Reconstructing Historical State

The "As-Of" Query

Reconstructing Historical Revenue with Point-in-Time Pricing

Hands-On Exercise

Common Mistakes & Troubleshooting

Mistake 1: Timezone Confusion Silently Shifting Aggregations

Mistake 2: BETWEEN on Timestamps

Mistake 3: Assuming No Gaps in LAG/LEAD Offset

Mistake 4: Multiple Active Rows in a Type 2 Dimension

Mistake 5: NULL valid_to Breaking Range Comparisons

Debugging a Wrong Count in a Time-Series Query

Summary & Next Steps

Next Steps

Related Articles

Filtering and Transforming Data with SQL String Functions: LIKE, REGEXP, SUBSTRING, and REPLACE

Query Optimization with Materialized Views: Caching Complex Aggregations and Refreshing Strategies for High-Performance Analytics

Lateral Joins and CROSS APPLY: Row-by-Row Subquery Power in SQL

Related Articles

SQL🌱 Foundation
Filtering and Transforming Data with SQL String Functions: LIKE, REGEXP, SUBSTRING, and REPLACE
15 min

SQL🔥 Expert
Query Optimization with Materialized Views: Caching Complex Aggregations and Refreshing Strategies for High-Performance Analytics
29 min

SQL⚡ Practitioner
Lateral Joins and CROSS APPLY: Row-by-Row Subquery Power in SQL
23 min