
You're three hours into an incident review. The product team is asking why revenue dropped 18% last Tuesday, and you're staring at a query that keeps returning the wrong numbers. The data is there — you can see the events in the log table — but every aggregation you write either double-counts records or misses a cohort entirely. Sound familiar? Temporal data is where SQL skills separate the practitioners from the experts. Dates look simple until they're not, and "not" happens constantly in production systems.
Time is the hidden dimension in almost every business dataset. Customer records change. Prices get updated. Users churn and return. Metrics need to be compared week-over-week, and someone always wants to know what the data looked like as of a specific point in the past. The queries that answer these questions require a fluency with date arithmetic, window functions, and dimensional modeling that most intermediate SQL practitioners have never fully developed. This lesson closes that gap.
By the end of this lesson, you'll be writing production-quality queries that handle time-series aggregations, gap-filling across irregular intervals, point-in-time lookups, and slowly changing dimensions — the four categories of temporal problems you'll encounter on any serious data team.
What you'll learn:
You should already be comfortable with:
GROUP BY, HAVING, common aggregate functions)ROW_NUMBER, RANK, simple OVER clauses)The examples in this lesson use PostgreSQL syntax. Where behavior differs significantly in BigQuery, Snowflake, or SQL Server, I'll call it out explicitly.
The most common mistake in time-series work is letting your data define your time axis. If no orders came in on a Sunday, your query returns no row for that Sunday — and your line chart shows a false spike the following Monday because the gap is invisible to the downstream tool.
The solution is a calendar spine: a complete, continuous sequence of time periods that you generate independently, then left-join your data onto it.
Here's how to build one inline using a recursive CTE:
WITH RECURSIVE calendar AS (
SELECT DATE '2024-01-01' AS day
UNION ALL
SELECT day + INTERVAL '1 day'
FROM calendar
WHERE day < DATE '2024-03-31'
),
daily_revenue AS (
SELECT
DATE(created_at) AS order_date,
SUM(order_total) AS revenue,
COUNT(*) AS order_count
FROM orders
WHERE created_at >= '2024-01-01'
AND created_at < '2024-04-01'
GROUP BY DATE(created_at)
)
SELECT
c.day,
COALESCE(dr.revenue, 0) AS revenue,
COALESCE(dr.order_count, 0) AS order_count
FROM calendar c
LEFT JOIN daily_revenue dr ON c.day = dr.order_date
ORDER BY c.day;
Notice a few things worth understanding here, not just copying:
Why COALESCE matters: When no orders exist for a day, the LEFT JOIN produces a NULL. COALESCE(dr.revenue, 0) converts that to zero, which is the correct business value — zero revenue, not unknown revenue. Never skip this step.
Why the date range filter in daily_revenue is written with >= and <: This is the half-open interval pattern. Using BETWEEN '2024-01-01' AND '2024-03-31' on a timestamp column will miss everything from 2024-03-31 00:00:01 through 2024-03-31 23:59:59. The half-open pattern >= start AND < end_exclusive is precise regardless of time component.
Tip: In BigQuery, use
GENERATE_DATE_ARRAYinstead of a recursive CTE:UNNEST(GENERATE_DATE_ARRAY('2024-01-01', '2024-03-31', INTERVAL 1 DAY)) AS day. In Snowflake, useGENERATOR(ROWCOUNT => n)withDATEADD.
Once you have the spine pattern down, aggregating to different granularities is about truncation, not grouping by formatted strings.
-- Monthly revenue with correct truncation
SELECT
DATE_TRUNC('month', created_at) AS month_start,
SUM(order_total) AS revenue
FROM orders
WHERE created_at >= '2023-01-01'
AND created_at < '2025-01-01'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month_start;
Why not TO_CHAR(created_at, 'YYYY-MM')? Because you'll lose natural sort order, you'll produce a string instead of a date (breaking downstream tools), and you'll pay a function-call cost on every row. DATE_TRUNC returns the first moment of the period as a proper timestamp, which sorts correctly and integrates with date arithmetic.
For fiscal calendars where months don't align with calendar months, you'll need a date dimension table — a pre-built reference table with columns like fiscal_week, fiscal_quarter, and fiscal_year keyed on calendar date. Generating this inline every time is a trap; build it once and store it.
Comparing a metric to the same period in the prior week or month is one of the most common analytical queries, and it's cleaner with window functions than with self-joins.
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', created_at) AS month_start,
SUM(order_total) AS revenue
FROM orders
WHERE created_at >= '2023-01-01'
AND created_at < '2025-01-01'
GROUP BY DATE_TRUNC('month', created_at)
)
SELECT
month_start,
revenue,
LAG(revenue, 1) OVER (ORDER BY month_start) AS prev_month_revenue,
revenue - LAG(revenue, 1) OVER (ORDER BY month_start) AS mom_delta,
ROUND(
100.0 * (revenue - LAG(revenue, 1) OVER (ORDER BY month_start))
/ NULLIF(LAG(revenue, 1) OVER (ORDER BY month_start), 0),
2
) AS mom_pct_change
FROM monthly_revenue
ORDER BY month_start;
The NULLIF(..., 0) wrapper on the denominator prevents division-by-zero errors when a prior month had exactly zero revenue. Don't skip it — it will eventually save you from a query that breaks on real data.
Same-period year-over-year is trickier because LAG(revenue, 12) only works if you have exactly 12 consecutive monthly rows. If you have gaps (maybe your company launched mid-year), the offset is wrong.
A more robust approach uses a self-join on the truncated date:
WITH monthly AS (
SELECT
DATE_TRUNC('month', created_at) AS month_start,
SUM(order_total) AS revenue
FROM orders
GROUP BY 1
)
SELECT
curr.month_start,
curr.revenue AS current_revenue,
prev.revenue AS prior_year_revenue,
ROUND(
100.0 * (curr.revenue - prev.revenue)
/ NULLIF(prev.revenue, 0),
2
) AS yoy_pct
FROM monthly curr
LEFT JOIN monthly prev
ON curr.month_start = prev.month_start + INTERVAL '1 year'
ORDER BY curr.month_start;
A 7-day moving average smooths out day-of-week noise and is invaluable for spotting trends. Here's a production-ready version:
WITH daily AS (
SELECT
DATE(created_at) AS day,
SUM(order_total) AS revenue
FROM orders
GROUP BY 1
)
SELECT
day,
revenue,
ROUND(
AVG(revenue) OVER (
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
),
2
) AS revenue_7d_moving_avg,
SUM(revenue) OVER (
ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue
FROM daily
ORDER BY day;
The ROWS BETWEEN 6 PRECEDING AND CURRENT ROW frame specification gives you a 7-row window (6 prior + current). RANGE BETWEEN uses value-based comparisons and can produce unexpected results when your data has gaps or duplicate values — ROWS BETWEEN is almost always what you actually want for time-series frames.
Warning: If you're computing a moving average but your daily table has gaps (some days missing), your 7-row window will span more than 7 calendar days. Combine with the calendar spine pattern from Part 1 to get accurate rolling windows.
Finding records where two date ranges overlap is surprisingly common: scheduling conflicts, subscription period validation, detecting price changes that overlap. Two ranges [A_start, A_end) and [B_start, B_end) overlap if and only if A_start < B_end AND B_start < A_end. This is the Allen interval algebra overlap condition.
-- Find subscription periods that overlap with Q1 2024
SELECT
customer_id,
subscription_start,
subscription_end,
plan_name
FROM subscriptions
WHERE subscription_start < '2024-04-01' -- starts before the range ends
AND subscription_end > '2024-01-01' -- ends after the range starts
ORDER BY customer_id, subscription_start;
This query returns any subscription that was active at any point during Q1 2024 — which is what you want for billing calculations, cohort analysis, or churn reporting.
Common mistake: Using
BETWEENfor date range overlap.BETWEENchecks containment, not overlap. A subscription that started on Dec 15 and ended Feb 1 overlaps Q1 2024 but would not be returned byWHERE subscription_start BETWEEN '2024-01-01' AND '2024-03-31'.
A classic dashboard metric is "active subscribers as of date X." Here's how to write it for any arbitrary point in time:
-- Parameterized: active subscriptions as of a specific date
-- Replace :as_of_date with your target date
SELECT
COUNT(*) AS active_subscribers,
plan_name,
COUNT(*) * 1.0 / SUM(COUNT(*)) OVER () AS pct_of_total
FROM subscriptions
WHERE subscription_start <= :as_of_date
AND (subscription_end IS NULL OR subscription_end > :as_of_date)
GROUP BY plan_name
ORDER BY active_subscribers DESC;
The subscription_end IS NULL condition handles currently-active subscriptions that haven't ended yet — always account for the open-ended interval.
Given a stream of user events, grouping them into "sessions" (a gap of more than 30 minutes ends a session) is a temporal data problem that requires window functions and a bit of creativity.
WITH event_gaps AS (
SELECT
user_id,
event_timestamp,
LAG(event_timestamp) OVER (
PARTITION BY user_id
ORDER BY event_timestamp
) AS prev_event_timestamp
FROM user_events
),
session_boundaries AS (
SELECT
user_id,
event_timestamp,
-- Flag the start of a new session
CASE
WHEN prev_event_timestamp IS NULL THEN 1
WHEN event_timestamp - prev_event_timestamp > INTERVAL '30 minutes' THEN 1
ELSE 0
END AS new_session_flag
FROM event_gaps
),
sessions_numbered AS (
SELECT
user_id,
event_timestamp,
SUM(new_session_flag) OVER (
PARTITION BY user_id
ORDER BY event_timestamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS session_id
FROM session_boundaries
)
SELECT
user_id,
session_id,
MIN(event_timestamp) AS session_start,
MAX(event_timestamp) AS session_end,
COUNT(*) AS event_count,
MAX(event_timestamp) - MIN(event_timestamp) AS session_duration
FROM sessions_numbered
GROUP BY user_id, session_id
ORDER BY user_id, session_start;
The trick here is using SUM(new_session_flag) OVER (...) as a running total of session boundaries. Each time a new session starts, the flag is 1, and the running sum increments — giving every event in the same session the same session_id. This is a general pattern worth memorizing: a running sum of flags creates groupings without subqueries or cursors.
A Slowly Changing Dimension is a dimension table — a reference table describing an entity like a customer, product, or employee — where attribute values change over time, but not on every transaction. A customer changes their email address. A product moves to a different price tier. An employee changes departments.
The hard question is: what do you do with historical facts that were recorded when the old value was current? If a customer was in the "Southeast" region when they placed an order in January, but moved to "Northeast" in March, which region gets credit for that January order? The answer depends entirely on which SCD type you've implemented.
Type 1 is the simplest: just overwrite the old value. There is no history. Use Type 1 when historical accuracy doesn't matter for the attribute — email addresses, phone numbers, corrected spelling errors.
-- Type 1 update: overwrite customer email
UPDATE customers
SET email = 'new.email@company.com',
updated_at = NOW()
WHERE customer_id = 10047;
Type 1 is operationally simple but analytically dangerous. If your reporting joins orders to customers and a customer changed regions, every historical order will suddenly be attributed to the new region. This is invisible and silent. Know which of your dimensions are Type 1 and document it explicitly.
Type 2 is the gold standard for historical accuracy. When an attribute changes, you:
valid_to datevalid_from = today, valid_to = NULL (or a sentinel date like 9999-12-31)Here's a typical Type 2 dimension schema:
CREATE TABLE dim_customer (
customer_key SERIAL PRIMARY KEY, -- surrogate key
customer_id INT NOT NULL, -- natural/business key
full_name VARCHAR(200),
region VARCHAR(100),
plan_tier VARCHAR(50),
valid_from DATE NOT NULL,
valid_to DATE, -- NULL means currently active
is_current BOOLEAN NOT NULL DEFAULT TRUE
);
And here's the upsert logic to apply a Type 2 change:
BEGIN;
-- Step 1: Close the current record
UPDATE dim_customer
SET valid_to = CURRENT_DATE - INTERVAL '1 day',
is_current = FALSE
WHERE customer_id = 10047
AND is_current = TRUE;
-- Step 2: Insert the new version
INSERT INTO dim_customer (
customer_id, full_name, region, plan_tier,
valid_from, valid_to, is_current
)
SELECT
customer_id,
full_name,
'Northeast', -- the new region value
plan_tier,
CURRENT_DATE,
NULL,
TRUE
FROM dim_customer
WHERE customer_id = 10047
AND valid_to = CURRENT_DATE - INTERVAL '1 day'; -- grab the just-closed record for unchanged fields
COMMIT;
Notice the transaction wrapping both statements. You never want to close a record without inserting the replacement — that would leave a gap in history with no current version.
Here's where most practitioners struggle. Joining a fact table to a Type 2 dimension requires matching the fact's timestamp to the correct historical version of the dimension row.
-- Orders with the region that was active when each order was placed
SELECT
o.order_id,
o.created_at,
o.order_total,
c.region,
c.plan_tier,
c.full_name
FROM orders o
JOIN dim_customer c
ON o.customer_id = c.customer_id
AND o.created_at >= c.valid_from
AND (c.valid_to IS NULL OR o.created_at < c.valid_to + INTERVAL '1 day')
ORDER BY o.created_at DESC;
The join condition matches each order to exactly the one dimension version that was active when the order was created. This is why the surrogate key (customer_key) exists — it lets your fact table store a direct FK to a specific version without this complex join. In a well-built warehouse, the ETL process resolves the point-in-time join at load time, and your analytical queries just join on customer_key.
Tip: Using
9999-12-31as a sentinelvalid_tovalue instead of NULL simplifies your join condition too.created_at BETWEEN c.valid_from AND c.valid_toand avoids NULL handling. This is especially useful in columnar databases like Redshift and BigQuery where NULL comparisons can interfere with partition pruning.
Type 3 stores a limited history by adding columns to the same row:
CREATE TABLE dim_customer_type3 (
customer_id INT PRIMARY KEY,
full_name VARCHAR(200),
current_region VARCHAR(100),
previous_region VARCHAR(100),
region_changed_date DATE
);
Type 3 is rarely the right choice in a modern warehouse. It only tracks one prior value, it's brittle when the same attribute changes twice, and it doesn't generalize. You'll encounter it in legacy systems. When you do, treat it like a limited version of Type 2 and understand its blindspots.
Sometimes your production database only keeps current state, and you need to answer historical questions from an audit log or event sourcing table. This is point-in-time reconstruction.
Assume you have a price_history table tracking every product price change:
CREATE TABLE price_history (
product_id INT NOT NULL,
price NUMERIC(10,2) NOT NULL,
effective_at TIMESTAMP NOT NULL,
changed_by VARCHAR(100)
);
To get the active price for every product as of a specific point in time:
-- Price of each product as of 2024-06-15 12:00:00
WITH ranked_prices AS (
SELECT
product_id,
price,
effective_at,
ROW_NUMBER() OVER (
PARTITION BY product_id
ORDER BY effective_at DESC
) AS rn
FROM price_history
WHERE effective_at <= '2024-06-15 12:00:00'
)
SELECT
product_id,
price,
effective_at AS price_effective_since
FROM ranked_prices
WHERE rn = 1;
The ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY effective_at DESC) pattern — often called a latest-record-per-group query — is one of the most useful patterns in temporal SQL. For each product, it ranks all historical prices that existed at or before the target timestamp, and the first rank is the most recent one.
This is where everything comes together. Imagine you need to recalculate historical revenue at the prices that were in effect when each order was placed — perhaps for an audit or a pricing strategy review.
WITH order_prices AS (
SELECT
o.order_id,
o.ordered_at,
oi.product_id,
oi.quantity,
-- Get the price that was active when the order was placed
(
SELECT ph.price
FROM price_history ph
WHERE ph.product_id = oi.product_id
AND ph.effective_at <= o.ordered_at
ORDER BY ph.effective_at DESC
LIMIT 1
) AS historical_price
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.ordered_at >= '2024-01-01'
AND o.ordered_at < '2025-01-01'
)
SELECT
DATE_TRUNC('month', ordered_at) AS month,
SUM(quantity * historical_price) AS historical_revenue,
COUNT(DISTINCT order_id) AS order_count
FROM order_prices
GROUP BY 1
ORDER BY 1;
This uses a correlated subquery to look up historical price. It's conceptually clean but can be slow on large datasets. In production, replace the correlated subquery with a lateral join (PostgreSQL/BigQuery) or pre-resolve it in a staging model:
-- PostgreSQL: lateral join version (often faster)
SELECT
o.order_id,
o.ordered_at,
oi.product_id,
oi.quantity,
ph_latest.price AS historical_price
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN LATERAL (
SELECT price
FROM price_history ph
WHERE ph.product_id = oi.product_id
AND ph.effective_at <= o.ordered_at
ORDER BY ph.effective_at DESC
LIMIT 1
) ph_latest ON TRUE
WHERE o.ordered_at >= '2024-01-01'
AND o.ordered_at < '2025-01-01';
You have the following three tables in a PostgreSQL database. Build the queries described below.
Schema:
-- Subscriptions table (Type 2 SCD)
CREATE TABLE dim_subscription (
sub_key SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
plan_name VARCHAR(50),
monthly_price NUMERIC(8,2),
valid_from DATE NOT NULL,
valid_to DATE,
is_current BOOLEAN NOT NULL DEFAULT TRUE
);
-- Monthly usage events
CREATE TABLE usage_events (
event_id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
event_type VARCHAR(100),
occurred_at TIMESTAMP NOT NULL
);
-- Seed data
INSERT INTO dim_subscription VALUES
(1, 101, 'Starter', 29.00, '2023-01-15', '2023-08-31', FALSE),
(2, 101, 'Growth', 79.00, '2023-09-01', NULL, TRUE),
(3, 102, 'Starter', 29.00, '2023-03-01', '2024-02-29', FALSE),
(4, 102, 'Enterprise', 299.00, '2024-03-01', NULL, TRUE),
(5, 103, 'Growth', 79.00, '2023-06-01', NULL, TRUE);
INSERT INTO usage_events (customer_id, event_type, occurred_at)
SELECT
(ARRAY[101, 102, 103])[FLOOR(RANDOM()*3+1)],
(ARRAY['login','export','api_call','report_view'])[FLOOR(RANDOM()*4+1)],
'2023-01-01'::TIMESTAMP + (RANDOM() * 700) * INTERVAL '1 day'
FROM GENERATE_SERIES(1, 500);
Your tasks:
Monthly active users with the plan tier they were on — For each month in 2023–2024, count distinct customers who had at least one usage event and show which plan they were on at the time of each event. (Hint: point-in-time join.)
Revenue retention calendar — Build a calendar spine for 2023 and 2024 by month. For each month, calculate total expected MRR based on which subscriptions were active. Show months with zero revenue as 0, not as missing rows.
User session analysis — Using the usage_events table, define a session as a group of events for the same customer with no gap longer than 2 hours. Calculate the average session duration and average events per session for each customer.
Work through each task independently before checking your approach. The sessionization query from Part 3 is directly applicable to task 3.
If your created_at timestamps are stored in UTC but your users are in US/Eastern time, DATE_TRUNC('day', created_at) will produce daily buckets in UTC. A sale at 11:30 PM Eastern on Dec 31st shows up in January's data. Always convert to the reporting timezone before truncating:
-- PostgreSQL timezone conversion before aggregation
DATE_TRUNC('day', created_at AT TIME ZONE 'UTC' AT TIME ZONE 'America/New_York')
In BigQuery: DATE(TIMESTAMP_TRUNC(created_at, DAY, 'America/New_York')).
BETWEEN '2024-01-01' AND '2024-01-31' on a timestamp column is equivalent to >= '2024-01-01 00:00:00' AND <= '2024-01-31 00:00:00'. Every order placed after midnight on Jan 31st is excluded. Use the half-open interval instead: >= '2024-01-01' AND < '2024-02-01'.
If you use LAG(revenue, 12) to get "same month last year" but you have a gap in August because the company was new, every month after that gap will reference the wrong month. Validate your time series is contiguous before using numeric offsets, or use the date-arithmetic self-join approach from Part 2.
If your ETL has a bug, you may end up with two rows where is_current = TRUE for the same customer_id. This causes fan-out in joins and produces inflated metrics — one of the hardest bugs to diagnose because the row counts look almost right.
Run this data quality check regularly:
-- Detect Type 2 integrity violations
SELECT customer_id, COUNT(*) AS current_row_count
FROM dim_customer
WHERE is_current = TRUE
GROUP BY customer_id
HAVING COUNT(*) > 1;
If valid_to IS NULL for active records, a condition like o.created_at BETWEEN c.valid_from AND c.valid_to will never match active records because x BETWEEN a AND NULL evaluates to NULL (unknown), not TRUE. Always handle the NULL case explicitly with OR c.valid_to IS NULL, or use the sentinel date approach.
When your numbers look off, work backwards:
GROUP BY and inspect raw rows — are there duplicates from a fan-out join?COUNT(*) alongside COUNT(DISTINCT customer_id) — a mismatch signals fan-out.You've now worked through the complete landscape of temporal SQL. Let's recap the key mental models:
The calendar spine is your safety net for time-series work. Your data defines what happened; the spine defines when it could have happened. Always left-join data onto the spine, never the reverse.
The half-open interval (>= start AND < end_exclusive) is the right way to express date ranges. It composes cleanly, avoids off-by-one errors with timestamps, and works consistently across granularities.
SCD Type 2 + point-in-time joins are the correct solution when you need historical accuracy in dimensional reporting. The complexity in the ETL is an investment that pays off every time you run a historical analysis and get the right answer instead of a silently wrong one.
The latest-record-per-group pattern (ROW_NUMBER() OVER (PARTITION BY ... ORDER BY timestamp DESC) = 1) generalizes to any "as-of" question across any table with a temporal change log.
The running-sum-of-flags trick from the sessionization example is a general pattern for creating contiguous groupings from event streams without recursive loops.
SYSTEM_TIME syntax that automates the SCD Type 2 pattern at the database level — it's worth understanding even if you don't use it, because it clarifies the semantics.snapshot materialization implements SCD Type 2 automatically. Understanding the SQL behind it (which this lesson covers) makes you much more effective at debugging snapshot behavior.RANGE BETWEEN, ROWS BETWEEN, and GROUPS BETWEEN have subtly different behaviors. Full mastery of frame specs unlocks a class of analytical queries that otherwise require multiple CTEs.The goal isn't to memorize syntax — it's to build the mental habit of asking "when was this true?" for every entity in your data model. Once that question becomes automatic, temporal data stops being the hard part.
Learning Path: Advanced SQL Queries