
You're staring at a SQL query that spans three screens, has nested subqueries five levels deep, and includes the same complex logic repeated in multiple places. Your colleague asks for a "quick modification" to add another metric, and you realize you'll need to scroll through 150 lines of barely comprehensible code to figure out where to make the change. Sound familiar?
This is exactly the problem Common Table Expressions (CTEs) were designed to solve. CTEs let you break complex queries into readable, modular pieces—think of them as temporary named result sets that exist only for the duration of your query. Instead of cramming everything into nested subqueries, you can define intermediate steps with meaningful names, making your SQL as readable as a well-structured program.
By the end of this lesson, you'll be writing SQL that your future self (and your teammates) will actually understand six months from now.
What you'll learn:
- How to replace deeply nested subqueries with named, readable CTEs
- How to chain multiple CTEs into a step-by-step transformation pipeline
- How to traverse hierarchical data with recursive CTEs
- When CTEs help or hurt performance, and the most common mistakes to avoid

You should be comfortable with:
- Basic SELECT, JOIN, and WHERE clauses
- GROUP BY with aggregate functions (COUNT, SUM, AVG)
- Simple subqueries
We'll use PostgreSQL syntax throughout, but CTEs work similarly in SQL Server, MySQL 8.0+, and other modern databases.
Let's start with a realistic scenario. You're analyzing customer purchase behavior and need to identify high-value customers who've made repeat purchases in the last quarter. Here's how this might look with traditional subqueries:
-- The nested subquery approach (don't do this)
SELECT
customer_id,
customer_name,
total_orders,
total_revenue,
avg_order_value
FROM (
SELECT
c.customer_id,
c.customer_name,
COUNT(o.order_id) as total_orders,
SUM(o.order_total) as total_revenue,
AVG(o.order_total) as avg_order_value
FROM customers c
INNER JOIN (
SELECT
customer_id,
order_id,
order_total,
order_date
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '3 months'
AND order_total > 0
) o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(o.order_id) >= 2
) customer_summary
WHERE total_revenue > 1000
ORDER BY total_revenue DESC;
This works, but it's hard to follow the logic flow. Now let's refactor using CTEs:
-- The CTE approach (much cleaner)
WITH recent_orders AS (
SELECT
customer_id,
order_id,
order_total,
order_date
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '3 months'
AND order_total > 0
),
customer_metrics AS (
SELECT
c.customer_id,
c.customer_name,
COUNT(ro.order_id) as total_orders,
SUM(ro.order_total) as total_revenue,
AVG(ro.order_total) as avg_order_value
FROM customers c
INNER JOIN recent_orders ro ON c.customer_id = ro.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(ro.order_id) >= 2
)
SELECT
customer_id,
customer_name,
total_orders,
total_revenue,
avg_order_value
FROM customer_metrics
WHERE total_revenue > 1000
ORDER BY total_revenue DESC;
The CTE version tells a clear story: first we define what "recent orders" means, then we calculate customer metrics based on those orders, and finally we filter for high-value customers. Each step has a meaningful name and serves a single purpose.
Here's what the output might look like:
| customer_id | customer_name | total_orders | total_revenue | avg_order_value |
|---|---|---|---|---|
| 1247 | TechCorp Inc | 8 | 12450.00 | 1556.25 |
| 3891 | DataFlow LLC | 5 | 8750.50 | 1750.10 |
| 2156 | CloudBiz Co | 12 | 6890.25 | 574.19 |
The basic CTE syntax follows this pattern:
WITH cte_name AS (
-- Your query here
),
another_cte AS (
-- This can reference the first CTE
SELECT * FROM cte_name WHERE condition
)
-- Main query that can reference any CTE
SELECT * FROM another_cte;
Key points about CTE syntax:
- Start with WITH, followed by your first CTE's name and AS
- Separate multiple CTEs with commas (a single WITH keyword covers them all)
- Later CTEs can reference any CTE defined earlier in the same WITH clause
- The main query follows the final CTE and can reference any of them

Let's see this in action with a more complex example involving sales territory analysis:
WITH regional_sales AS (
-- Step 1: Aggregate sales by region and salesperson
SELECT
s.region,
s.salesperson_id,
s.salesperson_name,
SUM(o.order_total) as total_sales,
COUNT(DISTINCT o.customer_id) as unique_customers
FROM salespeople s
INNER JOIN orders o ON s.salesperson_id = o.salesperson_id
WHERE o.order_date >= '2024-01-01'
GROUP BY s.region, s.salesperson_id, s.salesperson_name
),
region_benchmarks AS (
-- Step 2: Calculate regional averages for comparison
SELECT
region,
AVG(total_sales) as avg_sales_per_person,
AVG(unique_customers) as avg_customers_per_person
FROM regional_sales
GROUP BY region
),
performance_analysis AS (
-- Step 3: Compare each salesperson to their regional average
SELECT
rs.region,
rs.salesperson_name,
rs.total_sales,
rs.unique_customers,
rb.avg_sales_per_person,
rb.avg_customers_per_person,
ROUND(
(rs.total_sales / rb.avg_sales_per_person - 1) * 100,
1
) as sales_vs_regional_avg_pct,
ROUND(
(rs.unique_customers::DECIMAL / rb.avg_customers_per_person - 1) * 100,
1
) as customers_vs_regional_avg_pct
FROM regional_sales rs
INNER JOIN region_benchmarks rb ON rs.region = rb.region
)
-- Final query: Show top and bottom performers
SELECT
region,
salesperson_name,
total_sales,
unique_customers,
sales_vs_regional_avg_pct,
customers_vs_regional_avg_pct,
CASE
WHEN sales_vs_regional_avg_pct > 20 THEN 'Top Performer'
WHEN sales_vs_regional_avg_pct < -20 THEN 'Needs Support'
ELSE 'Average Performer'
END as performance_category
FROM performance_analysis
ORDER BY region, sales_vs_regional_avg_pct DESC;
This query would produce output like:
| region | salesperson_name | total_sales | unique_customers | sales_vs_regional_avg_pct | performance_category |
|---|---|---|---|---|---|
| East | Sarah Chen | 245000 | 28 | 34.2 | Top Performer |
| East | Mike Rodriguez | 189000 | 22 | 3.4 | Average Performer |
| West | Jennifer Park | 198000 | 31 | 12.8 | Average Performer |
Each CTE serves a specific purpose and builds toward the final analysis, making the complex calculation easy to understand and debug.
One of the most powerful uses of CTEs is eliminating deeply nested subqueries. Let's work through a real example: finding customers whose purchase patterns have changed significantly between quarters.
Here's the subquery nightmare version:
-- Don't write SQL like this
SELECT
customer_id,
customer_name,
q1_revenue,
q2_revenue,
revenue_change_pct
FROM (
SELECT
c.customer_id,
c.customer_name,
q1_stats.revenue as q1_revenue,
q2_stats.revenue as q2_revenue,
ROUND(
((q2_stats.revenue - q1_stats.revenue) / q1_stats.revenue * 100),
1
) as revenue_change_pct
FROM customers c
INNER JOIN (
SELECT
customer_id,
SUM(order_total) as revenue
FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2024-04-01'
GROUP BY customer_id
HAVING SUM(order_total) > 500
) q1_stats ON c.customer_id = q1_stats.customer_id
INNER JOIN (
SELECT
customer_id,
SUM(order_total) as revenue
FROM orders
WHERE order_date >= '2024-04-01'
AND order_date < '2024-07-01'
GROUP BY customer_id
HAVING SUM(order_total) > 500
) q2_stats ON c.customer_id = q2_stats.customer_id
) quarterly_comparison
WHERE ABS(revenue_change_pct) > 25
ORDER BY revenue_change_pct DESC;
Now the CTE version:
-- Much cleaner with CTEs
WITH q1_customer_revenue AS (
SELECT
customer_id,
SUM(order_total) as revenue
FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2024-04-01'
GROUP BY customer_id
HAVING SUM(order_total) > 500
),
q2_customer_revenue AS (
SELECT
customer_id,
SUM(order_total) as revenue
FROM orders
WHERE order_date >= '2024-04-01'
AND order_date < '2024-07-01'
GROUP BY customer_id
HAVING SUM(order_total) > 500
),
quarterly_comparison AS (
SELECT
c.customer_id,
c.customer_name,
q1.revenue as q1_revenue,
q2.revenue as q2_revenue,
ROUND(
((q2.revenue - q1.revenue) / q1.revenue * 100),
1
) as revenue_change_pct
FROM customers c
INNER JOIN q1_customer_revenue q1 ON c.customer_id = q1.customer_id
INNER JOIN q2_customer_revenue q2 ON c.customer_id = q2.customer_id
)
SELECT
customer_id,
customer_name,
q1_revenue,
q2_revenue,
revenue_change_pct,
CASE
WHEN revenue_change_pct > 25 THEN 'Strong Growth'
WHEN revenue_change_pct < -25 THEN 'Significant Decline'
END as trend_category
FROM quarterly_comparison
WHERE ABS(revenue_change_pct) > 25
ORDER BY revenue_change_pct DESC;
The CTE version makes it immediately clear what we're doing:
- q1_customer_revenue and q2_customer_revenue each compute one quarter's totals with identical, easy-to-compare logic
- quarterly_comparison joins the two and calculates the percentage change
- the final SELECT filters for significant movers and labels the trend
Pro tip: When refactoring complex queries, start by identifying repeated logic or conceptually distinct steps. Each of these becomes a candidate for its own CTE.
Recursive CTEs are perfect for traversing tree-like structures: organizational charts, category hierarchies, or bill-of-materials relationships. The syntax includes a base case and a recursive case that references the CTE itself.
Let's build an employee hierarchy query that shows the complete reporting structure:
-- Sample data structure
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
manager_id INT,
department VARCHAR(50),
salary DECIMAL(10,2)
);
-- Insert some hierarchical data
INSERT INTO employees VALUES
(1, 'Sarah Johnson', NULL, 'Executive', 180000), -- CEO
(2, 'Mike Chen', 1, 'Engineering', 140000), -- VP Engineering
(3, 'Lisa Rodriguez', 1, 'Sales', 135000), -- VP Sales
(4, 'Tom Wilson', 2, 'Engineering', 110000), -- Engineering Manager
(5, 'Amy Park', 2, 'Engineering', 105000), -- Engineering Manager
(6, 'Chris Davis', 3, 'Sales', 95000), -- Sales Manager
(7, 'Jennifer Liu', 4, 'Engineering', 85000), -- Senior Developer
(8, 'Mark Thompson', 4, 'Engineering', 80000), -- Developer
(9, 'Rachel Kim', 5, 'Engineering', 82000), -- Senior Developer
(10, 'David Brown', 6, 'Sales', 70000); -- Sales Rep
Now let's use a recursive CTE to show the complete hierarchy with levels:
WITH RECURSIVE employee_hierarchy AS (
-- Base case: Start with the CEO (no manager)
SELECT
employee_id,
employee_name,
manager_id,
department,
salary,
0 as level,
employee_name as hierarchy_path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: Find direct reports of employees we've already processed
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
e.department,
e.salary,
eh.level + 1,
eh.hierarchy_path || ' > ' || e.employee_name
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT
employee_id,
REPEAT(' ', level) || employee_name as indented_name,
department,
salary,
level,
hierarchy_path
FROM employee_hierarchy
ORDER BY hierarchy_path;
This produces a clear organizational view:
| employee_id | indented_name | department | salary | level | hierarchy_path |
|---|---|---|---|---|---|
| 1 | Sarah Johnson | Executive | 180000 | 0 | Sarah Johnson |
| 2 | Mike Chen | Engineering | 140000 | 1 | Sarah Johnson > Mike Chen |
| 4 | Tom Wilson | Engineering | 110000 | 2 | Sarah Johnson > Mike Chen > Tom Wilson |
| 7 | Jennifer Liu | Engineering | 85000 | 3 | Sarah Johnson > Mike Chen > Tom Wilson > Jennifer Liu |
| 8 | Mark Thompson | Engineering | 80000 | 3 | Sarah Johnson > Mike Chen > Tom Wilson > Mark Thompson |
The recursive CTE works by:
1. Running the base case once to seed the result set (here, the CEO, who has no manager)
2. Repeatedly joining employees to the rows produced so far, adding one level of direct reports per iteration
3. Stopping automatically when an iteration produces no new rows
Here's another practical example—finding all subcategories within a product taxonomy:
WITH RECURSIVE category_tree AS (
-- Base case: Start with a specific parent category
SELECT
category_id,
category_name,
parent_category_id,
0 as depth,
category_name as full_path
FROM product_categories
WHERE category_name = 'Electronics'
UNION ALL
-- Recursive case: Find all subcategories
SELECT
pc.category_id,
pc.category_name,
pc.parent_category_id,
ct.depth + 1,
ct.full_path || ' > ' || pc.category_name
FROM product_categories pc
INNER JOIN category_tree ct ON pc.parent_category_id = ct.category_id
),
category_product_counts AS (
SELECT
ct.category_id,
ct.category_name,
ct.depth,
ct.full_path,
COUNT(p.product_id) as product_count,
AVG(p.price) as avg_price
FROM category_tree ct
LEFT JOIN products p ON ct.category_id = p.category_id
GROUP BY ct.category_id, ct.category_name, ct.depth, ct.full_path
)
SELECT
REPEAT(' ', depth) || category_name as indented_category,
product_count,
ROUND(avg_price, 2) as avg_price,
full_path
FROM category_product_counts
ORDER BY full_path;
Important: Some databases cap recursion depth to prevent runaway queries. SQL Server stops at 100 levels by default, which you can raise per query with OPTION (MAXRECURSION 500). PostgreSQL imposes no fixed depth limit, so a cycle in your data will keep the query running until it exhausts resources or hits statement_timeout — add an explicit depth check (for example, WHERE level < 100) as a safety net.
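Whatever your database's behavior, a defensive depth guard in the recursive step is cheap insurance. Here's a minimal sketch using the employees table defined above (the cutoff of 100 is an arbitrary choice, not a database requirement):

```sql
WITH RECURSIVE employee_hierarchy AS (
    -- Base case: top of the tree
    SELECT employee_id, employee_name, manager_id, 0 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case, with an explicit safety cutoff
    SELECT e.employee_id, e.employee_name, e.manager_id, eh.level + 1
    FROM employees e
    INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
    WHERE eh.level < 100  -- stop descending past 100 levels, cycle or not
)
SELECT * FROM employee_hierarchy;
```

The guard costs nothing on healthy data and turns an infinite loop into a bounded (and debuggable) result set.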
CTEs aren't just about readability—they can significantly impact query performance, both positively and negatively. Understanding when and how to optimize them is crucial for production systems.
Different databases handle CTE execution differently:
PostgreSQL 12+: CTEs are typically inlined (treated like subqueries) unless they're recursive, contain data-modifying statements, or are referenced multiple times. This usually improves performance.
SQL Server: CTEs are expanded inline like views, so a CTE referenced multiple times may be evaluated multiple times. If you need the result computed once and reused, stage it in a temp table instead.
MySQL 8.0+: Similar to PostgreSQL—inlines when possible.
Let's look at a performance comparison:
-- This CTE will likely be inlined (good performance)
WITH recent_orders AS (
SELECT customer_id, order_total, order_date
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
)
SELECT customer_id, COUNT(*) as order_count
FROM recent_orders
GROUP BY customer_id;
-- In PostgreSQL, this CTE will be materialized because it's referenced twice
WITH customer_totals AS (
SELECT
customer_id,
SUM(order_total) as total_spent
FROM orders
GROUP BY customer_id
)
SELECT
ct1.customer_id,
ct1.total_spent,
ct2.total_spent as comparison_value
FROM customer_totals ct1
CROSS JOIN (SELECT AVG(total_spent) as total_spent FROM customer_totals) ct2;
CTEs benefit from the same indexing strategies as regular queries. The key is understanding what the CTE is actually doing:
-- This CTE will benefit from an index on (order_date, customer_id)
WITH recent_high_value_orders AS (
SELECT
customer_id,
order_id,
order_total,
order_date
FROM orders
WHERE order_date >= '2024-01-01'
AND order_total > 1000
)
-- Query execution continues...
-- Create the supporting index
CREATE INDEX idx_orders_date_customer_total
ON orders (order_date, customer_id)
INCLUDE (order_total, order_id);
In PostgreSQL, you can force materialization when it's beneficial:
-- Force materialization for expensive calculations used multiple times
WITH monthly_aggregates AS MATERIALIZED (
SELECT
DATE_TRUNC('month', order_date) as month,
customer_id,
COUNT(*) as order_count,
SUM(order_total) as revenue,
AVG(order_total) as avg_order_value,
STDDEV(order_total) as revenue_volatility
FROM orders
WHERE order_date >= '2023-01-01'
GROUP BY DATE_TRUNC('month', order_date), customer_id
)
-- Now use this expensive CTE multiple times without recalculation
SELECT ...;
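PostgreSQL also supports the opposite hint. When a CTE is cheap and you want the planner to push filters down into it (for example, to use an index), NOT MATERIALIZED forces inlining. A minimal sketch, assuming the same orders table:

```sql
-- Force inlining so the customer_id filter is pushed down into the CTE
WITH order_summaries AS NOT MATERIALIZED (
    SELECT customer_id, order_id, order_total
    FROM orders
)
SELECT order_id, order_total
FROM order_summaries
WHERE customer_id = 12345;  -- planner can now use an index on customer_id
```

Both hints require PostgreSQL 12 or later; on older versions every CTE is materialized regardless.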
-- DON'T: Chain CTEs unnecessarily when a single query would work
WITH step1 AS (SELECT customer_id FROM orders WHERE order_date > '2024-01-01'),
step2 AS (SELECT customer_id FROM step1),
step3 AS (SELECT customer_id FROM step2)
SELECT * FROM step3;
-- DO: Keep it simple when possible
SELECT DISTINCT customer_id
FROM orders
WHERE order_date > '2024-01-01';
-- DON'T: Use CTEs for simple filtering
WITH filtered_orders AS (
SELECT * FROM orders WHERE order_total > 100
)
SELECT customer_id FROM filtered_orders;
-- DO: Filter directly in the main query
SELECT customer_id
FROM orders
WHERE order_total > 100;
Let's build a comprehensive example that demonstrates CTEs in a realistic data transformation pipeline. We're creating a customer segmentation report that requires multiple complex calculations:
WITH order_metrics AS (
-- Step 1: Calculate basic order statistics per customer
SELECT
customer_id,
COUNT(DISTINCT order_id) as total_orders,
SUM(order_total) as total_revenue,
AVG(order_total) as avg_order_value,
MIN(order_date) as first_order_date,
MAX(order_date) as last_order_date,
STDDEV(order_total) as order_value_volatility
FROM orders
WHERE order_date >= '2023-01-01'
GROUP BY customer_id
),
customer_lifecycle AS (
-- Step 2: Add lifecycle metrics
SELECT
om.*,
CURRENT_DATE - om.last_order_date as days_since_last_order,
om.last_order_date - om.first_order_date as customer_lifespan_days,
CASE
WHEN CURRENT_DATE - om.last_order_date <= 30 THEN 'Active'
WHEN CURRENT_DATE - om.last_order_date <= 90 THEN 'At Risk'
ELSE 'Churned'
END as lifecycle_status
FROM order_metrics om
),
purchase_frequency AS (
-- Step 3: Calculate purchase frequency metrics
SELECT
cl.*,
CASE
WHEN cl.customer_lifespan_days > 0 THEN
cl.total_orders::DECIMAL / (cl.customer_lifespan_days / 30.0)
ELSE 0
END as orders_per_month,
CASE
WHEN cl.total_orders > 1 THEN
cl.customer_lifespan_days::DECIMAL / (cl.total_orders - 1)
ELSE NULL
END as avg_days_between_orders
FROM customer_lifecycle cl
),
customer_segments AS (
-- Step 4: Assign customer segments using RFM-like analysis
SELECT
pf.*,
c.customer_name,
c.customer_email,
-- Recency score (1-5, 5 is most recent)
CASE
WHEN pf.days_since_last_order <= 15 THEN 5
WHEN pf.days_since_last_order <= 30 THEN 4
WHEN pf.days_since_last_order <= 60 THEN 3
WHEN pf.days_since_last_order <= 90 THEN 2
ELSE 1
END as recency_score,
-- Frequency score (1-5, 5 is highest frequency)
CASE
WHEN pf.orders_per_month >= 4 THEN 5
WHEN pf.orders_per_month >= 2 THEN 4
WHEN pf.orders_per_month >= 1 THEN 3
WHEN pf.orders_per_month >= 0.5 THEN 2
ELSE 1
END as frequency_score,
-- Monetary score (1-5, 5 is highest value)
NTILE(5) OVER (ORDER BY pf.total_revenue) as monetary_score
FROM purchase_frequency pf
INNER JOIN customers c ON pf.customer_id = c.customer_id
),
final_segmentation AS (
-- Step 5: Create final segment labels
SELECT
*,
CASE
WHEN recency_score >= 4 AND frequency_score >= 4 AND monetary_score >= 4 THEN 'Champions'
WHEN recency_score >= 3 AND frequency_score >= 3 AND monetary_score >= 3 THEN 'Loyal Customers'
WHEN recency_score >= 4 AND frequency_score <= 2 THEN 'New Customers'
WHEN recency_score <= 2 AND frequency_score >= 3 AND monetary_score >= 3 THEN 'At Risk'
WHEN recency_score <= 2 AND frequency_score <= 2 THEN 'Churned'
WHEN monetary_score >= 4 THEN 'Big Spenders'
ELSE 'Casual Customers'
END as customer_segment,
recency_score + frequency_score + monetary_score as rfm_total_score
FROM customer_segments
)
-- Final output with segment summary
SELECT
customer_segment,
COUNT(*) as customer_count,
ROUND(AVG(total_revenue), 2) as avg_revenue_per_customer,
ROUND(AVG(total_orders), 1) as avg_orders_per_customer,
ROUND(AVG(avg_order_value), 2) as avg_order_value,
ROUND(AVG(rfm_total_score), 1) as avg_rfm_score
FROM final_segmentation
GROUP BY customer_segment
ORDER BY avg_revenue_per_customer DESC;
This query produces a comprehensive customer segmentation report:
| customer_segment | customer_count | avg_revenue_per_customer | avg_orders_per_customer | avg_order_value | avg_rfm_score |
|---|---|---|---|---|---|
| Champions | 127 | 3247.85 | 8.3 | 391.29 | 13.2 |
| Big Spenders | 89 | 2156.42 | 4.1 | 525.83 | 11.8 |
| Loyal Customers | 234 | 1543.67 | 6.2 | 248.98 | 10.5 |
| At Risk | 156 | 987.34 | 5.7 | 173.22 | 8.3 |
The beauty of this CTE chain is that each step is:
- Independently testable — you can run SELECT * FROM any single CTE to inspect its output
- Named after the business concept it computes, so the query documents itself
- Focused on one transformation, so changes stay localized
Now let's put your CTE skills to work with a realistic business scenario. You're analyzing the performance of an e-commerce platform's product recommendation engine.
Scenario: Your company tracks when customers view products and when they actually purchase them. You need to create a report showing:
- How often product views convert into purchases
- Whether views driven by the recommendation engine convert better than organic views
- Which products generate the most revenue from viewed-then-purchased activity
Sample Data Setup:
-- You can create these tables to follow along
CREATE TABLE product_views (
view_id SERIAL PRIMARY KEY,
customer_id INT,
product_id INT,
view_date TIMESTAMP,
came_from_recommendation BOOLEAN
);
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(200),
category VARCHAR(100),
price DECIMAL(10,2)
);
CREATE TABLE purchases (
purchase_id SERIAL PRIMARY KEY,
customer_id INT,
product_id INT,
purchase_date TIMESTAMP,
quantity INT,
total_amount DECIMAL(10,2)
);
Your Task: Write a CTE-based query that produces a report with these columns: product_name, category, total_views, total_purchases, conversion_rate, recommended_views, organic_views, recommended_conversion_rate, organic_conversion_rate, and total_revenue_generated.
Requirements:
- Only consider views from the last 60 days
- Count a purchase as a conversion only if it happened within 7 days of the view
- Include only products with at least 10 views
- Return the top 20 products by revenue generated
Solution:
WITH recent_views AS (
-- Step 1: Get all views from the last 60 days
SELECT
pv.customer_id,
pv.product_id,
pv.view_date,
pv.came_from_recommendation,
p.product_name,
p.category,
p.price
FROM product_views pv
INNER JOIN products p ON pv.product_id = p.product_id
WHERE pv.view_date >= CURRENT_DATE - INTERVAL '60 days'
),
relevant_purchases AS (
-- Step 2: Get purchases that happened within 7 days of a view
SELECT DISTINCT
rv.customer_id,
rv.product_id,
rv.view_date,
rv.came_from_recommendation,
pu.purchase_date,
pu.quantity,
pu.total_amount
FROM recent_views rv
INNER JOIN purchases pu ON rv.customer_id = pu.customer_id
AND rv.product_id = pu.product_id
WHERE pu.purchase_date BETWEEN rv.view_date
AND rv.view_date + INTERVAL '7 days'
),
view_metrics AS (
-- Step 3: Calculate view statistics by product
SELECT
product_id,
product_name,
category,
price,
COUNT(*) as total_views,
COUNT(CASE WHEN came_from_recommendation THEN 1 END) as recommended_views,
COUNT(CASE WHEN NOT came_from_recommendation THEN 1 END) as organic_views
FROM recent_views
GROUP BY product_id, product_name, category, price
HAVING COUNT(*) >= 10 -- Minimum 10 views requirement
),
purchase_metrics AS (
-- Step 4: Calculate purchase statistics by product
SELECT
product_id,
COUNT(DISTINCT customer_id || '|' || view_date) as total_purchases,
COUNT(CASE WHEN came_from_recommendation
THEN customer_id || '|' || view_date END) as recommended_purchases,
COUNT(CASE WHEN NOT came_from_recommendation
THEN customer_id || '|' || view_date END) as organic_purchases,
SUM(total_amount) as total_revenue_generated
FROM relevant_purchases
GROUP BY product_id
),
conversion_analysis AS (
-- Step 5: Combine metrics and calculate conversion rates
SELECT
vm.product_id,
vm.product_name,
vm.category,
vm.total_views,
COALESCE(pm.total_purchases, 0) as total_purchases,
vm.recommended_views,
vm.organic_views,
COALESCE(pm.recommended_purchases, 0) as recommended_purchases,
COALESCE(pm.organic_purchases, 0) as organic_purchases,
COALESCE(pm.total_revenue_generated, 0) as total_revenue_generated,
-- Calculate conversion rates
ROUND(
COALESCE(pm.total_purchases, 0)::DECIMAL / vm.total_views * 100,
2
) as conversion_rate,
CASE
WHEN vm.recommended_views > 0 THEN
ROUND(
COALESCE(pm.recommended_purchases, 0)::DECIMAL / vm.recommended_views * 100,
2
)
ELSE 0
END as recommended_conversion_rate,
CASE
WHEN vm.organic_views > 0 THEN
ROUND(
COALESCE(pm.organic_purchases, 0)::DECIMAL / vm.organic_views * 100,
2
)
ELSE 0
END as organic_conversion_rate
FROM view_metrics vm
LEFT JOIN purchase_metrics pm ON vm.product_id = pm.product_id
)
SELECT
product_name,
category,
total_views,
total_purchases,
conversion_rate,
recommended_views,
organic_views,
recommended_conversion_rate,
organic_conversion_rate,
total_revenue_generated,
-- Add insights
CASE
WHEN recommended_conversion_rate > organic_conversion_rate + 5
THEN 'Recommendation Engine Helps'
WHEN organic_conversion_rate > recommended_conversion_rate + 5
THEN 'Organic Discovery Better'
ELSE 'Similar Performance'
END as recommendation_impact
FROM conversion_analysis
ORDER BY total_revenue_generated DESC
LIMIT 20;
This solution demonstrates several key CTE patterns:
- recent_views applies the 60-day filter once, up front
- relevant_purchases encapsulates the 7-day attribution window in a single time-based join
- view and purchase metrics are aggregated separately, then combined with a LEFT JOIN so products with zero purchases still appear

Even well-structured CTE queries go wrong in predictable ways. Let's walk through the most common mistakes.
The Mistake:
-- This will fail!
WITH second_cte AS (
SELECT * FROM first_cte -- Error: first_cte doesn't exist yet
),
first_cte AS (
SELECT customer_id FROM customers
)
SELECT * FROM second_cte;
Why It Fails: CTEs can only reference previously defined CTEs, not ones that come later in the WITH clause.
The Fix:
WITH first_cte AS (
SELECT customer_id FROM customers
),
second_cte AS (
SELECT * FROM first_cte -- Now this works
)
SELECT * FROM second_cte;
The Mistake:
WITH RECURSIVE hierarchy AS (
SELECT employee_id, manager_id FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.employee_id, e.manager_id, eh.some_column -- Wrong number of columns!
FROM employees e
JOIN hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM hierarchy;
Why It Fails: The base case and recursive case must return the exact same number of columns with compatible types.
The Fix:
WITH RECURSIVE hierarchy AS (
SELECT employee_id, manager_id, 0 as level FROM employees WHERE manager_id IS NULL
UNION ALL
SELECT e.employee_id, e.manager_id, eh.level + 1
FROM employees e
JOIN hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM hierarchy;
The Mistake:
-- This CTE processes millions of rows but is only used for a small filter
WITH all_order_details AS (
SELECT
o.*,
c.customer_name,
p.product_name,
-- Lots of expensive calculations
COMPLEX_FUNCTION(o.order_total) as processed_value
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
)
SELECT customer_name
FROM all_order_details
WHERE customer_id = 12345; -- Only need one customer!
Why It's Slow: If the CTE is materialized (because it's referenced multiple times, marked MATERIALIZED, or your database version always materializes), all rows are processed before the filter is applied. Even when the optimizer can inline the CTE and push the filter down, filtering early makes the intent explicit and the plan predictable.
The Fix:
-- Filter early, process less data
WITH target_customer_orders AS (
SELECT o.*
FROM orders o
WHERE o.customer_id = 12345 -- Filter first
),
enriched_orders AS (
SELECT
tco.*,
c.customer_name,
p.product_name,
COMPLEX_FUNCTION(tco.order_total) as processed_value
FROM target_customer_orders tco
JOIN customers c ON tco.customer_id = c.customer_id
JOIN products p ON tco.product_id = p.product_id
)
SELECT customer_name FROM enriched_orders;
The Mistake:
-- This creates a cycle: A reports to B, B reports to A
WITH RECURSIVE bad_hierarchy AS (
SELECT employee_id, manager_id, 1 as level
FROM employees WHERE employee_id = 1
UNION ALL
SELECT e.employee_id, e.manager_id, bh.level + 1
FROM employees e
JOIN bad_hierarchy bh ON e.manager_id = bh.employee_id
-- No termination condition for cycles!
)
SELECT * FROM bad_hierarchy;
Why It Fails: If your data has cycles, the recursion never terminates on its own — it runs until it hits a recursion limit (in databases that have one), exhausts memory, or gets cancelled. In PostgreSQL, which has no default depth limit, a cycle means the query effectively hangs.
The Fix:
WITH RECURSIVE safe_hierarchy AS (
SELECT
employee_id,
manager_id,
1 as level,
ARRAY[employee_id] as path -- Track the path to detect cycles
FROM employees WHERE employee_id = 1
UNION ALL
SELECT
e.employee_id,
e.manager_id,
sh.level + 1,
sh.path || e.employee_id
FROM employees e
JOIN safe_hierarchy sh ON e.manager_id = sh.employee_id
WHERE e.employee_id != ALL(sh.path) -- Prevent cycles
AND sh.level < 10 -- Also add a reasonable depth limit
)
SELECT * FROM safe_hierarchy;
The Mistake:
WITH sales_data AS (
SELECT region, SUM(amount) as total FROM sales GROUP BY region
),
target_data AS (
SELECT region, target_amount as total FROM targets -- Same column name!
)
SELECT * FROM sales_data
JOIN target_data USING (region, total); -- Joins on BOTH columns!
Why It Fails: USING (region, total) requires both columns to match, so rows pair up only when a region's sales exactly equal its target — almost certainly not the comparison you intended. Reusing the same column name across CTEs invites exactly this kind of silent logic bug.
The Fix:
WITH sales_data AS (
SELECT region, SUM(amount) as sales_total FROM sales GROUP BY region
),
target_data AS (
SELECT region, target_amount as target_total FROM targets
)
SELECT
sd.region,
sd.sales_total,
td.target_total,
ROUND((sd.sales_total / td.target_total * 100), 1) as achievement_pct
FROM sales_data sd
JOIN target_data td ON sd.region = td.region;
Debugging Tip: When a CTE query isn't working, test each CTE individually by selecting from it directly:
WITH my_cte AS (...) SELECT * FROM my_cte LIMIT 10;
This helps isolate where the problem occurs.
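For example, with the quarterly comparison query from earlier, you could temporarily keep just the first CTE and swap in a diagnostic SELECT (a hypothetical debugging session, not part of the final query):

```sql
-- Keep the CTE definition, replace only the final query while debugging
WITH q1_customer_revenue AS (
    SELECT customer_id, SUM(order_total) AS revenue
    FROM orders
    WHERE order_date >= '2024-01-01'
      AND order_date < '2024-04-01'
    GROUP BY customer_id
    HAVING SUM(order_total) > 500
)
SELECT * FROM q1_customer_revenue
ORDER BY revenue DESC
LIMIT 10;  -- sanity-check this step in isolation
```

Once the intermediate result looks right, restore the remaining CTEs one at a time until the full query works.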
Common Table Expressions transform SQL from a write-once, debug-forever nightmare into readable, maintainable code. You've learned how to:
- Replace deeply nested subqueries with named, sequential CTEs
- Chain CTEs into multi-step transformation pipelines
- Traverse hierarchies with recursive CTEs, guarded against cycles and runaway depth
- Reason about materialization, inlining, and indexing for CTE performance
- Recognize and fix the most common CTE mistakes
The key insight is that CTEs aren't just about syntax—they're about thinking differently about SQL structure. Instead of cramming complex logic into monolithic queries, you can break problems into logical steps that mirror how you'd explain the solution to a colleague.
Ready to take your SQL skills further? Here are three natural next steps:
Window Functions with CTEs: Learn how to combine CTEs with advanced window functions like LEAD/LAG, FIRST_VALUE, and custom window frames. This combination is incredibly powerful for time-series analysis and complex ranking problems.
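As a taste of that combination, here's a short sketch (assuming the same orders table used throughout this lesson) that aggregates monthly revenue in a CTE, then applies LAG to compute month-over-month change:

```sql
WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', order_date) AS month,
        SUM(order_total) AS revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
    month,
    revenue,
    revenue - LAG(revenue) OVER (ORDER BY month) AS change_vs_prior_month
FROM monthly_revenue
ORDER BY month;
```

The CTE handles the aggregation step; the window function then compares adjacent rows without any self-join.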
SQL Query Optimization: Dive deeper into execution plans, index strategies, and performance tuning. Understanding how the query planner works with CTEs will help you write faster, more efficient code.
Advanced Recursive Patterns: Explore sophisticated recursive CTE applications like graph traversal, finding shortest paths, and handling complex many-to-many hierarchies. These patterns show up frequently in real-world data modeling scenarios.
Learning Path: Advanced SQL Queries