You're analyzing customer data for an e-commerce company, and your manager asks a seemingly simple question: "Which customers spent more than our average order value last month?" Your first instinct might be to calculate the average separately, then filter customers—but what if you could answer this question with a single, elegant SQL query?
This is where subqueries and Common Table Expressions (CTEs) become invaluable. These advanced SQL techniques let you break down complex analytical problems into manageable, readable components. Instead of writing multiple separate queries or creating temporary tables, you can embed calculations directly within your main query or create named, reusable query components.
By the end of this lesson, you'll transform from writing basic SELECT statements to crafting sophisticated, multi-layered queries that solve real business problems efficiently.
What you'll learn:
You should be comfortable with basic SQL SELECT statements, including WHERE clauses, JOINs, and aggregate functions like COUNT, SUM, and AVG. If you can write queries that join multiple tables and use GROUP BY, you're ready for this lesson.
A subquery is simply a SELECT statement nested inside another SQL statement. Think of it as asking a question to answer another question. In our customer example, the inner question is "What's the average order value?" and the outer question is "Which customers exceed that average?"
Let's start with a concrete example using a simplified e-commerce database:
-- Find customers who placed orders above the average order value
SELECT customer_id, customer_name, order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_total > (
SELECT AVG(order_total)
FROM orders
);
The query inside the parentheses—SELECT AVG(order_total) FROM orders—is our subquery. It calculates the average order value across all orders. The main query then uses this result to filter customers whose orders exceed that average.
WHERE clause subqueries are probably the most intuitive. They help you filter rows based on calculated values or comparisons with other data.
-- Find products that have never been ordered
SELECT product_id, product_name, price
FROM products
WHERE product_id NOT IN (
SELECT DISTINCT product_id
FROM order_items
WHERE product_id IS NOT NULL
);
Notice the WHERE product_id IS NOT NULL in the subquery. This is crucial when using NOT IN with subqueries—if any value in the subquery result is NULL, the NOT IN comparison will return no rows at all.
Important: Always handle NULLs carefully in subqueries. Use
WHERE column IS NOT NULLin your subquery when using IN or NOT IN operators.
Let's look at a more sophisticated example:
-- Find customers whose total lifetime spending is in the top 10%
SELECT customer_id, customer_name, total_spent
FROM (
SELECT c.customer_id,
c.customer_name,
SUM(o.order_total) as total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
) customer_totals
WHERE total_spent > (
SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY customer_total)
FROM (
SELECT SUM(order_total) as customer_total
FROM orders
GROUP BY customer_id
) totals
);
This query has multiple layers: it calculates each customer's total spending, then compares that to the 90th percentile of all customer spending totals.
When you use a subquery in the FROM clause, you're treating the result set as a temporary table. This is incredibly powerful for multi-step analysis.
-- Calculate month-over-month growth in sales
SELECT
current_month.order_month,
current_month.monthly_sales,
previous_month.monthly_sales as previous_month_sales,
ROUND(
(current_month.monthly_sales - previous_month.monthly_sales) * 100.0 /
previous_month.monthly_sales, 2
) as growth_percentage
FROM (
SELECT
DATE_TRUNC('month', order_date) as order_month,
SUM(order_total) as monthly_sales
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
) current_month
LEFT JOIN (
SELECT
DATE_TRUNC('month', order_date) as order_month,
SUM(order_total) as monthly_sales
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
) previous_month
ON current_month.order_month = previous_month.order_month + INTERVAL '1 month'
ORDER BY current_month.order_month;
Here we create two identical subqueries in the FROM clause, then join them with a one-month offset to calculate growth rates.
SELECT clause subqueries (also called scalar subqueries) must return exactly one value. They're useful for adding calculated fields based on other data.
-- Show each customer with their order count and the average order count
SELECT
customer_id,
customer_name,
(SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.customer_id) as customer_order_count,
(SELECT ROUND(AVG(order_count), 1)
FROM (
SELECT COUNT(*) as order_count
FROM orders
GROUP BY customer_id
) counts) as avg_orders_per_customer
FROM customers c;
The first subquery is correlated—it references the outer query's customer table with c.customer_id. The second is independent and returns the same value for every row.
While subqueries are powerful, they can become hard to read when nested deeply. Common Table Expressions (CTEs) solve this by letting you name and reuse query components, making complex logic much more readable.
A CTE is defined using the WITH clause and creates a temporary named result set that exists only for the duration of your query.
WITH monthly_sales AS (
SELECT
DATE_TRUNC('month', order_date) as order_month,
SUM(order_total) as total_sales,
COUNT(*) as order_count
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
order_month,
total_sales,
order_count,
total_sales / order_count as avg_order_value
FROM monthly_sales
ORDER BY order_month;
This is much cleaner than putting the aggregation logic directly in the FROM clause. The CTE monthly_sales clearly describes what the temporary result set contains.
You can define multiple CTEs by separating them with commas:
WITH customer_stats AS (
SELECT
customer_id,
COUNT(*) as order_count,
SUM(order_total) as total_spent,
AVG(order_total) as avg_order_value
FROM orders
GROUP BY customer_id
),
customer_segments AS (
SELECT
customer_id,
CASE
WHEN total_spent >= 1000 THEN 'High Value'
WHEN total_spent >= 500 THEN 'Medium Value'
ELSE 'Low Value'
END as segment
FROM customer_stats
)
SELECT
c.customer_name,
cs.order_count,
cs.total_spent,
seg.segment
FROM customers c
JOIN customer_stats cs ON c.customer_id = cs.customer_id
JOIN customer_segments seg ON c.customer_id = seg.customer_id
WHERE seg.segment = 'High Value'
ORDER BY cs.total_spent DESC;
Each CTE builds on the work of the previous ones, creating a clear progression from raw data to final analysis.
CTEs can reference themselves, which is useful for hierarchical data or iterative calculations. Here's an example with an organizational hierarchy:
WITH RECURSIVE employee_hierarchy AS (
-- Base case: top-level managers (no manager)
SELECT
employee_id,
employee_name,
manager_id,
1 as level,
employee_name as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees with managers
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
eh.level + 1,
eh.path || ' -> ' || e.employee_name
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT
employee_id,
employee_name,
level,
path as reporting_chain
FROM employee_hierarchy
ORDER BY level, employee_name;
The RECURSIVE keyword tells the database to keep applying the UNION ALL until no new rows are found.
Both subqueries and CTEs can solve similar problems, but each has advantages in different situations.
Use subqueries when:
Use CTEs when:
Consider this comparison. Here's a complex subquery approach:
-- Subquery approach: harder to follow
SELECT *
FROM (
SELECT
customer_id,
customer_name,
(SELECT AVG(order_total) FROM orders o2 WHERE o2.customer_id = c.customer_id) as avg_order
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
)
) customer_avgs
WHERE avg_order > (
SELECT AVG(order_total) FROM orders
);
And here's the same logic with CTEs:
-- CTE approach: much clearer
WITH customers_with_orders AS (
SELECT DISTINCT customer_id, customer_name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
),
customer_averages AS (
SELECT
cwo.customer_id,
cwo.customer_name,
AVG(o.order_total) as avg_order
FROM customers_with_orders cwo
JOIN orders o ON cwo.customer_id = o.customer_id
GROUP BY cwo.customer_id, cwo.customer_name
),
overall_average AS (
SELECT AVG(order_total) as overall_avg
FROM orders
)
SELECT
ca.customer_id,
ca.customer_name,
ca.avg_order
FROM customer_averages ca
CROSS JOIN overall_average oa
WHERE ca.avg_order > oa.overall_avg;
The CTE version is longer, but each step is clear and testable independently.
Let's practice with a realistic scenario. You're analyzing data for a subscription service with these tables:
subscribers: subscriber_id, subscriber_name, signup_date, plan_typemonthly_charges: charge_id, subscriber_id, charge_month, amountsupport_tickets: ticket_id, subscriber_id, created_date, status, priorityChallenge 1: Using Subqueries
Write a query to find subscribers who have been charged more than the average monthly charge in their most recent billing period.
Try it yourself first, then compare with this solution:
-- Find subscribers charged above average in their most recent bill
SELECT
s.subscriber_id,
s.subscriber_name,
recent_charges.amount as recent_charge_amount,
(SELECT ROUND(AVG(amount), 2) FROM monthly_charges) as overall_average
FROM subscribers s
JOIN (
SELECT
subscriber_id,
amount,
ROW_NUMBER() OVER (PARTITION BY subscriber_id ORDER BY charge_month DESC) as rn
FROM monthly_charges
) recent_charges ON s.subscriber_id = recent_charges.subscriber_id
WHERE recent_charges.rn = 1
AND recent_charges.amount > (SELECT AVG(amount) FROM monthly_charges);
Challenge 2: Converting to CTEs
Now rewrite the same logic using CTEs for better readability:
-- Same analysis using CTEs
WITH overall_stats AS (
SELECT AVG(amount) as avg_monthly_charge
FROM monthly_charges
),
recent_charges AS (
SELECT
subscriber_id,
amount,
ROW_NUMBER() OVER (PARTITION BY subscriber_id ORDER BY charge_month DESC) as rn
FROM monthly_charges
),
latest_charges AS (
SELECT subscriber_id, amount as recent_amount
FROM recent_charges
WHERE rn = 1
)
SELECT
s.subscriber_id,
s.subscriber_name,
lc.recent_amount,
os.avg_monthly_charge
FROM subscribers s
JOIN latest_charges lc ON s.subscriber_id = lc.subscriber_id
CROSS JOIN overall_stats os
WHERE lc.recent_amount > os.avg_monthly_charge;
Challenge 3: Complex Analysis
Create a query that shows each subscriber's support ticket summary alongside their billing information, but only for subscribers who have had high-priority tickets in the last 3 months:
WITH recent_high_priority_tickets AS (
SELECT DISTINCT subscriber_id
FROM support_tickets
WHERE priority = 'High'
AND created_date >= CURRENT_DATE - INTERVAL '3 months'
),
subscriber_ticket_stats AS (
SELECT
subscriber_id,
COUNT(*) as total_tickets,
COUNT(CASE WHEN status = 'Open' THEN 1 END) as open_tickets,
COUNT(CASE WHEN priority = 'High' THEN 1 END) as high_priority_tickets
FROM support_tickets
GROUP BY subscriber_id
),
subscriber_billing_stats AS (
SELECT
subscriber_id,
SUM(amount) as total_billed,
AVG(amount) as avg_monthly_charge,
COUNT(*) as months_billed
FROM monthly_charges
GROUP BY subscriber_id
)
SELECT
s.subscriber_name,
s.plan_type,
sts.total_tickets,
sts.open_tickets,
sts.high_priority_tickets,
sbs.total_billed,
sbs.avg_monthly_charge
FROM subscribers s
JOIN recent_high_priority_tickets rhpt ON s.subscriber_id = rhpt.subscriber_id
LEFT JOIN subscriber_ticket_stats sts ON s.subscriber_id = sts.subscriber_id
LEFT JOIN subscriber_billing_stats sbs ON s.subscriber_id = sbs.subscriber_id
ORDER BY sts.high_priority_tickets DESC, sbs.total_billed DESC;
-- This might return no rows unexpectedly
SELECT product_name
FROM products
WHERE product_id NOT IN (
SELECT product_id FROM discontinued_products
);
-- Fix: Handle NULLs explicitly
SELECT product_name
FROM products
WHERE product_id NOT IN (
SELECT product_id
FROM discontinued_products
WHERE product_id IS NOT NULL
);
-- Or use NOT EXISTS instead
SELECT product_name
FROM products p
WHERE NOT EXISTS (
SELECT 1 FROM discontinued_products d
WHERE d.product_id = p.product_id
);
-- This will error if the subquery returns multiple rows
SELECT customer_name,
(SELECT order_total FROM orders WHERE customer_id = c.customer_id)
FROM customers c;
-- Fix: Use aggregate functions or LIMIT
SELECT customer_name,
(SELECT MAX(order_total) FROM orders WHERE customer_id = c.customer_id)
FROM customers c;
-- This runs the subquery for every row - can be slow
SELECT customer_name,
(SELECT COUNT(*) FROM orders WHERE customer_id = c.customer_id) as order_count
FROM customers c;
-- Better: Use JOINs when possible
SELECT c.customer_name,
COALESCE(o.order_count, 0) as order_count
FROM customers c
LEFT JOIN (
SELECT customer_id, COUNT(*) as order_count
FROM orders
GROUP BY customer_id
) o ON c.customer_id = o.customer_id;
-- This will cause confusion
WITH orders AS (
SELECT * FROM monthly_sales -- Named 'orders' but contains sales data
),
orders AS ( -- Error: duplicate CTE name
SELECT * FROM actual_orders_table
)
SELECT * FROM orders; -- Which orders table?
Give your CTEs descriptive, unique names that clearly indicate their contents.
-- Slower
SELECT * FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders);
-- Faster
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
-- Subquery approach
SELECT customer_name,
(SELECT SUM(order_total) FROM orders WHERE customer_id = c.customer_id)
FROM customers c;
-- JOIN approach (often faster)
SELECT c.customer_name, COALESCE(o.total_orders, 0)
FROM customers c
LEFT JOIN (
SELECT customer_id, SUM(order_total) as total_orders
FROM orders GROUP BY customer_id
) o ON c.customer_id = o.customer_id;
CTEs are generally optimized well by modern databases, but keep these points in mind:
You've now learned how to construct sophisticated SQL queries using both subqueries and CTEs. Subqueries excel at simple filtering and single-value calculations, while CTEs shine for complex, multi-step analysis that prioritizes readability.
Key takeaways:
Next steps in your SQL journey:
The combination of subqueries and CTEs gives you the tools to answer virtually any analytical question your data can support. Start applying these techniques to your own datasets, and you'll quickly discover how much more powerful and expressive your SQL queries can become.
Learning Path: SQL Fundamentals