Building Complex Queries: Subqueries and Common Table Expressions (CTEs) in SQL

You're analyzing customer data for an e-commerce company, and your manager asks a seemingly simple question: "Which customers spent more than our average order value last month?" Your first instinct might be to calculate the average separately, then filter customers—but what if you could answer this question with a single, elegant SQL query?

This is where subqueries and Common Table Expressions (CTEs) become invaluable. These advanced SQL techniques let you break down complex analytical problems into manageable, readable components. Instead of writing multiple separate queries or creating temporary tables, you can embed calculations directly within your main query or create named, reusable query components.

By the end of this lesson, you'll transform from writing basic SELECT statements to crafting sophisticated, multi-layered queries that solve real business problems efficiently.

What you'll learn:

How to write and use subqueries in WHERE, FROM, and SELECT clauses
When subqueries are the right tool versus other SQL approaches
How to construct Common Table Expressions (CTEs) for complex multi-step logic
Best practices for writing readable, maintainable complex queries
How to troubleshoot and optimize nested query structures

Prerequisites

You should be comfortable with basic SQL SELECT statements, including WHERE clauses, JOINs, and aggregate functions like COUNT, SUM, and AVG. If you can write queries that join multiple tables and use GROUP BY, you're ready for this lesson.

Understanding Subqueries: Queries Within Queries

A subquery is simply a SELECT statement nested inside another SQL statement. Think of it as asking a question to answer another question. In our customer example, the inner question is "What's the average order value?" and the outer question is "Which customers exceed that average?"

Let's start with a concrete example using a simplified e-commerce database:

-- Find customers who placed orders above the average order value
SELECT customer_id, customer_name, order_total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_total > (
    SELECT AVG(order_total) 
    FROM orders
);

The query inside the parentheses—SELECT AVG(order_total) FROM orders—is our subquery. It calculates the average order value across all orders. The main query then uses this result to filter customers whose orders exceed that average.

Subqueries in the WHERE Clause

WHERE clause subqueries are probably the most intuitive. They help you filter rows based on calculated values or comparisons with other data.

-- Find products that have never been ordered
SELECT product_id, product_name, price
FROM products
WHERE product_id NOT IN (
    SELECT DISTINCT product_id 
    FROM order_items
    WHERE product_id IS NOT NULL
);

Notice the WHERE product_id IS NOT NULL in the subquery. This is crucial when using NOT IN with subqueries—if any value in the subquery result is NULL, the NOT IN comparison will return no rows at all.

Important: Always handle NULLs carefully in subqueries. Use WHERE column IS NOT NULL in your subquery when using IN or NOT IN operators.

Let's look at a more sophisticated example:

-- Find customers whose total lifetime spending is in the top 10%
SELECT customer_id, customer_name, total_spent
FROM (
    SELECT c.customer_id, 
           c.customer_name,
           SUM(o.order_total) as total_spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.customer_name
) customer_totals
WHERE total_spent > (
    SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY customer_total)
    FROM (
        SELECT SUM(order_total) as customer_total
        FROM orders
        GROUP BY customer_id
    ) totals
);

This query has multiple layers: it calculates each customer's total spending, then compares that to the 90th percentile of all customer spending totals.

Subqueries in the FROM Clause

When you use a subquery in the FROM clause, you're treating the result set as a temporary table. This is incredibly powerful for multi-step analysis.

-- Calculate month-over-month growth in sales
SELECT 
    current_month.order_month,
    current_month.monthly_sales,
    previous_month.monthly_sales as previous_month_sales,
    ROUND(
        (current_month.monthly_sales - previous_month.monthly_sales) * 100.0 / 
        previous_month.monthly_sales, 2
    ) as growth_percentage
FROM (
    SELECT 
        DATE_TRUNC('month', order_date) as order_month,
        SUM(order_total) as monthly_sales
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
) current_month
LEFT JOIN (
    SELECT 
        DATE_TRUNC('month', order_date) as order_month,
        SUM(order_total) as monthly_sales
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
) previous_month 
ON current_month.order_month = previous_month.order_month + INTERVAL '1 month'
ORDER BY current_month.order_month;

Here we create two identical subqueries in the FROM clause, then join them with a one-month offset to calculate growth rates.

Subqueries in the SELECT Clause

SELECT clause subqueries (also called scalar subqueries) must return exactly one value. They're useful for adding calculated fields based on other data.

-- Show each customer with their order count and the average order count
SELECT 
    customer_id,
    customer_name,
    (SELECT COUNT(*) 
     FROM orders o 
     WHERE o.customer_id = c.customer_id) as customer_order_count,
    (SELECT ROUND(AVG(order_count), 1)
     FROM (
         SELECT COUNT(*) as order_count
         FROM orders
         GROUP BY customer_id
     ) counts) as avg_orders_per_customer
FROM customers c;

The first subquery is correlated—it references the outer query's customer table with c.customer_id. The second is independent and returns the same value for every row.

Introduction to Common Table Expressions (CTEs)

While subqueries are powerful, they can become hard to read when nested deeply. Common Table Expressions (CTEs) solve this by letting you name and reuse query components, making complex logic much more readable.

A CTE is defined using the WITH clause and creates a temporary named result set that exists only for the duration of your query.

Basic CTE Syntax

WITH monthly_sales AS (
    SELECT 
        DATE_TRUNC('month', order_date) as order_month,
        SUM(order_total) as total_sales,
        COUNT(*) as order_count
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT 
    order_month,
    total_sales,
    order_count,
    total_sales / order_count as avg_order_value
FROM monthly_sales
ORDER BY order_month;

This is much cleaner than putting the aggregation logic directly in the FROM clause. The CTE monthly_sales clearly describes what the temporary result set contains.

Multiple CTEs in a Single Query

You can define multiple CTEs by separating them with commas:

WITH customer_stats AS (
    SELECT 
        customer_id,
        COUNT(*) as order_count,
        SUM(order_total) as total_spent,
        AVG(order_total) as avg_order_value
    FROM orders
    GROUP BY customer_id
),
customer_segments AS (
    SELECT 
        customer_id,
        CASE 
            WHEN total_spent >= 1000 THEN 'High Value'
            WHEN total_spent >= 500 THEN 'Medium Value'
            ELSE 'Low Value'
        END as segment
    FROM customer_stats
)
SELECT 
    c.customer_name,
    cs.order_count,
    cs.total_spent,
    seg.segment
FROM customers c
JOIN customer_stats cs ON c.customer_id = cs.customer_id
JOIN customer_segments seg ON c.customer_id = seg.customer_id
WHERE seg.segment = 'High Value'
ORDER BY cs.total_spent DESC;

Each CTE builds on the work of the previous ones, creating a clear progression from raw data to final analysis.

Recursive CTEs

CTEs can reference themselves, which is useful for hierarchical data or iterative calculations. Here's an example with an organizational hierarchy:

WITH RECURSIVE employee_hierarchy AS (
    -- Base case: top-level managers (no manager)
    SELECT 
        employee_id,
        employee_name,
        manager_id,
        1 as level,
        employee_name as path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case: employees with managers
    SELECT 
        e.employee_id,
        e.employee_name,
        e.manager_id,
        eh.level + 1,
        eh.path || ' -> ' || e.employee_name
    FROM employees e
    JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT 
    employee_id,
    employee_name,
    level,
    path as reporting_chain
FROM employee_hierarchy
ORDER BY level, employee_name;

The RECURSIVE keyword tells the database to keep applying the UNION ALL until no new rows are found.

Choosing Between Subqueries and CTEs

Both subqueries and CTEs can solve similar problems, but each has advantages in different situations.

Use subqueries when:

You need a single calculated value (like an average for comparison)
The logic is simple and won't be reused
You're filtering with EXISTS or IN operations

Use CTEs when:

Your logic has multiple steps that build on each other
You want to reuse a result set multiple times
Readability and maintainability are important
You're working with hierarchical or recursive data

Consider this comparison. Here's a complex subquery approach:

-- Subquery approach: harder to follow
SELECT *
FROM (
    SELECT 
        customer_id,
        customer_name,
        (SELECT AVG(order_total) FROM orders o2 WHERE o2.customer_id = c.customer_id) as avg_order
    FROM customers c
    WHERE EXISTS (
        SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
    )
) customer_avgs
WHERE avg_order > (
    SELECT AVG(order_total) FROM orders
);

And here's the same logic with CTEs:

-- CTE approach: much clearer
WITH customers_with_orders AS (
    SELECT DISTINCT customer_id, customer_name
    FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
),
customer_averages AS (
    SELECT 
        cwo.customer_id,
        cwo.customer_name,
        AVG(o.order_total) as avg_order
    FROM customers_with_orders cwo
    JOIN orders o ON cwo.customer_id = o.customer_id
    GROUP BY cwo.customer_id, cwo.customer_name
),
overall_average AS (
    SELECT AVG(order_total) as overall_avg
    FROM orders
)
SELECT 
    ca.customer_id,
    ca.customer_name,
    ca.avg_order
FROM customer_averages ca
CROSS JOIN overall_average oa
WHERE ca.avg_order > oa.overall_avg;

The CTE version is longer, but each step is clear and testable independently.

Hands-On Exercise

Let's practice with a realistic scenario. You're analyzing data for a subscription service with these tables:

subscribers: subscriber_id, subscriber_name, signup_date, plan_type
monthly_charges: charge_id, subscriber_id, charge_month, amount
support_tickets: ticket_id, subscriber_id, created_date, status, priority

Challenge 1: Using Subqueries

Write a query to find subscribers who have been charged more than the average monthly charge in their most recent billing period.

Try it yourself first, then compare with this solution:

-- Find subscribers charged above average in their most recent bill
SELECT 
    s.subscriber_id,
    s.subscriber_name,
    recent_charges.amount as recent_charge_amount,
    (SELECT ROUND(AVG(amount), 2) FROM monthly_charges) as overall_average
FROM subscribers s
JOIN (
    SELECT 
        subscriber_id,
        amount,
        ROW_NUMBER() OVER (PARTITION BY subscriber_id ORDER BY charge_month DESC) as rn
    FROM monthly_charges
) recent_charges ON s.subscriber_id = recent_charges.subscriber_id
WHERE recent_charges.rn = 1
  AND recent_charges.amount > (SELECT AVG(amount) FROM monthly_charges);

Challenge 2: Converting to CTEs

Now rewrite the same logic using CTEs for better readability:

-- Same analysis using CTEs
WITH overall_stats AS (
    SELECT AVG(amount) as avg_monthly_charge
    FROM monthly_charges
),
recent_charges AS (
    SELECT 
        subscriber_id,
        amount,
        ROW_NUMBER() OVER (PARTITION BY subscriber_id ORDER BY charge_month DESC) as rn
    FROM monthly_charges
),
latest_charges AS (
    SELECT subscriber_id, amount as recent_amount
    FROM recent_charges
    WHERE rn = 1
)
SELECT 
    s.subscriber_id,
    s.subscriber_name,
    lc.recent_amount,
    os.avg_monthly_charge
FROM subscribers s
JOIN latest_charges lc ON s.subscriber_id = lc.subscriber_id
CROSS JOIN overall_stats os
WHERE lc.recent_amount > os.avg_monthly_charge;

Challenge 3: Complex Analysis

Create a query that shows each subscriber's support ticket summary alongside their billing information, but only for subscribers who have had high-priority tickets in the last 3 months:

WITH recent_high_priority_tickets AS (
    SELECT DISTINCT subscriber_id
    FROM support_tickets
    WHERE priority = 'High'
      AND created_date >= CURRENT_DATE - INTERVAL '3 months'
),
subscriber_ticket_stats AS (
    SELECT 
        subscriber_id,
        COUNT(*) as total_tickets,
        COUNT(CASE WHEN status = 'Open' THEN 1 END) as open_tickets,
        COUNT(CASE WHEN priority = 'High' THEN 1 END) as high_priority_tickets
    FROM support_tickets
    GROUP BY subscriber_id
),
subscriber_billing_stats AS (
    SELECT 
        subscriber_id,
        SUM(amount) as total_billed,
        AVG(amount) as avg_monthly_charge,
        COUNT(*) as months_billed
    FROM monthly_charges
    GROUP BY subscriber_id
)
SELECT 
    s.subscriber_name,
    s.plan_type,
    sts.total_tickets,
    sts.open_tickets,
    sts.high_priority_tickets,
    sbs.total_billed,
    sbs.avg_monthly_charge
FROM subscribers s
JOIN recent_high_priority_tickets rhpt ON s.subscriber_id = rhpt.subscriber_id
LEFT JOIN subscriber_ticket_stats sts ON s.subscriber_id = sts.subscriber_id
LEFT JOIN subscriber_billing_stats sbs ON s.subscriber_id = sbs.subscriber_id
ORDER BY sts.high_priority_tickets DESC, sbs.total_billed DESC;

Common Mistakes & Troubleshooting

Mistake 1: NULL Handling in IN/NOT IN Subqueries

-- This might return no rows unexpectedly
SELECT product_name
FROM products
WHERE product_id NOT IN (
    SELECT product_id FROM discontinued_products
);

-- Fix: Handle NULLs explicitly
SELECT product_name
FROM products
WHERE product_id NOT IN (
    SELECT product_id 
    FROM discontinued_products 
    WHERE product_id IS NOT NULL
);

-- Or use NOT EXISTS instead
SELECT product_name
FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM discontinued_products d
    WHERE d.product_id = p.product_id
);

Mistake 2: Subqueries Returning Multiple Rows

-- This will error if the subquery returns multiple rows
SELECT customer_name,
       (SELECT order_total FROM orders WHERE customer_id = c.customer_id)
FROM customers c;

-- Fix: Use aggregate functions or LIMIT
SELECT customer_name,
       (SELECT MAX(order_total) FROM orders WHERE customer_id = c.customer_id)
FROM customers c;

Mistake 3: Performance Issues with Correlated Subqueries

-- This runs the subquery for every row - can be slow
SELECT customer_name,
       (SELECT COUNT(*) FROM orders WHERE customer_id = c.customer_id) as order_count
FROM customers c;

-- Better: Use JOINs when possible
SELECT c.customer_name,
       COALESCE(o.order_count, 0) as order_count
FROM customers c
LEFT JOIN (
    SELECT customer_id, COUNT(*) as order_count
    FROM orders
    GROUP BY customer_id
) o ON c.customer_id = o.customer_id;

Mistake 4: CTE Name Conflicts

-- This will cause confusion
WITH orders AS (
    SELECT * FROM monthly_sales  -- Named 'orders' but contains sales data
),
orders AS (  -- Error: duplicate CTE name
    SELECT * FROM actual_orders_table
)
SELECT * FROM orders;  -- Which orders table?

Give your CTEs descriptive, unique names that clearly indicate their contents.

Performance Considerations

Subquery Performance Tips

Use EXISTS instead of IN when checking for existence:

-- Slower
SELECT * FROM customers 
WHERE customer_id IN (SELECT customer_id FROM orders);

-- Faster
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

Consider JOINs vs. subqueries:

-- Subquery approach
SELECT customer_name,
       (SELECT SUM(order_total) FROM orders WHERE customer_id = c.customer_id)
FROM customers c;

-- JOIN approach (often faster)
SELECT c.customer_name, COALESCE(o.total_orders, 0)
FROM customers c
LEFT JOIN (
    SELECT customer_id, SUM(order_total) as total_orders
    FROM orders GROUP BY customer_id
) o ON c.customer_id = o.customer_id;

CTE Performance Tips

CTEs are generally optimized well by modern databases, but keep these points in mind:

CTEs are evaluated once per query (unlike views which might be evaluated multiple times)
Recursive CTEs need careful termination conditions to avoid infinite loops
Very large intermediate result sets might benefit from temporary tables instead

Summary & Next Steps

You've now learned how to construct sophisticated SQL queries using both subqueries and CTEs. Subqueries excel at simple filtering and single-value calculations, while CTEs shine for complex, multi-step analysis that prioritizes readability.

Key takeaways:

Use subqueries for simple filtering, EXISTS checks, and scalar values
Use CTEs for complex logic that builds in multiple steps
Always handle NULLs carefully in subquery comparisons
Consider performance implications, especially with correlated subqueries
Choose descriptive names for CTEs to make your queries self-documenting

Next steps in your SQL journey:

Learn window functions to perform advanced analytics within result sets
Explore query optimization and execution plans
Study advanced JOIN techniques for complex data relationships
Practice with recursive queries for hierarchical data structures

The combination of subqueries and CTEs gives you the tools to answer virtually any analytical question your data can support. Start applying these techniques to your own datasets, and you'll quickly discover how much more powerful and expressive your SQL queries can become.

Master Subqueries and CTEs: Build Complex SQL Queries That Scale