Wicked Smart Data
© 2026 Wicked Smart Data. All rights reserved.


SQL Interview Questions: From Basics to Advanced with Solutions

Career Development · 🔥 Expert · 20 min read · Apr 1, 2026 (updated Apr 1, 2026)
Table of Contents
  • Prerequisites
  • Foundation Level Questions: Building Confidence
  • Understanding JOINs Beyond the Basics
  • Aggregation Pitfalls
  • Intermediate Level: Pattern Recognition and Problem-Solving
  • Ranking and Top-N Queries
  • Date and Time Complexity
  • Self-Joins and Hierarchical Data
  • Advanced Level: Performance and Architecture
  • Query Optimization and Execution Plans
  • Window Functions for Complex Analytics
  • Complex Data Modeling Scenarios
  • Expert Level: Database Internals and Edge Cases
  • Understanding Indexes and Query Plans
  • Handling NULL Values and Edge Cases
  • Complex Time Series Analysis
  • Transaction Isolation and Concurrency
  • Hands-On Exercise
  • Common Mistakes & Troubleshooting
  • Summary & Next Steps

You're sitting across from a hiring manager who just asked, "Can you explain the difference between a clustered and non-clustered index?" Your heart races as you realize this isn't just about writing SELECT statements anymore. The SQL interview landscape has evolved dramatically, and today's data professionals need to demonstrate not just query-writing ability, but deep understanding of database internals, performance optimization, and complex problem-solving under pressure.

Modern SQL interviews test everything from basic joins to window functions, from query optimization to database design decisions that impact millions of users. Whether you're interviewing for a data analyst role at a startup or a senior database engineer position at a tech giant, you'll face questions that probe both your technical depth and your ability to think through real-world scenarios systematically.

This comprehensive guide will take you through the entire spectrum of SQL interview questions, from foundational concepts that trip up even experienced professionals to advanced scenarios that separate the experts from the rest. More importantly, we'll explore the why behind each solution, helping you develop the analytical thinking that interviewers really want to see.

What you'll learn:

  • Master the most common SQL interview patterns and how to approach them systematically
  • Understand advanced concepts like window functions, CTEs, and query optimization that appear in senior-level interviews
  • Navigate performance-focused questions about indexing, execution plans, and database design
  • Handle complex business logic scenarios using SQL, including data modeling challenges
  • Develop problem-solving frameworks that work under interview pressure

Prerequisites

To get the most from this lesson, you should have:

  • Solid understanding of basic SQL operations (SELECT, JOIN, WHERE, GROUP BY)
  • Experience with at least one major database system (PostgreSQL, MySQL, SQL Server, or Oracle)
  • Basic familiarity with database concepts like primary keys, foreign keys, and normalization
  • Some exposure to real-world data analysis or database development

Foundation Level Questions: Building Confidence

Let's start with questions that seem basic but often reveal gaps in understanding. These form the foundation that more complex questions build upon.

Understanding JOINs Beyond the Basics

Question: "Explain the difference between INNER JOIN, LEFT JOIN, and FULL OUTER JOIN. When would you use each?"

This seems straightforward, but the follow-up usually is: "Show me with a practical example and explain what happens with NULL values."

-- Sample data setup
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT
);

CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100)
);

INSERT INTO employees VALUES 
(1, 'Alice Johnson', 10),
(2, 'Bob Smith', 20),
(3, 'Carol Williams', NULL),  -- No department assigned
(4, 'David Brown', 30);       -- Department 30 has no row in departments

INSERT INTO departments VALUES 
(10, 'Engineering'),
(20, 'Marketing'),
(40, 'HR');  -- No employees in this department

-- INNER JOIN: Only matching records
SELECT e.name, d.department_name
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id;

-- Result: Alice (Engineering), Bob (Marketing)
-- Carol excluded (NULL department), David excluded (department 30 missing),
-- HR excluded (no employees)

The key insight interviewers want is that you understand data preservation. LEFT JOIN preserves all records from the left table:

-- LEFT JOIN: All employees, matching departments where available
SELECT e.name, COALESCE(d.department_name, 'Unassigned') as dept
FROM employees e
LEFT JOIN departments d ON e.department_id = d.department_id;

-- Result includes Carol and David with 'Unassigned' (no matching department)

Advanced follow-up: "What's the performance difference between these JOINs?"

The answer reveals your understanding of query execution. An INNER JOIN gives the optimizer the most freedom: it can reorder tables and discard non-matching rows as early as possible. A LEFT JOIN must preserve every left-side row whether or not it matches, which restricts join reordering and can make it slower on large datasets. In an interview, back this up by comparing execution plans rather than stating it as a universal rule.

Aggregation Pitfalls

Question: "Find the average salary by department, but only for departments with more than 2 employees."

-- Common mistake - using WHERE instead of HAVING
SELECT department_id, AVG(salary)
FROM employees
WHERE COUNT(*) > 2  -- ERROR: Can't use aggregate in WHERE
GROUP BY department_id;

-- Correct approach
SELECT department_id, AVG(salary), COUNT(*) as employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 2;

The deeper question here tests your understanding of SQL's logical processing order:

  1. FROM/JOIN
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. ORDER BY
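
The processing order above is why an aggregate filter belongs in HAVING, not WHERE. Here is a minimal, runnable demonstration using Python's standard-library sqlite3 module on made-up data (the article's examples are Postgres-flavored, but this behavior is the same in SQLite):

```python
import sqlite3

# WHERE runs before GROUP BY, so an aggregate there is rejected outright;
# HAVING runs after GROUP BY, so the same condition works.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (department_id INT, salary INT);
    INSERT INTO employees VALUES
        (10, 90000), (10, 80000), (10, 70000), (20, 60000);
""")

where_failed = False
try:
    conn.execute("SELECT department_id, AVG(salary) FROM employees "
                 "WHERE COUNT(*) > 2 GROUP BY department_id")
except sqlite3.OperationalError:
    where_failed = True   # "misuse of aggregate" at parse/prepare time

rows = conn.execute("""
    SELECT department_id, AVG(salary)
    FROM employees
    GROUP BY department_id
    HAVING COUNT(*) > 2
""").fetchall()
print(where_failed, rows)   # True [(10, 80000.0)]
```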

Follow-up complexity: "What if we want departments with average salary above $75,000 AND more than 2 employees?"

SELECT 
    department_id, 
    AVG(salary) as avg_salary,
    COUNT(*) as employee_count
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 2 AND AVG(salary) > 75000;

Intermediate Level: Pattern Recognition and Problem-Solving

Intermediate questions test your ability to break down business problems into SQL logic and handle more complex data relationships.

Ranking and Top-N Queries

Question: "Find the top 3 highest-paid employees in each department."

This is where window functions shine, but many candidates struggle with the syntax:

-- Using RANK() - handles ties by giving same rank
SELECT 
    employee_id,
    name,
    department_id,
    salary,
    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) as salary_rank
FROM employees
WHERE salary_rank <= 3;  -- ERROR: Can't use window function in WHERE

The correct approach requires understanding that window functions execute after WHERE but before ORDER BY:

-- Correct: Using subquery or CTE
WITH ranked_employees AS (
    SELECT 
        employee_id,
        name,
        department_id,
        salary,
        RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) as salary_rank
    FROM employees
)
SELECT * FROM ranked_employees 
WHERE salary_rank <= 3;

Critical distinction: When do you use RANK() vs ROW_NUMBER() vs DENSE_RANK()?

-- Sample data with salary ties
-- Alice: $90k, Bob: $90k, Carol: $85k, David: $80k

-- ROW_NUMBER(): Arbitrary ordering for ties (1,2,3,4)
-- RANK(): Gaps after ties (1,1,3,4)  
-- DENSE_RANK(): No gaps (1,1,2,3)

SELECT 
    name,
    salary,
    ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num,
    RANK() OVER (ORDER BY salary DESC) as rank_pos,
    DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank_pos
FROM employees;

For "top 3" scenarios, DENSE_RANK() often makes the most business sense because it doesn't skip ranks after ties.
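
You can verify the tie-handling behavior yourself. This sketch runs the three ranking functions over the tied salaries above using sqlite3 (SQLite has supported window functions since 3.25; the data is made up):

```python
import sqlite3

# Compare ROW_NUMBER, RANK, and DENSE_RANK on tied salaries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INT);
    INSERT INTO employees VALUES
        ('Alice', 90000), ('Bob', 90000), ('Carol', 85000), ('David', 80000);
""")

rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
           RANK()       OVER (ORDER BY salary DESC) AS rank_pos,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_pos
    FROM employees
""").fetchall()
for row in rows:
    print(row)
# Ties: Alice and Bob both get RANK 1; Carol gets RANK 3 but DENSE_RANK 2.
# ROW_NUMBER breaks the Alice/Bob tie arbitrarily.
```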

Date and Time Complexity

Question: "Calculate the running total of sales by month for the past year."

Date functions separate junior from mid-level developers:

-- Setting up realistic sales data
CREATE TABLE sales (
    sale_date DATE,
    amount DECIMAL(10,2)
);

-- Running total with proper date handling
SELECT 
    DATE_TRUNC('month', sale_date) as sale_month,
    SUM(amount) as monthly_total,
    SUM(SUM(amount)) OVER (
        ORDER BY DATE_TRUNC('month', sale_date)
        ROWS UNBOUNDED PRECEDING
    ) as running_total
FROM sales
WHERE sale_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY DATE_TRUNC('month', sale_date)
ORDER BY sale_month;

Advanced follow-up: "What if we want running total by month, but reset each year?"

SELECT 
    EXTRACT(YEAR FROM sale_date) as sale_year,
    DATE_TRUNC('month', sale_date) as sale_month,
    SUM(amount) as monthly_total,
    SUM(SUM(amount)) OVER (
        PARTITION BY EXTRACT(YEAR FROM sale_date)
        ORDER BY DATE_TRUNC('month', sale_date)
        ROWS UNBOUNDED PRECEDING
    ) as yearly_running_total
FROM sales
GROUP BY EXTRACT(YEAR FROM sale_date), DATE_TRUNC('month', sale_date)
ORDER BY sale_year, sale_month;

Tip: Always clarify date boundary conditions in interviews. Does "past year" mean 365 days ago, or January 1st of the current year? These details matter in production systems.

Self-Joins and Hierarchical Data

Question: "Find all employees who earn more than their manager."

Self-joins test your ability to think about table relationships conceptually:

-- Employee table with manager_id reference
SELECT 
    e.name as employee_name,
    e.salary as employee_salary,
    m.name as manager_name,
    m.salary as manager_salary
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id
WHERE e.salary > m.salary;

Complex extension: "Find the organizational hierarchy showing each employee's level and path from CEO."

-- Recursive CTE for hierarchy traversal
WITH RECURSIVE org_hierarchy AS (
    -- Base case: CEO (no manager)
    SELECT 
        employee_id,
        name,
        manager_id,
        0 as level,
        CAST(name AS VARCHAR(1000)) as hierarchy_path
    FROM employees 
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case: employees with managers
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        oh.level + 1,
        oh.hierarchy_path || ' -> ' || e.name
    FROM employees e
    JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT * FROM org_hierarchy 
ORDER BY level, name;

This pattern appears frequently in interviews because it tests multiple concepts: recursion, string concatenation, and hierarchical thinking.
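
The recursive CTE above runs essentially unchanged on SQLite, which makes it easy to practice. This sketch uses a tiny made-up org chart (titles and IDs are illustrative):

```python
import sqlite3

# Walk a small hierarchy with a recursive CTE, building the path string.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INT, name TEXT, manager_id INT);
    INSERT INTO employees VALUES
        (1, 'CEO', NULL),
        (2, 'VP Eng', 1),
        (3, 'VP Sales', 1),
        (4, 'Engineer', 2);
""")

rows = conn.execute("""
    WITH RECURSIVE org_hierarchy AS (
        -- Base case: the CEO has no manager
        SELECT employee_id, name, manager_id, 0 AS level,
               name AS hierarchy_path
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive case: append each report to the parent's path
        SELECT e.employee_id, e.name, e.manager_id, oh.level + 1,
               oh.hierarchy_path || ' -> ' || e.name
        FROM employees e
        JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
    )
    SELECT name, level, hierarchy_path FROM org_hierarchy
    ORDER BY level, name
""").fetchall()
for row in rows:
    print(row)
# ('CEO', 0, 'CEO') ... ('Engineer', 2, 'CEO -> VP Eng -> Engineer')
```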

Advanced Level: Performance and Architecture

Advanced questions probe your understanding of database internals, query optimization, and real-world performance considerations.

Query Optimization and Execution Plans

Question: "This query is running slowly on a table with 10 million records. How would you optimize it?"

-- Problematic query
SELECT 
    customer_name,
    total_orders,
    avg_order_value
FROM (
    SELECT 
        c.customer_name,
        COUNT(*) as total_orders,
        AVG(o.order_total) as avg_order_value
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.order_date >= '2023-01-01'
    GROUP BY c.customer_id, c.customer_name
) subquery
WHERE total_orders > 10;

The optimization process should be systematic:

  1. Analyze the execution plan first
  2. Check indexing strategy
  3. Consider query rewriting
  4. Evaluate data partitioning

-- Optimized version with proper indexing assumptions
-- Indexes needed: orders(order_date, customer_id), customers(customer_id)

-- More efficient: filter early, avoid the unnecessary LEFT JOIN
-- (the WHERE filter on o.order_date already discards unmatched rows,
--  so the original LEFT JOIN behaved like an INNER JOIN anyway)
SELECT 
    c.customer_name,
    COUNT(*) as total_orders,
    AVG(o.order_total) as avg_order_value
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2023-01-01'
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(*) > 10;

Key optimization principles to discuss:

  • Predicate pushdown: Move filtering as early as possible
  • Index utilization: Ensure WHERE and JOIN conditions can use indexes
  • Cardinality estimation: Understand which table should drive the join
  • Partitioning: For very large tables, partition by order_date

Window Functions for Complex Analytics

Question: "Calculate the percentage contribution of each product to monthly sales, and identify products that represent more than 15% of any month's sales."

This tests advanced window function usage and percentage calculations:

WITH monthly_product_sales AS (
    SELECT 
        DATE_TRUNC('month', sale_date) as sale_month,
        product_id,
        product_name,
        SUM(amount) as product_monthly_sales
    FROM sales s
    JOIN products p ON s.product_id = p.product_id
    GROUP BY DATE_TRUNC('month', sale_date), product_id, product_name
),
sales_with_percentages AS (
    SELECT 
        *,
        SUM(product_monthly_sales) OVER (PARTITION BY sale_month) as monthly_total,
        product_monthly_sales * 100.0 / 
            SUM(product_monthly_sales) OVER (PARTITION BY sale_month) as pct_of_month
    FROM monthly_product_sales
)
SELECT 
    sale_month,
    product_name,
    product_monthly_sales,
    ROUND(pct_of_month, 2) as percentage_of_monthly_sales
FROM sales_with_percentages
WHERE pct_of_month > 15
ORDER BY sale_month, pct_of_month DESC;

Advanced insight: The interviewer might ask about precision. Using * 100.0 ensures floating-point division rather than integer division that would truncate results.
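
This truncation is easy to demonstrate. Integer division drops the fraction, so multiplying by 100.0 first forces floating-point arithmetic (shown here in SQLite via sqlite3; the same rule applies in Postgres and SQL Server):

```python
import sqlite3

# 30/400 truncates to 0 under integer division;
# multiplying by 100.0 first yields the real percentage.
conn = sqlite3.connect(":memory:")
int_div, float_div = conn.execute(
    "SELECT 30 / 400, 30 * 100.0 / 400"
).fetchone()
print(int_div, float_div)   # 0 7.5
```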

Complex Data Modeling Scenarios

Question: "Design a schema and write queries for a subscription service that needs to track user subscriptions, plan changes, cancellations, and calculate monthly recurring revenue (MRR)."

This type of question tests both technical SQL skills and business understanding:

-- Schema design
CREATE TABLE subscription_plans (
    plan_id INT PRIMARY KEY,
    plan_name VARCHAR(100),
    monthly_price DECIMAL(10,2),
    annual_price DECIMAL(10,2)
);

CREATE TABLE user_subscriptions (
    subscription_id INT PRIMARY KEY,
    user_id INT,
    plan_id INT,
    start_date DATE,
    end_date DATE,  -- NULL for active subscriptions
    billing_cycle VARCHAR(20), -- 'monthly' or 'annual'
    status VARCHAR(20) -- 'active', 'cancelled', 'expired'
);

CREATE TABLE subscription_events (
    event_id INT PRIMARY KEY,
    subscription_id INT,
    event_type VARCHAR(50), -- 'created', 'upgraded', 'downgraded', 'cancelled'
    event_date DATE,
    old_plan_id INT,
    new_plan_id INT
);

-- MRR calculation with plan changes
WITH monthly_snapshots AS (
    SELECT 
        DATE_TRUNC('month', event_date) as snapshot_month,
        subscription_id,
        LAST_VALUE(new_plan_id) OVER (
            PARTITION BY subscription_id, DATE_TRUNC('month', event_date)
            ORDER BY event_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) as active_plan_id
    FROM subscription_events
    WHERE event_type IN ('created', 'upgraded', 'downgraded')
),
mrr_calculation AS (
    SELECT 
        ms.snapshot_month,
        COUNT(DISTINCT ms.subscription_id) as active_subscriptions,
        SUM(
            CASE 
                WHEN us.billing_cycle = 'monthly' THEN sp.monthly_price
                WHEN us.billing_cycle = 'annual' THEN sp.annual_price / 12
            END
        ) as monthly_recurring_revenue
    FROM monthly_snapshots ms
    JOIN user_subscriptions us ON ms.subscription_id = us.subscription_id
    JOIN subscription_plans sp ON ms.active_plan_id = sp.plan_id
    WHERE us.status = 'active'
    GROUP BY ms.snapshot_month
)
SELECT * FROM mrr_calculation 
ORDER BY snapshot_month;

Business logic considerations to discuss:

  • How to handle mid-month plan changes (proration)
  • Annual subscription MRR calculation
  • Churn rate calculation
  • Customer lifetime value implications

Expert Level: Database Internals and Edge Cases

Expert-level questions probe deep database knowledge and your ability to handle unusual scenarios that can break standard approaches.

Understanding Indexes and Query Plans

Question: "Explain why this query might not use the index you expect, and how you'd fix it."

-- Table with composite index on (category, created_date)
CREATE INDEX idx_product_category_date ON products (category, created_date);

-- Query that might not use the index optimally
SELECT * FROM products 
WHERE created_date > '2023-01-01' 
  AND category IN ('Electronics', 'Books', 'Clothing');

The issue is index column order. With an IN list on the leading column, the database must run a separate index range scan for each category value (and some optimizers give up and scan the table instead). If the date predicate is the selective one, leading the index with created_date allows a single contiguous range scan:

-- Better index design for this query pattern
CREATE INDEX idx_product_date_category ON products (created_date, category);

-- Or separate indexes if query patterns vary
CREATE INDEX idx_product_category ON products (category);
CREATE INDEX idx_product_created_date ON products (created_date);

Advanced discussion points:

  • Index intersection: How the optimizer combines multiple indexes
  • Covering indexes: Including additional columns to avoid table lookups
  • Partial indexes: Filtering at the index level for better selectivity

Handling NULL Values and Edge Cases

Question: "Calculate the median salary by department, handling departments with even and odd number of employees."

Standard SQL doesn't have a built-in MEDIAN function in all databases, so this tests your ability to implement statistical functions:

-- PostgreSQL approach using percentile functions
SELECT 
    department_id,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary) as median_salary
FROM employees
WHERE salary IS NOT NULL
GROUP BY department_id;

-- Universal SQL approach for databases without percentile functions
WITH salary_ranks AS (
    SELECT 
        department_id,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary) as row_num,
        COUNT(*) OVER (PARTITION BY department_id) as dept_count
    FROM employees
    WHERE salary IS NOT NULL
),
median_candidates AS (
    SELECT 
        department_id,
        salary,
        CASE 
            WHEN dept_count % 2 = 1 THEN (dept_count + 1) / 2
            ELSE dept_count / 2
        END as lower_median_pos,
        CASE 
            WHEN dept_count % 2 = 1 THEN (dept_count + 1) / 2
            ELSE (dept_count / 2) + 1
        END as upper_median_pos
    FROM salary_ranks
)
SELECT 
    department_id,
    AVG(salary) as median_salary
FROM median_candidates
WHERE row_num IN (lower_median_pos, upper_median_pos)
GROUP BY department_id;
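
The universal approach runs on SQLite, whose core build has no PERCENTILE_CONT, so it's a convenient place to check the even/odd logic. In this sketch (made-up salaries) the two CASE expressions are collapsed into equivalent integer arithmetic: for n rows, positions (n+1)/2 and (n+2)/2 coincide when n is odd and straddle the middle when n is even:

```python
import sqlite3

# Median per department without percentile functions: rank rows,
# then average the one or two middle rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (department_id INT, salary INT);
    INSERT INTO employees VALUES
        (10, 60000), (10, 70000), (10, 80000), (10, 90000),  -- even count
        (20, 50000), (20, 60000), (20, 70000);                -- odd count
""")

medians = dict(conn.execute("""
    WITH salary_ranks AS (
        SELECT department_id, salary,
               ROW_NUMBER() OVER (PARTITION BY department_id
                                  ORDER BY salary) AS row_num,
               COUNT(*) OVER (PARTITION BY department_id) AS dept_count
        FROM employees
        WHERE salary IS NOT NULL
    )
    SELECT department_id, AVG(salary) AS median_salary
    FROM salary_ranks
    -- Odd count: one middle row; even count: the two middle rows
    WHERE row_num IN ((dept_count + 1) / 2, (dept_count + 2) / 2)
    GROUP BY department_id
""").fetchall())
print(medians)   # {10: 75000.0, 20: 60000.0}
```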

Complex Time Series Analysis

Question: "Identify gaps in daily sales data and calculate the impact of missing days on monthly targets."

This tests your ability to handle incomplete time series data:

-- Generate complete date series and identify gaps
WITH date_series AS (
    SELECT generate_series(
        '2023-01-01'::date, 
        '2023-12-31'::date, 
        '1 day'::interval
    )::date as calendar_date
),
daily_sales AS (
    SELECT 
        sale_date,
        SUM(amount) as daily_total
    FROM sales
    WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'
    GROUP BY sale_date
),
sales_with_gaps AS (
    SELECT 
        ds.calendar_date,
        COALESCE(s.daily_total, 0) as daily_sales,
        CASE WHEN s.sale_date IS NULL THEN 1 ELSE 0 END as is_missing_day
    FROM date_series ds
    LEFT JOIN daily_sales s ON ds.calendar_date = s.sale_date
),
monthly_analysis AS (
    SELECT 
        DATE_TRUNC('month', calendar_date) as month,
        SUM(daily_sales) as actual_monthly_sales,
        COUNT(*) as total_days,
        SUM(is_missing_day) as missing_days,
        -- Estimate missing sales based on average of non-missing days
        CASE 
            WHEN COUNT(*) - SUM(is_missing_day) > 0 
            THEN SUM(daily_sales) / (COUNT(*) - SUM(is_missing_day)) * SUM(is_missing_day)
            ELSE 0 
        END as estimated_missing_sales
    FROM sales_with_gaps
    GROUP BY DATE_TRUNC('month', calendar_date)
)
SELECT 
    month,
    actual_monthly_sales,
    missing_days,
    estimated_missing_sales,
    actual_monthly_sales + estimated_missing_sales as projected_complete_sales,
    ROUND(estimated_missing_sales * 100.0 / 
          (actual_monthly_sales + estimated_missing_sales), 2) as pct_impact
FROM monthly_analysis
ORDER BY month;

Transaction Isolation and Concurrency

Question: "Explain how different isolation levels would affect this concurrent scenario, and write SQL to handle it safely."

Scenario: Two users simultaneously trying to book the last seat on a flight.

-- User A's transaction
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT available_seats FROM flights WHERE flight_id = 123;
-- Returns 1 available seat

-- Simultaneously, User B runs the same query
-- Both see 1 available seat

UPDATE flights 
SET available_seats = available_seats - 1 
WHERE flight_id = 123 AND available_seats > 0;

-- One UPDATE decrements the seat; the other matches zero rows and books nothing
COMMIT;

Isolation level impacts:

  • READ UNCOMMITTED: Each session can read the other's uncommitted writes, so users may see phantom availability
  • READ COMMITTED: The read-then-update pattern is still racy — both SELECTs can see the same last seat
  • REPEATABLE READ: Prevents some anomalies, but not this write conflict in every database
  • SERIALIZABLE: Detects the conflict and aborts one transaction, preventing the double booking

Production-safe approach using optimistic locking:

-- Include version or timestamp for optimistic locking
SELECT flight_id, available_seats, last_updated 
FROM flights 
WHERE flight_id = 123;

-- Update only if data hasn't changed
UPDATE flights 
SET available_seats = available_seats - 1,
    last_updated = CURRENT_TIMESTAMP
WHERE flight_id = 123 
  AND available_seats > 0 
  AND last_updated = ? -- Value from SELECT
RETURNING available_seats;

-- Check affected rows - if 0, booking failed due to concurrency
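
The pattern above can be sketched end-to-end in a few lines. This version is an assumption-laden simplification: it uses sqlite3 and an integer version column instead of the last_updated timestamp, and simulates two users reading the same version before either books:

```python
import sqlite3

# Optimistic locking: both "users" read version 1, but only the first
# conditional UPDATE matches a row; the second sees a stale version.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight_id INT PRIMARY KEY, "
             "available_seats INT, version INT)")
conn.execute("INSERT INTO flights VALUES (123, 1, 1)")

# Both users read the same snapshot before booking.
seats, version = conn.execute(
    "SELECT available_seats, version FROM flights WHERE flight_id = 123"
).fetchone()

book = ("UPDATE flights "
        "SET available_seats = available_seats - 1, version = version + 1 "
        "WHERE flight_id = 123 AND available_seats > 0 AND version = ?")
first = conn.execute(book, (version,)).rowcount    # 1: booking succeeded
second = conn.execute(book, (version,)).rowcount   # 0: stale version, retry
print(first, second)   # 1 0
```

A rowcount of 0 is the signal to re-read and retry (or report the flight as full), exactly the "check affected rows" step above.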

Hands-On Exercise

Let's tackle a comprehensive scenario that combines multiple advanced concepts. You're analyzing an e-commerce platform's performance.

Scenario: Build a comprehensive sales analysis system that answers these business questions:

  1. What's the month-over-month growth rate for each product category?
  2. Which customers are at risk of churning (no purchases in 90 days, but were active before)?
  3. What's the customer acquisition cost (CAC) payback period by marketing channel?
  4. Identify seasonal patterns in product demand with statistical significance.

Here's the schema and sample solution approach:

-- Schema setup
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    email VARCHAR(255),
    registration_date DATE,
    acquisition_channel VARCHAR(100)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    status VARCHAR(50)
);

CREATE TABLE order_items (
    item_id INT PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT,
    unit_price DECIMAL(10,2)
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(255),
    category VARCHAR(100),
    cost DECIMAL(10,2)
);

CREATE TABLE marketing_spend (
    channel VARCHAR(100),
    spend_date DATE,
    amount DECIMAL(10,2)
);

-- Solution 1: Month-over-month growth by category
WITH monthly_category_sales AS (
    SELECT 
        DATE_TRUNC('month', o.order_date) as sale_month,
        p.category,
        SUM(oi.quantity * oi.unit_price) as monthly_revenue
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    JOIN products p ON oi.product_id = p.product_id
    WHERE o.status = 'completed'
    GROUP BY DATE_TRUNC('month', o.order_date), p.category
),
growth_calculation AS (
    SELECT 
        *,
        LAG(monthly_revenue) OVER (
            PARTITION BY category 
            ORDER BY sale_month
        ) as prev_month_revenue
    FROM monthly_category_sales
)
SELECT 
    sale_month,
    category,
    monthly_revenue,
    prev_month_revenue,
    CASE 
        WHEN prev_month_revenue IS NOT NULL AND prev_month_revenue > 0
        THEN ROUND(
            (monthly_revenue - prev_month_revenue) * 100.0 / prev_month_revenue, 
            2
        )
        ELSE NULL 
    END as mom_growth_pct
FROM growth_calculation
ORDER BY category, sale_month;

-- Solution 2: Churn risk identification
WITH order_gaps AS (
    -- Compute per-order gaps first: an aggregate like AVG can't contain
    -- a window function call, so LAG must run in its own step
    SELECT 
        c.customer_id,
        c.email,
        o.order_id,
        o.order_date,
        o.order_date - LAG(o.order_date) OVER (
            PARTITION BY c.customer_id 
            ORDER BY o.order_date
        ) as days_since_prev_order
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id 
        AND o.status = 'completed'
),
customer_purchase_patterns AS (
    SELECT 
        customer_id,
        email,
        MAX(order_date) as last_purchase_date,
        COUNT(order_id) as total_orders,
        AVG(days_since_prev_order) as avg_days_between_orders
    FROM order_gaps
    GROUP BY customer_id, email
)
SELECT 
    customer_id,
    email,
    last_purchase_date,
    total_orders,
    avg_days_between_orders,
    CURRENT_DATE - last_purchase_date as days_since_last_purchase,
    CASE 
        WHEN total_orders >= 2 
             AND CURRENT_DATE - last_purchase_date > 90
             AND avg_days_between_orders < 60
        THEN 'High Risk'
        WHEN total_orders >= 1 
             AND CURRENT_DATE - last_purchase_date > 60
        THEN 'Medium Risk'
        ELSE 'Active'
    END as churn_risk_level
FROM customer_purchase_patterns
WHERE last_purchase_date IS NOT NULL
ORDER BY days_since_last_purchase DESC;

Try implementing the remaining solutions (CAC payback period and seasonal analysis) before looking at the complete answers. This exercise tests your ability to:

  • Combine multiple tables with complex joins
  • Use window functions for time-based analysis
  • Handle NULL values appropriately
  • Create meaningful business metrics from raw data

Common Mistakes & Troubleshooting

Window Function Pitfalls

Mistake 1: Using window functions in WHERE clauses

-- WRONG
SELECT customer_id, 
       ROW_NUMBER() OVER (ORDER BY total_spent DESC) as rank
FROM customers
WHERE rank <= 10;  -- Error: can't use window function result in WHERE

-- RIGHT
SELECT * FROM (
    SELECT customer_id, 
           ROW_NUMBER() OVER (ORDER BY total_spent DESC) as rank
    FROM customers
) ranked
WHERE rank <= 10;

Mistake 2: Incorrect PARTITION BY usage

-- WRONG - This partitions by ALL columns, creating individual partitions
SELECT 
    employee_id, department, salary,
    RANK() OVER (PARTITION BY employee_id, department, salary 
                 ORDER BY salary DESC) as rank
FROM employees;

-- RIGHT - Partition by the grouping column only
SELECT 
    employee_id, department, salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank
FROM employees;

JOIN Performance Issues

Problem: LEFT JOIN returning more rows than expected

This often happens with many-to-many relationships:

-- Problem: Employee appears multiple times if they have multiple skills
SELECT e.name, s.skill_name
FROM employees e
LEFT JOIN employee_skills es ON e.employee_id = es.employee_id
LEFT JOIN skills s ON es.skill_id = s.skill_id;

-- Solution: Use appropriate aggregation or restructure query
SELECT 
    e.name, 
    STRING_AGG(s.skill_name, ', ') as skills
FROM employees e
LEFT JOIN employee_skills es ON e.employee_id = es.employee_id
LEFT JOIN skills s ON es.skill_id = s.skill_id
GROUP BY e.employee_id, e.name;
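
The fan-out is easy to reproduce. This sketch uses sqlite3 with made-up skills data; SQLite's equivalent of STRING_AGG is group_concat:

```python
import sqlite3

# One employee with two skills: the raw join fans out to two rows,
# and aggregation collapses them back to one row per employee.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INT, name TEXT);
    CREATE TABLE employee_skills (employee_id INT, skill_id INT);
    CREATE TABLE skills (skill_id INT, skill_name TEXT);
    INSERT INTO employees VALUES (1, 'Alice');
    INSERT INTO employee_skills VALUES (1, 100), (1, 200);
    INSERT INTO skills VALUES (100, 'SQL'), (200, 'Python');
""")

fanned_out = conn.execute("""
    SELECT e.name, s.skill_name
    FROM employees e
    LEFT JOIN employee_skills es ON e.employee_id = es.employee_id
    LEFT JOIN skills s ON es.skill_id = s.skill_id
""").fetchall()

collapsed = conn.execute("""
    SELECT e.name, group_concat(s.skill_name, ', ') AS skills
    FROM employees e
    LEFT JOIN employee_skills es ON e.employee_id = es.employee_id
    LEFT JOIN skills s ON es.skill_id = s.skill_id
    GROUP BY e.employee_id, e.name
""").fetchall()
print(len(fanned_out), len(collapsed))   # 2 rows before, 1 after
```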

Date/Time Calculation Errors

Problem: Time zone confusion in date calculations

-- Dangerous - assumes server timezone
SELECT COUNT(*) 
FROM orders 
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';

-- Better - anchor the calculation to an explicit timezone
-- (AT TIME ZONE applies to timestamps, so convert before taking the date)
SELECT COUNT(*) 
FROM orders 
WHERE order_date >= (CURRENT_TIMESTAMP AT TIME ZONE 'UTC')::date - INTERVAL '30 days';

Problem: Leap year and month boundary issues

-- Problematic: no upper bound, so current-month rows leak into the result
SELECT * FROM sales 
WHERE sale_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month');

-- More reliable
SELECT * FROM sales 
WHERE sale_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
  AND sale_date < DATE_TRUNC('month', CURRENT_DATE);
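
The same half-open idea translates directly into date arithmetic. This hedged sketch uses Python's datetime to compute the previous month's [start, end) boundaries, mirroring what the DATE_TRUNC version does; leap years fall out for free:

```python
from datetime import date, timedelta

def previous_month_range(today: date) -> tuple[date, date]:
    """Return [start, end) bounds for the month before `today`."""
    this_month_start = today.replace(day=1)
    # Step back one day from the 1st to land in the previous month,
    # then snap to its 1st.
    prev_month_start = (this_month_start - timedelta(days=1)).replace(day=1)
    return prev_month_start, this_month_start

print(previous_month_range(date(2024, 3, 31)))
# -> Feb 1, 2024 up to (but not including) Mar 1, 2024; Feb 2024 had 29 days
```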

Performance Debugging Strategy

When a query is slow, follow this systematic approach:

  1. Check the execution plan - Look for table scans, nested loops on large datasets
  2. Verify index usage - Ensure WHERE and JOIN columns are indexed
  3. Analyze cardinality - Are you joining large tables without proper filtering?
  4. Consider data skew - Are some partitions much larger than others?
  5. Test with smaller datasets - Isolate whether it's a logic or scale problem

-- PostgreSQL example for plan analysis
EXPLAIN (ANALYZE, BUFFERS) 
SELECT customer_id, COUNT(*) 
FROM orders 
WHERE order_date >= '2023-01-01' 
GROUP BY customer_id;

Interview tip: Always verbalize your debugging thought process. Interviewers want to see systematic problem-solving, not just correct answers.

Summary & Next Steps

Mastering SQL interviews requires more than memorizing syntax—it demands deep understanding of database concepts, performance implications, and business logic translation. The questions we've covered represent the core patterns you'll encounter, from basic joins that test conceptual understanding to complex window functions that probe your analytical thinking.

The key themes that separate successful candidates are:

Technical depth: Understanding not just what SQL does, but how and why it works that way. This includes query execution order, index utilization, and performance trade-offs.

Problem decomposition: Breaking complex business requirements into logical SQL components, then combining them systematically.

Edge case awareness: Handling NULLs, empty result sets, and boundary conditions that can break naive solutions.

Performance mindset: Writing queries that work correctly is just the starting point—production SQL must also perform well at scale.

As you prepare for interviews, focus on these practical next steps:

  1. Practice explaining your thought process out loud. Walk through each step of your solution, including why you chose specific approaches over alternatives.

  2. Set up a practice database with realistic datasets (thousands or millions of rows) to test performance implications of your queries.

  3. Study execution plans for your database system. Understanding how the optimizer works gives you a significant advantage in optimization discussions.

  4. Research company-specific scenarios. E-commerce companies ask different questions than financial firms or SaaS platforms. Tailor your preparation to the business domain.

  5. Master your database system's specific features. While SQL is standardized, each system (PostgreSQL, SQL Server, Oracle) has unique capabilities that expert-level positions expect you to leverage.

Remember that SQL interviews are ultimately about demonstrating your ability to solve real business problems with data. The syntax is just the tool—your analytical thinking and systematic approach are what hiring managers really want to see.

Learning Path: Landing Your First Data Role
