
You're sitting across from a hiring manager, and they've just asked: "Can you write a query to find our second-highest paid employee?" Your mind races. You know SQL, you use it daily, but something about the interview context makes even familiar concepts feel harder. The silence stretches as you wonder if there's a trick you're missing.
SQL interviews aren't just about knowing the syntax—they're about demonstrating problem-solving skills, understanding performance implications, and communicating your thought process clearly. Whether you're aiming for your first data analyst role or stepping up to senior data engineer, mastering SQL interview questions is crucial for landing the position you want.
This lesson will prepare you for the full spectrum of SQL questions you'll encounter, from basic syntax to complex analytical queries. We'll work through real interview scenarios, explore multiple solution approaches, and discuss what interviewers are really looking for in your answers.
What you'll learn:
You should be comfortable with basic SQL operations (SELECT, WHERE, GROUP BY, JOINs) and understand fundamental database concepts like primary keys and relationships. Familiarity with window functions and CTEs will help with advanced sections, but we'll explain these concepts as we encounter them.
Before diving into specific questions, let's establish the right approach. SQL interviews aren't just about getting the right answer—they're about demonstrating how you think through problems.
When faced with a SQL question, follow this framework:
Let's see this in action with actual interview questions.
These questions test fundamental SQL knowledge and are common in entry-level interviews. Even if you're experienced, you might encounter these as warm-up questions.
Scenario: You have an employees table with columns: employee_id, name, department, salary, hire_date. Find all employees earning more than the average salary.
The thought process: This requires a subquery to calculate the average, then filter against it.
-- Initial approach - straightforward but clear
SELECT employee_id, name, department, salary
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
);
Why this works: The subquery calculates the average salary across all employees, and the outer query filters for those above this average.
Alternative approach using window functions:
-- More advanced approach showing additional skills
SELECT employee_id, name, department, salary
FROM (
SELECT employee_id, name, department, salary,
AVG(salary) OVER () as avg_salary
FROM employees
) t
WHERE salary > avg_salary;
Interview tip: Start with the simpler subquery approach, then mention the window function alternative to show you know multiple methods.
Scenario: Count the number of employees in each department, including departments with zero employees.
This tests understanding of JOINs and handling NULL values.
-- Assuming we have a departments table
SELECT d.department_name,
COUNT(e.employee_id) as employee_count
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name, d.department_id
ORDER BY employee_count DESC;
Key points to mention:
Scenario: Find employees hired in the last 90 days, ordered by hire date.
SELECT name, department, hire_date
FROM employees
WHERE hire_date >= CURRENT_DATE - INTERVAL 90 DAY
ORDER BY hire_date DESC;
Discussion points:
WHERE hire_date >= DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY)These questions require combining multiple SQL concepts and show practical problem-solving skills.
Scenario: Find the highest-paid employee in each department.
This is a classic interview question with several valid approaches.
Approach 1: Window Functions (Recommended)
SELECT department, name, salary
FROM (
SELECT department, name, salary,
ROW_NUMBER() OVER (
PARTITION BY department
ORDER BY salary DESC
) as rn
FROM employees
) ranked
WHERE rn = 1;
Approach 2: Correlated Subquery
SELECT e1.department, e1.name, e1.salary
FROM employees e1
WHERE e1.salary = (
SELECT MAX(e2.salary)
FROM employees e2
WHERE e2.department = e1.department
);
Which to choose? Discuss both approaches:
Scenario: Calculate a running total of sales by date.
-- Sample sales table: sale_date, amount
SELECT sale_date,
amount,
SUM(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total
FROM sales
ORDER BY sale_date;
Advanced variation: Running total by category:
SELECT sale_date, category, amount,
SUM(amount) OVER (
PARTITION BY category
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) as running_total_by_category
FROM sales
ORDER BY category, sale_date;
Performance note: Window functions can be memory-intensive on large datasets. Mention this consideration in interviews.
Scenario: Find missing dates in a daily sales report.
This tests understanding of date generation and set operations.
-- Generate date series and find missing dates
WITH date_series AS (
SELECT DATE_ADD('2024-01-01', INTERVAL n DAY) as report_date
FROM (
SELECT a.n + b.n * 10 + c.n * 100 as n
FROM (SELECT 0 as n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) a
CROSS JOIN (SELECT 0 as n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) b
CROSS JOIN (SELECT 0 as n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) c
) numbers
WHERE DATE_ADD('2024-01-01', INTERVAL n DAY) <= '2024-12-31'
)
SELECT ds.report_date
FROM date_series ds
LEFT JOIN sales s ON ds.report_date = s.sale_date
WHERE s.sale_date IS NULL
ORDER BY ds.report_date;
Alternative using recursive CTE (PostgreSQL/SQL Server):
WITH RECURSIVE date_series AS (
SELECT DATE '2024-01-01' as report_date
UNION ALL
SELECT report_date + INTERVAL '1 day'
FROM date_series
WHERE report_date < '2024-12-31'
)
SELECT ds.report_date
FROM date_series ds
LEFT JOIN sales s ON ds.report_date = s.sale_date
WHERE s.sale_date IS NULL;
These questions test deep SQL knowledge and complex analytical thinking.
Scenario: Calculate monthly customer retention rates. A customer is "retained" if they made a purchase in consecutive months.
WITH monthly_customers AS (
-- Get unique customers per month
SELECT
DATE_FORMAT(purchase_date, '%Y-%m') as purchase_month,
customer_id
FROM purchases
GROUP BY DATE_FORMAT(purchase_date, '%Y-%m'), customer_id
),
customer_months AS (
-- Add previous month for each customer
SELECT
purchase_month,
customer_id,
LAG(purchase_month) OVER (
PARTITION BY customer_id
ORDER BY purchase_month
) as previous_month
FROM monthly_customers
),
retention_data AS (
-- Identify retained customers
SELECT
purchase_month,
customer_id,
CASE
WHEN DATE_ADD(STR_TO_DATE(CONCAT(previous_month, '-01'), '%Y-%m-%d'), INTERVAL 1 MONTH) =
STR_TO_DATE(CONCAT(purchase_month, '-01'), '%Y-%m-%d')
THEN 1
ELSE 0
END as is_retained
FROM customer_months
WHERE previous_month IS NOT NULL
)
SELECT
purchase_month,
COUNT(*) as total_customers,
SUM(is_retained) as retained_customers,
ROUND(SUM(is_retained) * 100.0 / COUNT(*), 2) as retention_rate
FROM retention_data
GROUP BY purchase_month
ORDER BY purchase_month;
What this demonstrates:
Scenario: You have an employee table with manager_id references. Find all employees who report to a specific manager, directly or indirectly.
-- Recursive CTE to traverse hierarchy
WITH RECURSIVE employee_hierarchy AS (
-- Base case: direct reports
SELECT employee_id, name, manager_id, 1 as level
FROM employees
WHERE manager_id = 123 -- Specific manager ID
UNION ALL
-- Recursive case: reports of reports
SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT employee_id, name, level
FROM employee_hierarchy
ORDER BY level, name;
Alternative approach for systems without recursive CTEs:
-- Self-joins for limited depth (up to 3 levels shown)
SELECT DISTINCT
COALESCE(e1.employee_id, e2.employee_id, e3.employee_id) as employee_id,
COALESCE(e1.name, e2.name, e3.name) as name,
CASE
WHEN e1.employee_id IS NOT NULL THEN 1
WHEN e2.employee_id IS NOT NULL THEN 2
WHEN e3.employee_id IS NOT NULL THEN 3
END as level
FROM employees e1
LEFT JOIN employees e2 ON e1.employee_id = e2.manager_id
LEFT JOIN employees e3 ON e2.employee_id = e3.manager_id
WHERE e1.manager_id = 123
OR e2.manager_id = 123
OR e3.manager_id = 123;
Scenario: Calculate the percentage of total sales each product contributed each month, and identify products that increased their share month-over-month.
WITH monthly_sales AS (
SELECT
DATE_FORMAT(sale_date, '%Y-%m') as sale_month,
product_id,
SUM(amount) as product_sales
FROM sales
GROUP BY DATE_FORMAT(sale_date, '%Y-%m'), product_id
),
monthly_totals AS (
SELECT
sale_month,
SUM(product_sales) as total_sales
FROM monthly_sales
GROUP BY sale_month
),
sales_with_percentage AS (
SELECT
ms.sale_month,
ms.product_id,
ms.product_sales,
mt.total_sales,
ROUND(ms.product_sales * 100.0 / mt.total_sales, 2) as sales_percentage
FROM monthly_sales ms
INNER JOIN monthly_totals mt ON ms.sale_month = mt.sale_month
),
sales_with_previous AS (
SELECT
*,
LAG(sales_percentage) OVER (
PARTITION BY product_id
ORDER BY sale_month
) as previous_percentage
FROM sales_with_percentage
)
SELECT
sale_month,
product_id,
product_sales,
sales_percentage,
previous_percentage,
ROUND(sales_percentage - COALESCE(previous_percentage, 0), 2) as percentage_change,
CASE
WHEN sales_percentage > COALESCE(previous_percentage, 0) THEN 'Increased'
WHEN sales_percentage < COALESCE(previous_percentage, 0) THEN 'Decreased'
ELSE 'Unchanged'
END as trend
FROM sales_with_previous
WHERE previous_percentage IS NOT NULL
AND sales_percentage > previous_percentage
ORDER BY sale_month, percentage_change DESC;
Interviewers often ask about query performance, especially for senior roles.
Scenario: "How would you optimize this slow-running query?"
-- Original slow query
SELECT c.customer_name,
COUNT(o.order_id) as order_count,
AVG(o.total_amount) as avg_order_value
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.registration_date >= '2023-01-01'
GROUP BY c.customer_id, c.customer_name
HAVING COUNT(o.order_id) > 5;
Optimization strategies to discuss:
-- Suggested indexes
CREATE INDEX idx_customers_reg_date ON customers(registration_date);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_customer_total ON orders(customer_id, total_amount);
-- Pre-filter orders to reduce join size
WITH active_customers AS (
SELECT customer_id, customer_name
FROM customers
WHERE registration_date >= '2023-01-01'
),
customer_orders AS (
SELECT customer_id,
COUNT(*) as order_count,
AVG(total_amount) as avg_order_value
FROM orders o
WHERE EXISTS (SELECT 1 FROM active_customers ac WHERE ac.customer_id = o.customer_id)
GROUP BY customer_id
HAVING COUNT(*) > 5
)
SELECT ac.customer_name, co.order_count, co.avg_order_value
FROM customer_orders co
INNER JOIN active_customers ac ON co.customer_id = ac.customer_id;
Key optimization topics to discuss:
Let's work through a comprehensive scenario that combines multiple concepts. You're analyzing an e-commerce database with these tables:
-- Table structures
customers: customer_id, name, email, registration_date, city
products: product_id, name, category, price
orders: order_id, customer_id, order_date, status
order_items: order_item_id, order_id, product_id, quantity, unit_price
Challenge: Write a query to find the top 3 customers by total purchase amount in each city, but only include customers who have made purchases in at least 2 different product categories.
Solution approach:
WITH customer_purchases AS (
-- Calculate total purchases and category count per customer
SELECT
c.customer_id,
c.name,
c.city,
SUM(oi.quantity * oi.unit_price) as total_purchase_amount,
COUNT(DISTINCT p.category) as category_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.name, c.city
HAVING COUNT(DISTINCT p.category) >= 2
),
ranked_customers AS (
-- Rank customers within each city
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY city
ORDER BY total_purchase_amount DESC
) as city_rank
FROM customer_purchases
)
SELECT
city,
name,
total_purchase_amount,
category_count,
city_rank
FROM ranked_customers
WHERE city_rank <= 3
ORDER BY city, city_rank;
Work through this step-by-step:
-- Problematic
SELECT AVG(salary) FROM employees WHERE department = 'Sales';
-- Better
SELECT AVG(COALESCE(salary, 0)) FROM employees WHERE department = 'Sales';
-- Or exclude NULLs explicitly
SELECT AVG(salary) FROM employees WHERE department = 'Sales' AND salary IS NOT NULL;
-- Wrong - will cause errors in most databases
SELECT department, name, AVG(salary)
FROM employees
GROUP BY department;
-- Correct
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
-- Dangerous - can create massive result sets
SELECT *
FROM customers c, orders o, order_items oi
WHERE c.customer_id = o.customer_id; -- Missing join to order_items
-- Correct
SELECT *
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
INNER JOIN order_items oi ON o.order_id = oi.order_id;
-- Slow - function on column prevents index usage
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
-- Fast - allows index usage
SELECT * FROM orders WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';
Do:
Don't:
Sample dialogue:
"I need to find the second-highest salary. Let me think about this step by step. First, I could use a window function with ROW_NUMBER to rank salaries, or I could use a subquery to exclude the maximum. Let me start with the window function approach since it handles ties more predictably..."
SQL interviews test both technical knowledge and problem-solving ability. Success comes from:
Practice strategies:
Next learning areas:
The key to SQL interview success isn't memorizing every possible question—it's developing a systematic approach to problem-solving and clearly communicating your technical thinking. With the patterns and strategies covered here, you'll be prepared to tackle SQL challenges confidently in any data role interview.
Learning Path: Landing Your First Data Role