
Imagine you're analyzing customer transaction data for a multinational e-commerce platform with 50 million records. Your executive team needs answers: Which customers from high-value regions made purchases between Black Friday and Cyber Monday? How do we rank our top-spending customers by region? Which product categories are driving growth in emerging markets?
Without precise data filtering and sorting capabilities, you'd be drowning in irrelevant results, making business-critical decisions based on incomplete or misleading information. The difference between a junior analyst and a senior data professional often comes down to mastering the nuanced art of data retrieval—knowing not just which records to find, but how to find them efficiently and accurately.
This lesson takes you deep into SQL's filtering and sorting mechanisms, moving beyond basic syntax to explore the architectural decisions, performance implications, and advanced patterns that separate expert-level practitioners from the rest. You'll learn to craft queries that not only return correct results but do so efficiently at scale, understanding the internals that drive query optimization and the edge cases that can break seemingly robust code.
What you'll learn: how SQL's logical execution order shapes what you can filter and when, advanced WHERE-clause patterns (operator precedence, NULL handling, pattern matching), effective use of IN and BETWEEN, multi-level and dynamic ORDER BY strategies, and the indexing and performance considerations behind each.
This lesson assumes you're comfortable with basic SQL syntax, understand relational database concepts, and have experience writing SELECT statements. You should be familiar with database indexes conceptually and have worked with datasets larger than a few thousand records. We'll be working primarily with PostgreSQL examples, though concepts apply broadly across SQL databases.
Before diving into specific filtering and sorting techniques, you need to understand where these operations fit in the SQL execution pipeline. When you write a query with WHERE clauses and ORDER BY statements, you're providing instructions that the query optimizer will transform into an execution plan.
The typical execution order for a SELECT statement follows this logical sequence:
-- Logical processing order (not syntax order)
FROM customers c
WHERE c.registration_date >= '2023-01-01'
AND c.region IN ('North America', 'Europe')
GROUP BY c.customer_segment
HAVING COUNT(*) > 100
SELECT c.customer_segment, AVG(c.lifetime_value)
ORDER BY AVG(c.lifetime_value) DESC
LIMIT 10;
The WHERE clause operates early in this pipeline, filtering rows before grouping or aggregation occurs. This positioning makes WHERE clauses your primary tool for query performance optimization—every row eliminated early saves processing time in subsequent operations.
Understanding this execution context becomes crucial when dealing with complex queries. Consider this scenario where you're analyzing seasonal purchasing patterns:
-- Suboptimal approach - filtering happens late
SELECT
customer_id,
purchase_date,
total_amount,
CASE
WHEN EXTRACT(month FROM purchase_date) IN (11,12,1) THEN 'Holiday'
WHEN EXTRACT(month FROM purchase_date) IN (6,7,8) THEN 'Summer'
ELSE 'Regular'
END as season
FROM transactions
WHERE season = 'Holiday' -- This won't work - season doesn't exist yet
ORDER BY total_amount DESC;
This query fails because you're trying to filter on a computed column before it exists in the execution pipeline. The correct approach requires restructuring:
-- Optimal approach - filter on base columns
SELECT
customer_id,
purchase_date,
total_amount,
'Holiday' as season
FROM transactions
WHERE EXTRACT(month FROM purchase_date) IN (11,12,1)
ORDER BY total_amount DESC;
This foundational understanding of execution order will inform every filtering and sorting decision you make in complex queries.
The WHERE clause is your primary filtering mechanism, but expert-level usage goes far beyond simple equality checks. Let's explore sophisticated patterns that handle real-world data complexity.
When combining multiple conditions, operator precedence becomes critical. SQL evaluates conditions in this order: parentheses, NOT, AND, OR. Understanding this prevents subtle logic errors that can corrupt your results.
Consider this customer segmentation query:
-- Potentially incorrect due to operator precedence
SELECT customer_id, total_spent, region, account_status
FROM customers
WHERE total_spent > 10000
OR region = 'Premium Markets'
AND account_status = 'Active'
OR account_status = 'VIP';
Without explicit parentheses, this evaluates as:
-- How SQL actually interprets the above
WHERE total_spent > 10000
OR (region = 'Premium Markets' AND account_status = 'Active')
OR account_status = 'VIP';
This might include inactive high-spenders you didn't intend to capture. The correct approach uses explicit grouping:
-- Explicitly controlled logic
SELECT customer_id, total_spent, region, account_status
FROM customers
WHERE (total_spent > 10000 OR region = 'Premium Markets')
AND (account_status = 'Active' OR account_status = 'VIP');
For complex business rules, consider building logic incrementally:
-- Enterprise customer identification with complex rules
SELECT
customer_id,
company_name,
annual_revenue,
employee_count,
contract_value
FROM enterprise_prospects
WHERE
-- Size criteria (any of these)
(annual_revenue > 100000000 OR employee_count > 1000)
-- AND engagement criteria (all of these)
AND contract_value > 500000
AND last_contact_date > CURRENT_DATE - INTERVAL '90 days'
-- AND qualification criteria (none of these disqualifiers)
AND NOT (
credit_rating = 'Poor'
OR legal_issues = TRUE
OR competitor_relationship = 'Exclusive'
);
NULL handling in WHERE clauses often trips up even experienced developers. NULL comparisons don't behave like other value comparisons—they require special consideration.
-- This query has a subtle bug
SELECT product_id, product_name, discontinued_date
FROM products
WHERE discontinued_date > '2023-01-01'
OR discontinued_date <= '2023-01-01';
You might expect this to return all products, but it excludes products where discontinued_date is NULL. NULL values don't satisfy either condition, requiring explicit handling:
-- Correct approach that handles NULLs explicitly
SELECT product_id, product_name, discontinued_date
FROM products
WHERE discontinued_date > '2023-01-01'
OR discontinued_date <= '2023-01-01'
OR discontinued_date IS NULL;
For enterprise applications, COALESCE can make default-value logic cleaner, though be aware that wrapping a column in a function generally prevents a plain index on that column from being used:
-- Using COALESCE for default values
SELECT
customer_id,
last_login_date,
account_status
FROM users
WHERE COALESCE(last_login_date, '1900-01-01') > CURRENT_DATE - INTERVAL '30 days'
AND account_status IN ('Active', 'Trial');
When filtering text data, simple equality often isn't sufficient. SQL provides several pattern-matching tools, each with specific use cases and performance characteristics.
-- Various text filtering approaches
SELECT
customer_id,
company_name,
industry_classification
FROM customers
WHERE
-- Exact prefix matching (index-friendly)
company_name LIKE 'Tech%'
-- Case-insensitive search (potentially slower)
OR UPPER(industry_classification) LIKE '%SOFTWARE%'
-- Pattern with wildcards
OR company_name SIMILAR TO '%(Inc|Corp|Ltd)\.?'
-- Regular expression (PostgreSQL)
OR company_name ~ '^[A-Z][a-z]+ (Technologies|Systems|Solutions)$';
For performance-critical applications, understand that pattern matching can bypass indexes:
-- Index-friendly patterns (can use index)
WHERE company_name LIKE 'Acme%' -- Leading anchor
WHERE company_name >= 'M' AND company_name < 'N' -- Range
-- Index-unfriendly patterns (full table scan)
WHERE company_name LIKE '%Corp%' -- Interior wildcards
WHERE company_name LIKE '%Inc' -- Trailing anchor only
The IN and BETWEEN operators provide elegant solutions for range-based and list-based filtering, but they come with subtle performance and behavior implications that affect enterprise-scale applications.
The IN operator offers clean syntax for multiple value matching, but its performance characteristics vary significantly based on list size and data distribution.
-- Basic IN operation for regional filtering
SELECT
order_id,
customer_id,
shipping_address,
order_total
FROM orders
WHERE shipping_country IN ('United States', 'Canada', 'Mexico', 'Brazil');
For small, static lists (under ~100 values), IN performs well and maintains readability. However, for larger lists or dynamic values, consider alternatives:
-- Large list scenario - consider EXISTS instead
SELECT o.order_id, o.customer_id, o.order_total
FROM orders o
WHERE EXISTS (
SELECT 1 FROM high_value_regions hvr
WHERE hvr.country_code = o.shipping_country
);
The EXISTS approach often performs better with large lists because it can leverage indexes more effectively than IN with many values.
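A related pitfall worth knowing: the negated forms behave very differently when NULLs are involved. NOT IN against a subquery that can return NULL silently matches nothing, while NOT EXISTS behaves as you'd expect. A sketch, assuming a hypothetical excluded_countries table with a nullable country_code column:

```sql
-- If excluded_countries contains even one NULL country_code,
-- this query returns zero rows: x NOT IN (..., NULL) evaluates
-- to NULL rather than TRUE for every non-matching x.
SELECT o.order_id
FROM orders o
WHERE o.shipping_country NOT IN (
    SELECT country_code FROM excluded_countries
);

-- NOT EXISTS ignores NULLs and returns the expected rows
SELECT o.order_id
FROM orders o
WHERE NOT EXISTS (
    SELECT 1
    FROM excluded_countries ec
    WHERE ec.country_code = o.shipping_country
);
```

When in doubt about whether a subquery column is nullable, prefer NOT EXISTS for negated membership tests.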
IN becomes particularly powerful when combined with subqueries, allowing dynamic filtering based on computed criteria:
-- Find customers who purchased from top-performing product categories
SELECT
c.customer_id,
c.customer_name,
c.total_lifetime_value
FROM customers c
WHERE c.customer_id IN (
SELECT DISTINCT o.customer_id
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category_id IN (
-- Top 5 categories by revenue this quarter
SELECT category_id
FROM (
SELECT
p.category_id,
SUM(oi.quantity * oi.unit_price) as category_revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= DATE_TRUNC('quarter', CURRENT_DATE)
GROUP BY p.category_id
ORDER BY category_revenue DESC
LIMIT 5
) top_categories
)
);
This pattern enables dynamic, data-driven filtering that adapts to changing business conditions without hardcoded values.
BETWEEN provides inclusive range filtering, but several edge cases require careful consideration:
-- Date range filtering with timezone considerations
SELECT
transaction_id,
transaction_timestamp,
amount,
currency
FROM financial_transactions
WHERE transaction_timestamp BETWEEN '2023-01-01 00:00:00'::timestamp
AND '2023-12-31 23:59:59'::timestamp;
Warning: BETWEEN is inclusive on both ends, and an upper bound of 23:59:59 silently drops rows with fractional seconds (e.g. 23:59:59.5). For ranges spanning full days, prefer a half-open comparison on the raw column rather than trying to name the last instant of the range.
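One common, index-friendly fix is a half-open range: an inclusive lower bound and an exclusive upper bound one unit past the end of the period.

```sql
-- Half-open range: captures all of 2023 regardless of timestamp precision
SELECT transaction_id, transaction_timestamp, amount, currency
FROM financial_transactions
WHERE transaction_timestamp >= '2023-01-01'::timestamp
  AND transaction_timestamp <  '2024-01-01'::timestamp;
```

This pattern sidesteps the fractional-second problem entirely and still allows an index on transaction_timestamp to be used.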
A more robust approach for date ranges:
-- Timezone-aware date range filtering
SELECT
transaction_id,
transaction_timestamp AT TIME ZONE 'UTC' as utc_timestamp,
amount,
currency
FROM financial_transactions
WHERE DATE(transaction_timestamp AT TIME ZONE 'America/New_York')
BETWEEN '2023-01-01' AND '2023-12-31';
BETWEEN operations can leverage indexes effectively, but query structure affects performance:
-- Index-friendly range query
SELECT * FROM sales_data
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31'
AND amount BETWEEN 1000 AND 10000;
-- Less optimal - computed values can't use indexes
SELECT * FROM sales_data
WHERE EXTRACT(year FROM sale_date) = 2023
AND amount * tax_rate BETWEEN 1100 AND 11000;
When filtering on computed values is necessary, consider computed columns or expression indexes:
-- PostgreSQL expression index (note the extra parentheses required around the expression)
CREATE INDEX idx_sales_year ON sales_data ((EXTRACT(year FROM sale_date)));
-- Now this query can use the index
SELECT * FROM sales_data
WHERE EXTRACT(year FROM sale_date) = 2023;
Sorting seems straightforward, but enterprise applications require sophisticated ordering strategies that balance performance, maintainability, and business requirements.
Real-world sorting often involves multiple criteria with different priorities and directions:
-- Customer priority ranking with multiple sort criteria
SELECT
customer_id,
customer_name,
account_tier,
total_lifetime_value,
last_purchase_date,
support_tickets_open
FROM customers
ORDER BY
-- Primary: Account tier (custom order)
CASE account_tier
WHEN 'Enterprise' THEN 1
WHEN 'Professional' THEN 2
WHEN 'Standard' THEN 3
WHEN 'Basic' THEN 4
ELSE 5
END ASC,
-- Secondary: Value within tier (high to low)
total_lifetime_value DESC,
-- Tertiary: Recent activity (recent first)
last_purchase_date DESC NULLS LAST,
-- Final tiebreaker: Alphabetical by name
customer_name ASC;
This approach ensures consistent, meaningful ordering that reflects business priorities. The NULLS LAST clause explicitly handles NULL values in the sort order.
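For context, PostgreSQL treats NULL as sorting larger than every non-NULL value, so the defaults are NULLS LAST for ASC and NULLS FIRST for DESC. Spelling out the behavior you want avoids surprises, as in this small sketch:

```sql
-- With a plain DESC sort, customers who never purchased
-- (NULL last_purchase_date) would float to the top by default;
-- NULLS LAST pushes them to the end instead.
SELECT customer_name, last_purchase_date
FROM customers
ORDER BY last_purchase_date DESC NULLS LAST;
```

Other databases differ (SQL Server and MySQL sort NULLs first in ASC order), so explicit NULLS FIRST/LAST also makes queries more portable where the syntax is supported.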
ORDER BY operations can be expensive without proper indexing. Understanding how indexes support sorting is crucial for performance:
-- This query benefits from a composite index
SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE order_status = 'Completed'
ORDER BY order_date DESC, total_amount DESC
LIMIT 20;
-- Optimal index: (order_status, order_date DESC, total_amount DESC)
CREATE INDEX idx_orders_status_date_amount
ON orders (order_status, order_date DESC, total_amount DESC);
The index order matters—columns used in WHERE clauses should come first, followed by ORDER BY columns in the same order and direction as the query.
Enterprise applications often require dynamic sorting based on user preferences or business logic:
-- Dynamic sorting using CASE expressions
SELECT
product_id,
product_name,
price,
category,
stock_quantity,
avg_rating
FROM products
WHERE category = $1 -- Parameter for category filter
ORDER BY
-- One CASE per sort option: mixing data types (numeric, text, date)
-- inside a single CASE raises an error, and separate CASEs let each
-- option choose its own direction. Non-matching CASEs yield NULL
-- and fall through to the next sort key.
CASE WHEN $2 = 'price_asc' THEN price END ASC,
CASE WHEN $2 = 'price_desc' THEN price END DESC,
CASE WHEN $2 = 'rating_asc' THEN avg_rating END ASC,
CASE WHEN $2 = 'rating_desc' THEN avg_rating END DESC,
CASE WHEN $2 = 'name' THEN product_name END ASC,
product_id ASC; -- Stable tiebreaker for every sort mode
This pattern allows a single query to handle multiple sorting options, though it can complicate index optimization.
When implementing pagination with sorting, consistency becomes critical. Users expect the same results in the same order across page requests:
-- Problematic pagination - unstable sort order
SELECT customer_id, customer_name, total_spent
FROM customers
ORDER BY total_spent DESC -- Multiple customers might have same value
LIMIT 20 OFFSET 40;
Without a unique sort key, the same customer might appear on different pages between requests. The solution includes a unique column as a tiebreaker:
-- Stable pagination with consistent ordering
SELECT customer_id, customer_name, total_spent
FROM customers
ORDER BY total_spent DESC, customer_id ASC -- customer_id provides uniqueness
LIMIT 20 OFFSET 40;
For high-performance pagination, consider cursor-based approaches:
-- Cursor-based pagination (more efficient for large datasets)
SELECT customer_id, customer_name, total_spent
FROM customers
WHERE (total_spent, customer_id) < ($1, $2) -- Previous page's last values
ORDER BY total_spent DESC, customer_id DESC -- Both DESC so the row comparison matches
LIMIT 20;
Note that the row-value comparison only works when every sort key runs in the same direction. For mixed directions (say total_spent DESC, customer_id ASC) you must expand the predicate: total_spent < $1 OR (total_spent = $1 AND customer_id > $2).
Understanding how filtering and sorting operations affect query performance enables you to write efficient SQL that scales with your data.
Before optimizing queries, you need to understand how the database executes them. Every major SQL database provides execution plan analysis tools:
-- PostgreSQL execution plan analysis
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.customer_name, c.region, SUM(o.total_amount) as total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.registration_date >= '2023-01-01'
AND o.order_status = 'Completed'
AND c.region IN ('North America', 'Europe', 'Asia Pacific')
GROUP BY c.customer_id, c.customer_name, c.region
HAVING SUM(o.total_amount) > 10000
ORDER BY total_spent DESC;
The execution plan reveals critical performance information: which access methods were chosen (sequential scan versus index scan), estimated versus actual row counts at each step, buffer and I/O activity, join strategies, and whether sorts fit in memory or spilled to disk.
Complex queries with multiple WHERE conditions and ORDER BY clauses require sophisticated indexing strategies:
-- Query requiring careful index design
SELECT
p.product_name,
p.category,
p.price,
AVG(r.rating) as avg_rating,
COUNT(r.review_id) as review_count
FROM products p
LEFT JOIN reviews r ON p.product_id = r.product_id
WHERE p.price BETWEEN 50 AND 500
AND p.category IN ('Electronics', 'Home & Garden', 'Sports')
AND (r.created_date IS NULL OR r.created_date >= CURRENT_DATE - INTERVAL '90 days')
GROUP BY p.product_id, p.product_name, p.category, p.price
HAVING COUNT(r.review_id) >= 5
ORDER BY avg_rating DESC NULLS LAST, review_count DESC;
This query benefits from multiple indexes:
-- Primary index for product filtering
CREATE INDEX idx_products_category_price
ON products (category, price)
WHERE category IN ('Electronics', 'Home & Garden', 'Sports');
-- Supporting index for review filtering
-- (partial-index predicates must be immutable, so use a fixed cutoff
-- date rather than CURRENT_DATE, and rebuild it periodically)
CREATE INDEX idx_reviews_product_date
ON reviews (product_id, created_date)
WHERE created_date >= DATE '2023-01-01';
-- Consider partial indexes for frequently filtered data
CREATE INDEX idx_products_active_category_price
ON products (category, price)
WHERE active = true AND discontinued_date IS NULL;
Database query planners rely on statistics to make optimization decisions. Outdated statistics can lead to poor execution plans:
-- Update table statistics (PostgreSQL)
ANALYZE customers;
ANALYZE orders;
-- To sample more rows on large tables, raise the per-column statistics target
ALTER TABLE products ALTER COLUMN category SET STATISTICS 1000;
ANALYZE products;
In some cases, you might need to provide hints to the optimizer. Core PostgreSQL deliberately has no hint syntax; the pg_hint_plan extension adds comment-based hints:
-- pg_hint_plan extension: influence join method and order
SELECT /*+ HashJoin(c o) Leading(c o) */
c.customer_name,
SUM(o.total_amount)
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.region = 'North America'
GROUP BY c.customer_id, c.customer_name;
Caution: Query hints should be used sparingly and only when you understand why the optimizer is making suboptimal choices. They can become maintenance burdens and may hurt performance as data distributions change.
Different data types behave differently in filtering and sorting operations. Understanding these nuances prevents subtle bugs and performance issues.
Floating-point numbers present particular challenges in filtering due to precision limitations:
-- Problematic floating-point comparison
SELECT product_id, calculated_discount
FROM products
WHERE calculated_discount = 0.1; -- Might miss values due to precision
Use range comparisons for floating-point values:
-- Safer approach for floating-point comparisons
SELECT product_id, calculated_discount
FROM products
WHERE calculated_discount BETWEEN 0.099999 AND 0.100001;
-- Or use DECIMAL/NUMERIC for exact precision
ALTER TABLE products
ALTER COLUMN calculated_discount TYPE NUMERIC(10,4);
Date and time filtering introduces timezone, precision, and calendar complications:
-- Complex date filtering with business rules
SELECT
order_id,
order_timestamp,
customer_timezone,
business_hours_order
FROM orders
WHERE
-- Orders from last business quarter
order_timestamp >= DATE_TRUNC('quarter', CURRENT_DATE - INTERVAL '3 months')
AND order_timestamp < DATE_TRUNC('quarter', CURRENT_DATE)
-- During business hours in customer's timezone
AND EXTRACT(hour FROM (order_timestamp AT TIME ZONE customer_timezone))
BETWEEN 9 AND 17
-- Exclude weekends
AND EXTRACT(dow FROM (order_timestamp AT TIME ZONE customer_timezone))
BETWEEN 1 AND 5;
Text sorting and comparison behavior depends on database collation settings:
-- Collation-aware text operations
SELECT customer_name, region
FROM customers
WHERE customer_name COLLATE "en_US.utf8" LIKE 'José%'
ORDER BY customer_name COLLATE "es_ES.utf8";
-- Case-insensitive comparison
SELECT * FROM products
WHERE UPPER(product_name) = UPPER('iphone 14 pro');
-- Or using collation
SELECT * FROM products
WHERE product_name ILIKE 'iphone 14 pro'; -- PostgreSQL case-insensitive LIKE
Large-scale applications require patterns that ensure maintainability, performance, and reliability across teams and time.
Always use parameterized queries to prevent SQL injection and enable query plan reuse:
-- Secure, efficient parameterized query
PREPARE customer_analysis(date, text, numeric) AS
SELECT
c.customer_id,
c.customer_name,
c.region,
SUM(o.total_amount) as total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= $1
AND c.region = $2
AND o.total_amount >= $3
GROUP BY c.customer_id, c.customer_name, c.region
ORDER BY total_spent DESC;
-- Execute with parameters
EXECUTE customer_analysis('2023-01-01', 'North America', 100.00);
Complex applications benefit from composable filtering logic:
-- Base view with common filtering logic
CREATE VIEW active_customers_base AS
SELECT
customer_id,
customer_name,
registration_date,
last_login_date,
account_status,
total_lifetime_value
FROM customers
WHERE account_status IN ('Active', 'Trial', 'Premium')
AND (last_login_date IS NULL OR last_login_date >= CURRENT_DATE - INTERVAL '180 days');
-- Specialized views building on the base
CREATE VIEW high_value_customers AS
SELECT * FROM active_customers_base
WHERE total_lifetime_value >= 10000
ORDER BY total_lifetime_value DESC;
CREATE VIEW recent_signups AS
SELECT * FROM active_customers_base
WHERE registration_date >= CURRENT_DATE - INTERVAL '30 days'
ORDER BY registration_date DESC;
When queries might return millions of rows, implement safeguards and optimization strategies:
-- Defensive query with limits and warnings
WITH result_preview AS (
SELECT
customer_id,
order_count,
total_spent,
ROW_NUMBER() OVER (ORDER BY total_spent DESC) as rn
FROM customer_summary
WHERE last_purchase_date >= $1
AND region = ANY($2)
ORDER BY total_spent DESC
LIMIT 10001 -- One more than needed to detect large result sets
)
SELECT
customer_id,
order_count,
total_spent,
CASE
WHEN (SELECT COUNT(*) FROM result_preview) = 10001
THEN 'WARNING: Result set truncated - refine filters'
END as result_warning
FROM result_preview
WHERE rn <= 10000; -- Window functions can't appear in WHERE, so rank in the CTE first
Now let's apply these concepts with a comprehensive exercise using a realistic e-commerce dataset.
Scenario: You're analyzing customer behavior for a global e-commerce platform. The business team needs insights about customer segments, purchasing patterns, and regional performance.
Set up the exercise environment:
-- Create sample tables
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
customer_name VARCHAR(100),
email VARCHAR(100),
registration_date DATE,
country_code CHAR(2),
region VARCHAR(50),
account_tier VARCHAR(20),
total_lifetime_value NUMERIC(12,2)
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(customer_id),
order_date DATE,
order_timestamp TIMESTAMP WITH TIME ZONE,
order_status VARCHAR(20),
total_amount NUMERIC(10,2),
currency_code CHAR(3)
);
CREATE TABLE order_items (
order_item_id SERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(order_id),
product_id INTEGER,
product_name VARCHAR(100),
category VARCHAR(50),
quantity INTEGER,
unit_price NUMERIC(8,2)
);
-- Insert sample data (simplified for exercise)
INSERT INTO customers VALUES
(1, 'Sarah Chen', 'sarah@email.com', '2022-03-15', 'US', 'North America', 'Premium', 15750.00),
(2, 'Marcus Rodriguez', 'marcus@email.com', '2022-08-22', 'MX', 'North America', 'Standard', 3240.00),
(3, 'Emma Thompson', 'emma@email.com', '2021-11-03', 'GB', 'Europe', 'Enterprise', 45600.00);
-- (In practice, you'd have thousands or millions of rows)
Exercise Tasks:
Advanced Customer Segmentation: Write a query that identifies "At-Risk Premium Customers" - Premium or Enterprise tier customers who haven't placed an order in the last 60 days but have a lifetime value over $10,000.
Regional Sales Analysis: Create a query that shows the top 3 product categories by revenue for each region, but only for regions where the total category revenue exceeds $100,000.
Dynamic Filtering Challenge: Write a single query that can handle different sorting options (by revenue, by customer count, by average order value) using CASE statements in the ORDER BY clause.
Solution Approaches:
-- Task 1: At-Risk Premium Customers
SELECT
c.customer_id,
c.customer_name,
c.account_tier,
c.total_lifetime_value,
MAX(o.order_date) as last_order_date,
CURRENT_DATE - MAX(o.order_date) as days_since_last_order
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.account_tier IN ('Premium', 'Enterprise')
AND c.total_lifetime_value > 10000
GROUP BY c.customer_id, c.customer_name, c.account_tier, c.total_lifetime_value
HAVING (MAX(o.order_date) IS NULL
OR MAX(o.order_date) < CURRENT_DATE - INTERVAL '60 days')
ORDER BY c.total_lifetime_value DESC;
-- Task 2: Regional Category Analysis with Window Functions
WITH category_revenue AS (
SELECT
c.region,
oi.category,
SUM(oi.quantity * oi.unit_price) as category_revenue
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_status = 'Completed'
GROUP BY c.region, oi.category
HAVING SUM(oi.quantity * oi.unit_price) > 100000
),
ranked_categories AS (
SELECT
region,
category,
category_revenue,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY category_revenue DESC) as rank
FROM category_revenue
)
SELECT
region,
category,
category_revenue
FROM ranked_categories
WHERE rank <= 3
ORDER BY region, rank;
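For Task 3, one sketch keeps each sort option in its own CASE expression in the ORDER BY clause, so data types never mix and each option controls its own direction. Here $1 is an assumed parameter naming the sort mode ('revenue', 'customers', or 'aov'):

```sql
-- Task 3: one query, three sort modes selected by parameter
SELECT
    c.region,
    SUM(o.total_amount)           AS revenue,
    COUNT(DISTINCT c.customer_id) AS customer_count,
    AVG(o.total_amount)           AS avg_order_value
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_status = 'Completed'
GROUP BY c.region
ORDER BY
    -- Non-matching CASEs return NULL and are skipped as sort keys
    CASE WHEN $1 = 'revenue'   THEN SUM(o.total_amount)           END DESC,
    CASE WHEN $1 = 'customers' THEN COUNT(DISTINCT c.customer_id) END DESC,
    CASE WHEN $1 = 'aov'       THEN AVG(o.total_amount)           END DESC,
    c.region ASC; -- Stable tiebreaker regardless of sort mode
```

This is the same pattern covered in the dynamic sorting section, applied to aggregated values; aggregates are legal in ORDER BY expressions of a grouped query.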
Even experienced developers encounter subtle issues with filtering and sorting. Here are the most common problems and their solutions:
Problem: Queries that should return all records miss those with NULL values.
-- Incorrect - misses NULLs
SELECT * FROM customers
WHERE last_login_date > '2023-01-01' OR last_login_date <= '2023-01-01';
-- Correct - explicitly handles NULLs
SELECT * FROM customers
WHERE last_login_date > '2023-01-01'
OR last_login_date <= '2023-01-01'
OR last_login_date IS NULL;
Problem: Using LIKE patterns that start with wildcards forces full table scans.
-- Problematic - can't use indexes
SELECT * FROM products WHERE product_name LIKE '%Phone%';
-- Better - use full-text search capabilities
SELECT * FROM products WHERE to_tsvector('english', product_name) @@ to_tsquery('Phone');
-- Or restructure to use prefixes when possible
SELECT * FROM products WHERE product_name LIKE 'iPhone%';
Problem: String sorting behaves unexpectedly across different locales.
-- Problematic - depends on database default collation
SELECT customer_name FROM customers ORDER BY customer_name;
-- Explicit collation control
SELECT customer_name FROM customers
ORDER BY customer_name COLLATE "en_US.utf8";
Problem: Using functions on date columns prevents index usage.
-- Inefficient - function prevents index use
SELECT * FROM orders WHERE EXTRACT(year FROM order_date) = 2023;
-- Efficient - uses range comparison
SELECT * FROM orders WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';
Problem: Pagination returns inconsistent results when sort keys aren't unique.
-- Unstable pagination
SELECT * FROM products ORDER BY price DESC LIMIT 10 OFFSET 20;
-- Stable pagination with unique tiebreaker
SELECT * FROM products ORDER BY price DESC, product_id ASC LIMIT 10 OFFSET 20;
When filtering and sorting queries don't behave as expected, follow this systematic debugging approach:
-- Debugging template
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT
-- Add row counts for verification
COUNT(*) OVER() as total_matching_rows,
customer_id,
customer_name
FROM customers
WHERE
-- Test conditions incrementally
registration_date >= '2023-01-01'
-- AND other_condition
ORDER BY customer_name;
You've now mastered the sophisticated filtering and sorting capabilities that separate expert SQL practitioners from beginners. You understand not just the syntax of WHERE clauses, IN operations, BETWEEN ranges, and ORDER BY statements, but the underlying execution patterns, performance implications, and enterprise-level design considerations that make queries robust at scale.
Key concepts you've mastered include SQL's logical execution order, operator precedence and explicit condition grouping, three-valued NULL logic, index-friendly (sargable) predicates, IN versus EXISTS trade-offs, half-open date ranges, multi-level and stable sort ordering, and execution-plan-driven performance tuning.
The patterns and techniques you've learned here form the foundation for more advanced SQL topics: multi-table joins and subquery strategies as immediate next steps, then window functions, common table expressions, and systematic query tuning, and eventually database-specific optimization features such as partitioning and materialized views.
The filtering and sorting mastery you've gained isn't just about writing correct queries—it's about building scalable data systems that perform well under real-world conditions. These skills will serve as the foundation for every complex analytical query, data pipeline, and reporting system you build throughout your data career.