
Picture this: You're analyzing customer transactions for a multinational e-commerce platform. The raw data table contains 50 million records spanning three years, with customers from 200+ countries making purchases across dozens of product categories. Your stakeholder walks in and asks: "Can you show me premium customers from North America who made purchases between $500-2000 last quarter, sorted by their total spending?"
Without proper filtering and sorting techniques, this request would be impossible to fulfill efficiently. You'd either return an overwhelming dataset that crashes their browser, or spend hours writing convoluted queries that perform poorly. But with mastery of SQL's filtering and sorting capabilities, this becomes a precise, performant query that delivers exactly what's needed.
The difference between junior and senior data professionals isn't just knowing these SQL clauses exist—it's understanding how they interact, when each approach is optimal, and how to architect queries that remain fast and maintainable at scale. This lesson will transform you from someone who writes basic WHERE clauses into someone who crafts sophisticated data retrieval strategies.
Prerequisites: You should be comfortable with basic SQL SELECT statements and have access to a SQL database environment (PostgreSQL, MySQL, SQL Server, or similar). Familiarity with database indexes and execution plans will help you follow the performance discussions, though we'll explain these concepts as we encounter them.
The WHERE clause is SQL's primary filtering mechanism, but it's more sophisticated than most developers realize. At its core, WHERE applies a boolean expression to each row in your result set, keeping only rows where the expression evaluates to TRUE. However, the devil—and the power—lies in the details.
Let's start with a realistic dataset. We'll use a customer_transactions table that represents the e-commerce scenario mentioned earlier:
-- Sample schema for our examples
CREATE TABLE customer_transactions (
transaction_id BIGINT PRIMARY KEY,
customer_id INT,
customer_country VARCHAR(50),
customer_tier VARCHAR(20), -- 'Premium', 'Standard', 'Basic'
transaction_date DATE,
product_category VARCHAR(100),
amount DECIMAL(10,2),
discount_applied DECIMAL(4,2),
payment_method VARCHAR(50),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
A simple WHERE clause might look straightforward:
SELECT customer_id, amount, transaction_date
FROM customer_transactions
WHERE customer_country = 'United States';
But even this basic query raises important questions for expert practitioners. How does the database engine execute this filter? If customer_country isn't indexed, the engine performs a full table scan, examining every single row. With our 50-million-record table, that's potentially 50 million string comparisons.
The query planner's approach depends on several factors, the most important being whether a usable index exists: an index on customer_country transforms this filter from an O(n) full scan into an O(log n) lookup. Let's examine how different operators affect performance and results:
-- Exact match - fastest with proper indexing
SELECT * FROM customer_transactions
WHERE customer_id = 12345;
-- Range queries - more complex but still efficient with indexes
SELECT * FROM customer_transactions
WHERE amount > 1000.00;
-- Pattern matching - potentially expensive
SELECT * FROM customer_transactions
WHERE customer_country LIKE 'United%';
-- NULL handling - often misunderstood
SELECT * FROM customer_transactions
WHERE discount_applied IS NULL;
Critical insight: The expression WHERE column = NULL never returns TRUE, even when the column contains NULL values, because comparing anything to NULL yields UNKNOWN. Always use IS NULL or IS NOT NULL for NULL comparisons. This is one of the most common sources of missing data in query results.
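This behavior is easy to verify. Here is a minimal sketch using Python's bundled SQLite, with a hypothetical three-row table standing in for our transactions:

```python
import sqlite3

# Hypothetical mini-table: two of three rows have a NULL discount.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (id INTEGER PRIMARY KEY, discount REAL)")
conn.executemany("INSERT INTO tx (id, discount) VALUES (?, ?)",
                 [(1, 0.10), (2, None), (3, None)])

# '= NULL' evaluates to UNKNOWN for every row, so nothing comes back.
eq_null = conn.execute("SELECT id FROM tx WHERE discount = NULL").fetchall()

# 'IS NULL' is the correct NULL test.
is_null = conn.execute("SELECT id FROM tx WHERE discount IS NULL").fetchall()

print(eq_null)  # []
print(is_null)  # [(2,), (3,)]
```

The empty result from the first query is exactly the "silently missing data" failure mode: no error, just rows that never appear.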
String filtering goes beyond basic equality. For international e-commerce data, you'll encounter encoding issues, case sensitivity, and cultural variations that require sophisticated handling:
-- Case-insensitive search (approach varies by database)
-- PostgreSQL/SQL Server:
SELECT * FROM customer_transactions
WHERE UPPER(customer_country) = 'UNITED STATES';
-- MySQL (case-insensitive by default in many configurations):
SELECT * FROM customer_transactions
WHERE customer_country = 'united states';
-- Pattern matching with wildcards
SELECT * FROM customer_transactions
WHERE product_category LIKE '%Electronics%'
OR product_category LIKE '%Technology%';
-- Regular expressions (PostgreSQL syntax)
SELECT * FROM customer_transactions
WHERE customer_country ~* '^(US|USA|United States)$';
The performance characteristics of these approaches differ dramatically. UPPER() function calls prevent index usage unless you've created a functional index. Pattern matching with leading wildcards (LIKE '%term') cannot use standard B-tree indexes effectively. Regular expressions are powerful but computationally expensive.
Combining conditions is where most SQL queries gain their power—and where most developers introduce subtle bugs. Understanding operator precedence and evaluation order is crucial for writing correct and efficient queries.
The AND operator creates intersection conditions—all specified conditions must be true. This typically reduces your result set size:
SELECT customer_id, amount, transaction_date
FROM customer_transactions
WHERE customer_tier = 'Premium'
AND customer_country IN ('United States', 'Canada', 'Mexico')
AND transaction_date >= '2024-01-01'
AND amount BETWEEN 500.00 AND 2000.00;
From a performance perspective, the database optimizer will evaluate AND conditions in an order it determines most efficient, typically starting with the most selective (the condition that eliminates the most rows). However, you can influence this by structuring your queries thoughtfully.
Consider this optimization principle: place the most selective conditions first when possible, especially in databases that don't have sophisticated optimizers:
-- More efficient: start with highly selective condition
SELECT * FROM customer_transactions
WHERE customer_id = 12345 -- Highly selective
AND customer_country = 'United States' -- Less selective
AND amount > 100.00; -- Least selective
-- Less efficient: broad condition first
SELECT * FROM customer_transactions
WHERE amount > 100.00 -- Matches most records
AND customer_country = 'United States'
AND customer_id = 12345;
OR conditions create union logic—any of the specified conditions can be true. This typically increases your result set size and presents unique optimization challenges:
SELECT customer_id, product_category, amount
FROM customer_transactions
WHERE product_category = 'Electronics'
OR product_category = 'Computers'
OR product_category = 'Mobile Devices'
OR amount > 5000.00;
OR operations are inherently more expensive than AND operations because the database engine cannot eliminate rows as aggressively during early filtering stages. Each OR condition requires separate evaluation, and the results must be combined.
The query above might be rewritten more efficiently using IN:
SELECT customer_id, product_category, amount
FROM customer_transactions
WHERE product_category IN ('Electronics', 'Computers', 'Mobile Devices')
OR amount > 5000.00;
SQL follows standard boolean algebra precedence rules: AND has higher precedence than OR. This can create unexpected results when conditions are mixed:
-- This query might not do what you think
SELECT * FROM customer_transactions
WHERE customer_tier = 'Premium'
OR customer_tier = 'VIP'
AND amount > 1000.00
AND transaction_date >= '2024-01-01';
Due to precedence, this is equivalent to:
SELECT * FROM customer_transactions
WHERE customer_tier = 'Premium'
OR (customer_tier = 'VIP'
AND amount > 1000.00
AND transaction_date >= '2024-01-01');
This returns ALL Premium customers regardless of amount or date, plus VIP customers who meet the additional criteria. To get Premium OR VIP customers who meet the amount and date criteria, use parentheses:
SELECT * FROM customer_transactions
WHERE (customer_tier = 'Premium' OR customer_tier = 'VIP')
AND amount > 1000.00
AND transaction_date >= '2024-01-01';
Expert tip: Always use parentheses when combining AND and OR, even when not strictly necessary. It makes your intent explicit and prevents future developers (including yourself) from misinterpreting the logic.
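The precedence trap is easy to demonstrate. A minimal SQLite sketch with three hypothetical rows shows the two queries disagreeing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (tier TEXT, amount REAL)")
conn.executemany("INSERT INTO tx VALUES (?, ?)", [
    ("Premium", 50.0),   # Premium, but below the amount threshold
    ("VIP", 1500.0),     # VIP and above the threshold
    ("VIP", 200.0),      # VIP, below the threshold
])

# Without parentheses: AND binds tighter, so ALL Premium rows pass.
loose = conn.execute(
    "SELECT COUNT(*) FROM tx "
    "WHERE tier = 'Premium' OR tier = 'VIP' AND amount > 1000"
).fetchone()[0]

# With parentheses: both tiers must also satisfy the amount filter.
strict = conn.execute(
    "SELECT COUNT(*) FROM tx "
    "WHERE (tier = 'Premium' OR tier = 'VIP') AND amount > 1000"
).fetchone()[0]

print(loose, strict)  # 2 1
```

The low-value Premium row sneaks through the unparenthesized version, which is precisely the bug that surfaces months later as an inflated report.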
The IN operator tests whether a value exists in a specified list of values. It's essentially shorthand for multiple OR conditions, but with important performance and functionality differences.
-- Instead of multiple OR conditions
SELECT * FROM customer_transactions
WHERE customer_country = 'United States'
OR customer_country = 'Canada'
OR customer_country = 'Mexico'
OR customer_country = 'United Kingdom'
OR customer_country = 'Australia';
-- Use IN for cleaner, often faster queries
SELECT * FROM customer_transactions
WHERE customer_country IN ('United States', 'Canada', 'Mexico',
'United Kingdom', 'Australia');
The IN operator offers several advantages over chained ORs: the intent is clearer, the column is named only once, and most optimizers can turn the value list into an efficient hash or sorted-list lookup. However, IN has important limitations: its performance depends heavily on the size of the value list and the presence of appropriate indexes:
-- Small lists (< 100 values) are generally efficient
SELECT * FROM customer_transactions
WHERE payment_method IN ('Credit Card', 'Debit Card', 'PayPal');
-- Large lists can become problematic
SELECT * FROM customer_transactions
WHERE customer_id IN (1, 2, 3, /* ... 10,000 more values ... */);
For large IN lists, consider these alternatives:
-- Alternative to large IN list: use EXISTS
SELECT t1.*
FROM customer_transactions t1
WHERE EXISTS (
SELECT 1 FROM target_customers t2
WHERE t2.customer_id = t1.customer_id
);
IN becomes particularly powerful when combined with subqueries, allowing you to filter based on results from other tables:
-- Find transactions from customers who have made purchases > $10,000
SELECT * FROM customer_transactions ct
WHERE ct.customer_id IN (
SELECT customer_id
FROM customer_transactions
GROUP BY customer_id
HAVING SUM(amount) > 10000.00
);
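This pattern can be verified end-to-end on a toy dataset. A sketch in Python's sqlite3, with made-up customer totals:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO tx VALUES (?, ?)", [
    (1, 6000.0), (1, 5000.0),   # customer 1 totals 11000 -> qualifies
    (2, 300.0), (2, 200.0),     # customer 2 totals 500 -> filtered out
])

# Filter transactions to customers whose lifetime spend exceeds 10000.
rows = conn.execute("""
    SELECT customer_id, amount FROM tx
    WHERE customer_id IN (
        SELECT customer_id FROM tx
        GROUP BY customer_id
        HAVING SUM(amount) > 10000
    )
    ORDER BY amount DESC
""").fetchall()

print(rows)  # [(1, 6000.0), (1, 5000.0)]
```

Note that HAVING filters the aggregated groups inside the subquery, while the outer WHERE ... IN filters individual transaction rows, two different levels of granularity in one statement.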
This pattern enables sophisticated filtering logic, but introduces new performance and correctness considerations:
-- Dangerous: what if the subquery returns NULL?
SELECT * FROM customer_transactions
WHERE customer_id IN (
SELECT customer_id
FROM customers
WHERE registration_date > '2024-01-01'
);
If any customer_id in the customers table is NULL, non-matching rows evaluate to UNKNOWN rather than FALSE. For IN the row is excluded either way, so results usually look right, but the same stray NULL silently breaks NOT IN, as we'll see next. Make NULL handling explicit in your subqueries:
-- Safer approach: explicitly handle NULLs
SELECT * FROM customer_transactions
WHERE customer_id IN (
SELECT customer_id
FROM customers
WHERE registration_date > '2024-01-01'
AND customer_id IS NOT NULL
);
NOT IN is particularly treacherous when dealing with NULL values:
-- This might return no rows if ANY value in the subquery is NULL
SELECT * FROM customer_transactions
WHERE customer_id NOT IN (
SELECT customer_id FROM blacklisted_customers
);
In SQL's three-valued logic (TRUE, FALSE, UNKNOWN), if the subquery contains any NULL value, the NOT IN comparison evaluates to FALSE or UNKNOWN for every row, never TRUE, so the query returns no results. Use NOT EXISTS for safer logic:
-- Safer alternative to NOT IN
SELECT * FROM customer_transactions ct
WHERE NOT EXISTS (
SELECT 1 FROM blacklisted_customers bc
WHERE bc.customer_id = ct.customer_id
);
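The difference is stark enough to be worth seeing run. A SQLite sketch with a blacklist containing one stray NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (customer_id INTEGER)")
conn.execute("CREATE TABLE blacklist (customer_id INTEGER)")
conn.executemany("INSERT INTO tx VALUES (?)", [(1,), (2,), (3,)])
# Blacklist contains customer 1 plus a stray NULL.
conn.executemany("INSERT INTO blacklist VALUES (?)", [(1,), (None,)])

# NOT IN against a list containing NULL: every row evaluates to
# FALSE or UNKNOWN, so zero rows come back -- silently.
not_in = conn.execute(
    "SELECT customer_id FROM tx "
    "WHERE customer_id NOT IN (SELECT customer_id FROM blacklist)"
).fetchall()

# NOT EXISTS ignores the NULL row and behaves as intended.
not_exists = conn.execute(
    "SELECT customer_id FROM tx t WHERE NOT EXISTS ("
    "  SELECT 1 FROM blacklist b WHERE b.customer_id = t.customer_id)"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```

One NULL in a million-row blacklist is all it takes to turn NOT IN into a query that excludes everyone, which is why NOT EXISTS is the defensive default.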
BETWEEN provides elegant syntax for range queries, but its behavior has subtleties that affect both correctness and performance.
BETWEEN is inclusive of both endpoints:
-- This includes transactions with amount = 500.00 and amount = 2000.00
SELECT * FROM customer_transactions
WHERE amount BETWEEN 500.00 AND 2000.00;
-- Equivalent to:
SELECT * FROM customer_transactions
WHERE amount >= 500.00 AND amount <= 2000.00;
Date filtering with BETWEEN requires careful attention to data types and time zones:
-- Problematic for datetime fields
SELECT * FROM customer_transactions
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31';
If created_at includes time components, this query misses transactions from January 31st after midnight. The string '2024-01-31' is typically interpreted as '2024-01-31 00:00:00', excluding the rest of the day.
-- Better approach for date ranges
SELECT * FROM customer_transactions
WHERE created_at >= '2024-01-01'
AND created_at < '2024-02-01';
-- Or be explicit about time (note: this still misses sub-second
-- timestamps such as 23:59:59.500 -- prefer the half-open range above)
SELECT * FROM customer_transactions
WHERE created_at BETWEEN '2024-01-01 00:00:00'
AND '2024-01-31 23:59:59';
For databases that support date functions:
-- PostgreSQL example
SELECT * FROM customer_transactions
WHERE DATE(created_at) BETWEEN '2024-01-01' AND '2024-01-31';
-- But beware: this prevents index usage on created_at
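The boundary bug is easy to reproduce. A SQLite sketch, with timestamps stored as ISO-8601 text as SQLite does by convention:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany("INSERT INTO tx VALUES (?, ?)", [
    (1, "2024-01-15 09:30:00"),
    (2, "2024-01-31 14:45:00"),  # afternoon of the month's last day
])

# BETWEEN ... AND '2024-01-31' compares against midnight, missing row 2.
between = conn.execute(
    "SELECT id FROM tx WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'"
).fetchall()

# The half-open range captures the whole month, including the last day.
half_open = conn.execute(
    "SELECT id FROM tx WHERE created_at >= '2024-01-01' "
    "AND created_at < '2024-02-01'"
).fetchall()

print(between)    # [(1,)]
print(half_open)  # [(1,), (2,)]
```

The half-open form (`>= start AND < next-period-start`) also generalizes cleanly to weeks, quarters, and years without any end-of-period arithmetic.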
BETWEEN operations can leverage range indexes efficiently:
-- Can use index range scan on amount
SELECT * FROM customer_transactions
WHERE amount BETWEEN 1000.00 AND 5000.00;
-- Multiple BETWEEN conditions require careful index design
SELECT * FROM customer_transactions
WHERE amount BETWEEN 1000.00 AND 5000.00
AND transaction_date BETWEEN '2024-01-01' AND '2024-01-31';
For the multi-column range query above, a composite index on (transaction_date, amount) or (amount, transaction_date) could be beneficial, depending on data distribution and query selectivity.
NOT BETWEEN works as expected but sometimes alternative approaches are clearer:
-- Standard NOT BETWEEN
SELECT * FROM customer_transactions
WHERE amount NOT BETWEEN 100.00 AND 1000.00;
-- Equivalent, sometimes clearer logic
SELECT * FROM customer_transactions
WHERE amount < 100.00 OR amount > 1000.00;
Sorting seems straightforward until you're dealing with millions of records, multiple columns, custom ordering rules, and performance requirements. Expert-level sorting involves understanding how databases execute sort operations and how to optimize them.
Basic sorting leverages indexes when possible:
-- Can use index on transaction_date for efficient sorting
SELECT customer_id, amount, transaction_date
FROM customer_transactions
ORDER BY transaction_date DESC
LIMIT 1000;
The LIMIT clause is crucial here. Without it, the database must materialize all 50 million rows (and, absent a suitable index, sort them) before returning results. With LIMIT, the database can walk the index in order and stop after the top 1000 rows, never touching the rest of the dataset.
Multi-column sorting is where complexity emerges:
SELECT customer_id, transaction_date, amount
FROM customer_transactions
ORDER BY customer_tier DESC, amount DESC, transaction_date DESC;
The database sorts first by customer_tier, then by amount within each tier, then by transaction_date within each amount group. The order of columns in your ORDER BY clause directly affects which indexes can satisfy the sort and how much work the sort operation requires.
For this query to use an index efficiently, you'd need a composite index matching the sort order:
-- Index that supports the sort operation above
CREATE INDEX idx_customer_sort
ON customer_transactions (customer_tier DESC, amount DESC, transaction_date DESC);
NULL values require explicit handling in sort operations:
-- Database-specific NULL handling
-- PostgreSQL treats NULL as larger than any value,
-- so DESC puts NULLs FIRST by default
SELECT * FROM customer_transactions
ORDER BY discount_applied DESC;
-- Explicit NULL handling
SELECT * FROM customer_transactions
ORDER BY discount_applied DESC NULLS LAST;
-- Alternative: replace NULLs for sorting
SELECT * FROM customer_transactions
ORDER BY COALESCE(discount_applied, 0) DESC;
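The portability problem can be seen directly. A minimal sketch using Python's bundled SQLite, which, like MySQL, treats NULL as smaller than every value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (id INTEGER PRIMARY KEY, discount REAL)")
conn.executemany("INSERT INTO tx VALUES (?, ?)",
                 [(1, 0.25), (2, None), (3, 0.10)])

# SQLite/MySQL treat NULL as smallest, so ascending order puts it first.
# PostgreSQL and Oracle would put the NULL row LAST for the same query.
plain = [r[0] for r in conn.execute(
    "SELECT id FROM tx ORDER BY discount ASC")]

# COALESCE with a sentinel pins NULLs to a known position on every engine
# (999 is an arbitrary value larger than any real discount here).
pinned = [r[0] for r in conn.execute(
    "SELECT id FROM tx ORDER BY COALESCE(discount, 999) ASC")]

print(plain)   # [2, 3, 1]  -- NULL row first
print(pinned)  # [3, 1, 2]  -- NULL row forced last
```

NULLS FIRST/NULLS LAST is the cleaner fix where supported, but COALESCE works everywhere, at the cost of defeating index-ordered scans on the sorted column.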
Different databases have different default behaviors for NULL sorting:
Sometimes business requirements demand custom sort orders that don't follow alphabetical or numerical patterns:
-- Priority-based custom ordering
SELECT * FROM customer_transactions
ORDER BY
CASE customer_tier
WHEN 'VIP' THEN 1
WHEN 'Premium' THEN 2
WHEN 'Standard' THEN 3
WHEN 'Basic' THEN 4
ELSE 5
END,
amount DESC;
This approach works but can be expensive for large datasets. Consider materializing custom sort orders:
-- Alternative: lookup table for sort orders
CREATE TABLE customer_tier_priority (
tier VARCHAR(20),
sort_order INT
);
INSERT INTO customer_tier_priority VALUES
('VIP', 1), ('Premium', 2), ('Standard', 3), ('Basic', 4);
-- More efficient join-based sorting
SELECT ct.*
FROM customer_transactions ct
JOIN customer_tier_priority ctp ON ct.customer_tier = ctp.tier
ORDER BY ctp.sort_order, ct.amount DESC;
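The lookup-table approach can be sketched end-to-end in SQLite; the tier names and rows here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tx (customer_id INTEGER, tier TEXT, amount REAL);
    CREATE TABLE tier_priority (tier TEXT, sort_order INTEGER);
    INSERT INTO tier_priority VALUES
        ('VIP', 1), ('Premium', 2), ('Standard', 3), ('Basic', 4);
""")
conn.executemany("INSERT INTO tx VALUES (?, ?, ?)", [
    (10, "Standard", 900.0),
    (11, "VIP", 100.0),
    (12, "Premium", 500.0),
])

# Join to the priority table and sort by its integer column,
# replacing the per-row CASE expression.
rows = conn.execute("""
    SELECT t.customer_id
    FROM tx t
    JOIN tier_priority p ON t.tier = p.tier
    ORDER BY p.sort_order, t.amount DESC
""").fetchall()

print([r[0] for r in rows])  # [11, 12, 10] -- VIP, then Premium, then Standard
```

A side benefit of the lookup table: changing the business's tier ordering becomes an UPDATE to four rows rather than an edit to every query that embeds the CASE expression.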
Large sort operations consume significant memory. When available memory is insufficient, the database engine "spills" to disk, dramatically impacting performance:
-- This query on 50M records requires substantial memory
SELECT * FROM customer_transactions
ORDER BY amount DESC;
-- More memory-efficient with LIMIT
SELECT * FROM customer_transactions
ORDER BY amount DESC
LIMIT 10000;
-- Consider pagination for large result sets
SELECT * FROM customer_transactions
ORDER BY transaction_id -- Use stable, indexed column
LIMIT 10000 OFFSET 20000;
Performance insight: OFFSET becomes increasingly expensive with larger values because the database must still process all skipped rows. For deep pagination, consider cursor-based pagination using WHERE clauses instead of OFFSET.
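Keyset (cursor-based) pagination replaces OFFSET with a seek past the last key seen. A small SQLite sketch comparing the two on 100 hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (transaction_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tx VALUES (?)", [(i,) for i in range(1, 101)])

PAGE = 10

# OFFSET pagination: the engine must still walk past every skipped row.
offset_page = [r[0] for r in conn.execute(
    "SELECT transaction_id FROM tx ORDER BY transaction_id "
    "LIMIT ? OFFSET ?", (PAGE, 20))]

# Keyset pagination: remember the last key from the previous page and
# seek past it -- an index satisfies this directly, regardless of depth.
last_seen = 20
keyset_page = [r[0] for r in conn.execute(
    "SELECT transaction_id FROM tx WHERE transaction_id > ? "
    "ORDER BY transaction_id LIMIT ?", (last_seen, PAGE))]

print(offset_page == keyset_page)  # True -- rows 21 through 30 either way
```

Both return the same page, but the keyset form's cost stays flat at page 10,000 while the OFFSET form's cost grows linearly with depth. The trade-off: keyset pagination requires a stable, unique sort key and cannot jump to an arbitrary page number.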
Real-world data filtering often requires combinations of techniques and careful optimization strategies.
Complex business logic often requires sophisticated WHERE clauses:
-- Multi-dimensional customer analysis
SELECT
customer_id,
customer_tier,
COUNT(*) as transaction_count,
SUM(amount) as total_spent,
AVG(amount) as avg_transaction
FROM customer_transactions
WHERE (
-- High-value customers
(customer_tier IN ('Premium', 'VIP') AND amount > 500.00)
OR
-- Frequent standard customers
(customer_tier = 'Standard' AND customer_id IN (
SELECT customer_id
FROM customer_transactions
GROUP BY customer_id
HAVING COUNT(*) > 10
))
)
AND transaction_date BETWEEN '2024-01-01' AND '2024-03-31'
AND customer_country IN ('United States', 'Canada', 'United Kingdom')
AND payment_method NOT IN ('Cash', 'Check')
GROUP BY customer_id, customer_tier
HAVING SUM(amount) > 2000.00
ORDER BY total_spent DESC;
This query demonstrates several advanced concepts: parenthesized OR logic combining two customer populations, a subquery inside IN, inclusive date-range filtering with BETWEEN, exclusion with NOT IN, post-aggregation filtering with HAVING, and ordering by an aggregate alias.
The query above benefits from multiple indexes:
-- Support date range filtering
CREATE INDEX idx_transaction_date
ON customer_transactions (transaction_date);
-- Support customer analysis
CREATE INDEX idx_customer_analysis
ON customer_transactions (customer_tier, amount, customer_country);
-- Support frequent customer detection
CREATE INDEX idx_customer_frequency
ON customer_transactions (customer_id, transaction_date);
However, too many indexes can hurt INSERT/UPDATE performance. Consider composite indexes that support multiple query patterns:
-- Multi-purpose composite index
CREATE INDEX idx_customer_comprehensive
ON customer_transactions (
transaction_date,
customer_tier,
customer_country,
customer_id,
amount
);
Sometimes window functions provide more elegant solutions than complex subqueries:
-- Traditional subquery approach
SELECT * FROM customer_transactions ct1
WHERE amount > (
SELECT AVG(amount)
FROM customer_transactions ct2
WHERE ct2.customer_tier = ct1.customer_tier
);
-- Window function approach (often more efficient)
WITH transaction_analysis AS (
SELECT *,
AVG(amount) OVER (PARTITION BY customer_tier) as tier_avg
FROM customer_transactions
)
SELECT * FROM transaction_analysis
WHERE amount > tier_avg;
The window function approach often performs better because it scans the base table only once, whereas the correlated subquery approach scans it once per row.
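The two forms can be checked against each other on a toy dataset. This sketch needs SQLite 3.25+ for window functions, which ships with recent Python versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (tier TEXT, amount REAL)")
conn.executemany("INSERT INTO tx VALUES (?, ?)", [
    ("Premium", 100.0), ("Premium", 300.0),   # Premium average: 200
    ("Standard", 10.0), ("Standard", 30.0),   # Standard average: 20
])

# Correlated subquery: re-aggregates the tier average per outer row.
sub = conn.execute("""
    SELECT tier, amount FROM tx t1
    WHERE amount > (SELECT AVG(amount) FROM tx t2 WHERE t2.tier = t1.tier)
    ORDER BY amount
""").fetchall()

# Window function: a single pass computes every tier's average.
win = conn.execute("""
    WITH ranked AS (
        SELECT tier, amount,
               AVG(amount) OVER (PARTITION BY tier) AS tier_avg
        FROM tx
    )
    SELECT tier, amount FROM ranked WHERE amount > tier_avg ORDER BY amount
""").fetchall()

print(sub == win)  # True -- identical rows, different execution strategy
```

Same answer, different cost profile: the window version reads the table once, while the naive correlated form re-scans per row unless the optimizer decorrelates it.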
For complex text analysis, regular expressions provide powerful filtering capabilities:
-- PostgreSQL regular expression examples
SELECT * FROM customer_transactions
WHERE product_category ~ '^(Electronics|Computers|Mobile)'
AND customer_country ~* '^united' -- Case-insensitive
AND payment_method !~ 'card$'; -- Doesn't end with 'card'
-- Extract and filter based on patterns
SELECT *,
SUBSTRING(product_category FROM '\d+') as extracted_number
FROM customer_transactions
WHERE product_category ~ '\d+(?:GB|TB)$' -- Memory/storage products
ORDER BY CAST(SUBSTRING(product_category FROM '\d+') AS INTEGER) DESC;
Regular expressions are powerful but computationally expensive. Use them judiciously, especially on large datasets.
Let's apply everything we've learned to solve a realistic business problem. You're tasked with creating a customer segmentation report: looking only at the last six months of non-cash transactions from North America and key European markets, classify customers into segments (premium customers, frequent buyers, big spenders), rank them by tier and total spending, and flag how recently each was active. Here's a step-by-step solution:
-- Step 1: Define our geographic regions
WITH regional_customers AS (
SELECT customer_id, customer_tier, customer_country,
transaction_date, amount, payment_method,
CASE
WHEN customer_country IN ('United States', 'Canada', 'Mexico')
THEN 'North America'
WHEN customer_country IN ('United Kingdom', 'Germany', 'France',
'Italy', 'Spain', 'Netherlands')
THEN 'Europe'
ELSE 'Other'
END as region
FROM customer_transactions
WHERE transaction_date >= CURRENT_DATE - INTERVAL '6 months'
AND customer_country IN ('United States', 'Canada', 'Mexico',
'United Kingdom', 'Germany', 'France',
'Italy', 'Spain', 'Netherlands')
AND payment_method NOT IN ('Cash', 'Check')
),
-- Step 2: Calculate customer metrics
customer_metrics AS (
SELECT customer_id, customer_tier, region,
COUNT(*) as transaction_count,
SUM(amount) as total_spent,
AVG(amount) as avg_transaction,
MAX(amount) as max_transaction,
MIN(transaction_date) as first_transaction,
MAX(transaction_date) as last_transaction
FROM regional_customers
GROUP BY customer_id, customer_tier, region
),
-- Step 3: Apply segmentation logic
customer_segments AS (
SELECT *,
CASE
WHEN customer_tier IN ('VIP', 'Premium') AND total_spent > 5000
THEN 'Premium Customer'
WHEN customer_tier = 'Standard' AND transaction_count > 15
THEN 'Frequent Buyer'
WHEN max_transaction > 2000
THEN 'Big Spender'
ELSE 'Regular Customer'
END as segment,
-- Create comprehensive ranking
ROW_NUMBER() OVER (
ORDER BY
CASE customer_tier
WHEN 'VIP' THEN 1
WHEN 'Premium' THEN 2
WHEN 'Standard' THEN 3
ELSE 4
END,
total_spent DESC,
transaction_count DESC
) as customer_rank
FROM customer_metrics
)
-- Final results with additional analytics
SELECT customer_id,
customer_tier,
region,
segment,
customer_rank,
transaction_count,
ROUND(total_spent, 2) as total_spent,
ROUND(avg_transaction, 2) as avg_transaction,
max_transaction,
first_transaction,
last_transaction,
-- Additional insights
ROUND(total_spent / transaction_count, 2) as calculated_avg,
CASE
WHEN CURRENT_DATE - last_transaction <= 30 THEN 'Active'
WHEN CURRENT_DATE - last_transaction <= 90 THEN 'Recent'
ELSE 'Inactive'
END as activity_status,
-- Percentile ranking within segment
PERCENT_RANK() OVER (
PARTITION BY segment
ORDER BY total_spent
) as segment_percentile
FROM customer_segments
WHERE segment != 'Regular Customer' -- Focus on special segments
ORDER BY
CASE segment
WHEN 'Premium Customer' THEN 1
WHEN 'Big Spender' THEN 2
WHEN 'Frequent Buyer' THEN 3
END,
total_spent DESC;
This query demonstrates chained CTEs, CASE-based segmentation and ranking, window functions (ROW_NUMBER, PERCENT_RANK), and custom sort ordering, all built on the filtering techniques from this lesson. Even solid queries, though, can fail in subtle ways. Watch for these common pitfalls:
-- Problem: Missing records where discount_applied is NULL
SELECT * FROM customer_transactions
WHERE discount_applied > 0 OR discount_applied <= 0;
-- Solution: Explicitly handle NULLs
SELECT * FROM customer_transactions
WHERE discount_applied > 0
OR discount_applied <= 0
OR discount_applied IS NULL;
-- Problem: Forces full table scan
SELECT * FROM customer_transactions
WHERE UPPER(customer_country) = 'UNITED STATES'
OR UPPER(customer_country) = 'CANADA';
-- Solution: Use IN with consistent casing
SELECT * FROM customer_transactions
WHERE customer_country IN ('United States', 'Canada');
-- Problem: Misses transactions later in the day
SELECT * FROM customer_transactions
WHERE DATE(created_at) = '2024-01-15';
-- Solution: Use proper range filtering
SELECT * FROM customer_transactions
WHERE created_at >= '2024-01-15'
AND created_at < '2024-01-16';
-- Problem: Processes all records before limiting
SELECT * FROM customer_transactions
WHERE amount > 100
ORDER BY RAND() -- RAND() in MySQL; random() in PostgreSQL
LIMIT 10;
-- Better: Use deterministic ordering when possible
SELECT * FROM customer_transactions
WHERE amount > 100
ORDER BY transaction_id DESC
LIMIT 10;
When queries run slowly, follow this diagnostic approach:
-- Check query execution plan (PostgreSQL syntax)
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM customer_transactions
WHERE customer_tier = 'Premium'
AND amount BETWEEN 500 AND 2000
ORDER BY transaction_date DESC;
You've now mastered the sophisticated art of filtering and sorting data in SQL. You understand how WHERE clauses work internally, how to construct complex logical conditions with AND, OR, and parentheses, and how to use IN, BETWEEN, and ORDER BY for maximum efficiency and correctness.
Key takeaways from this lesson: comparisons with NULL yield UNKNOWN, so use IS NULL/IS NOT NULL and prefer NOT EXISTS over NOT IN; parenthesize any mix of AND and OR; keep filter columns bare so indexes stay usable; use half-open ranges for timestamp filtering; and pair ORDER BY with LIMIT or keyset pagination on large tables. As immediate next steps, practice these patterns against a realistic dataset and inspect your execution plans with EXPLAIN before and after adding indexes.
The techniques you've learned here form the foundation for advanced SQL topics like window functions, CTEs, and complex analytical queries. Master these fundamentals, and you'll be ready to tackle any data filtering or sorting challenge that comes your way.