Advanced SQL Filtering and Sorting: WHERE, AND, OR, IN, BETWEEN, ORDER BY for Data Professionals

Data analysts spend roughly 80% of their time preparing and filtering data before any meaningful analysis can begin. Yet most professionals treat SQL's filtering and sorting capabilities as simple, straightforward operations. This perspective costs them hours of debugging, performance headaches, and missed insights buried in their datasets.

Consider this scenario: You're analyzing customer behavior for an e-commerce platform with 50 million transaction records. Marketing wants to identify high-value customers from the Pacific timezone who made purchases between Black Friday and Cyber Monday, excluding refunded transactions, sorted by total spend. A naive approach might scan the entire dataset multiple times or produce unexpected results due to null handling. An expert approach will execute in seconds and handle edge cases gracefully.

This lesson transforms you from someone who writes basic WHERE clauses to someone who architects sophisticated data retrieval strategies. You'll understand not just the syntax, but the query execution internals, performance implications, and subtle behaviors that separate competent SQL writers from true data engineering professionals.

What you'll learn:

How SQL's logical evaluation order affects filter performance and results
Advanced pattern matching techniques using IN, BETWEEN, and complex logical combinations
Query optimizer behavior and indexing strategies for filtering operations
Null handling nuances that break production queries
Sorting performance characteristics and memory management
Enterprise-scale filtering patterns for massive datasets

Prerequisites

You should have solid SQL fundamentals including basic SELECT statements, table joins, and familiarity with common data types. Experience with at least one major database system (PostgreSQL, MySQL, SQL Server, or Oracle) is essential, as this lesson includes database-specific optimizations and behaviors.

The Anatomy of Data Filtering: Beyond Basic WHERE

Most SQL tutorials teach WHERE clauses as simple equality checks: WHERE customer_id = 123. But enterprise data filtering requires understanding how databases evaluate complex logical expressions and optimize access paths.

Let's start with a realistic dataset. We're working with an e-commerce platform's core tables:

-- Customer demographics and segmentation
CREATE TABLE customers (
    customer_id BIGINT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    registration_date DATE,
    last_login_date TIMESTAMP,
    customer_tier VARCHAR(20), -- 'bronze', 'silver', 'gold', 'platinum'
    timezone VARCHAR(50),
    is_active BOOLEAN,
    lifetime_value DECIMAL(12,2)
);

-- Transaction records
CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    customer_id BIGINT,
    order_date TIMESTAMP,
    order_status VARCHAR(20), -- 'pending', 'shipped', 'delivered', 'refunded'
    order_total DECIMAL(10,2),
    shipping_cost DECIMAL(8,2),
    tax_amount DECIMAL(8,2),
    payment_method VARCHAR(50),
    region VARCHAR(100)
);

-- Product catalog with categories
CREATE TABLE products (
    product_id BIGINT PRIMARY KEY,
    product_name VARCHAR(255),
    category VARCHAR(100),
    subcategory VARCHAR(100),
    price DECIMAL(10,2),
    cost DECIMAL(10,2),
    launch_date DATE,
    is_discontinued BOOLEAN
);

Now let's examine how databases actually process filtering operations. When you write:

SELECT customer_id, email, lifetime_value
FROM customers 
WHERE customer_tier = 'gold' 
  AND is_active = true 
  AND lifetime_value > 1000;

The database doesn't evaluate these conditions left-to-right as written. Instead, the query optimizer analyzes statistics, available indexes, and data distribution to determine the most efficient evaluation order. Understanding this process is crucial for writing performant filters.

Logical Operator Precedence and Evaluation Strategy

SQL follows specific precedence rules that can produce surprising results. Consider this deceptively simple query:

SELECT *
FROM orders
WHERE order_status = 'delivered' 
   OR order_status = 'shipped' 
  AND order_total > 500;

Due to operator precedence (AND binds tighter than OR), this actually executes as:

SELECT *
FROM orders
WHERE order_status = 'delivered' 
   OR (order_status = 'shipped' AND order_total > 500);

This returns ALL delivered orders regardless of amount, plus shipped orders over $500. If you intended to find delivered OR shipped orders over $500, you need explicit parentheses:

SELECT *
FROM orders
WHERE (order_status = 'delivered' OR order_status = 'shipped') 
  AND order_total > 500;

Short-Circuit Evaluation and Performance Implications

Modern databases use short-circuit evaluation for logical operators. With AND, if the first condition is false, subsequent conditions aren't evaluated. With OR, if the first condition is true, evaluation stops. This behavior has significant performance implications:

-- Inefficient: expensive function called for every row
SELECT *
FROM customers
WHERE EXTRACT(YEAR FROM registration_date) = 2023
  AND customer_tier = 'platinum';

-- Better: filter on indexed column first
SELECT *
FROM customers  
WHERE customer_tier = 'platinum'
  AND EXTRACT(YEAR FROM registration_date) = 2023;

Place the most selective conditions (those that eliminate the most rows) first, especially if they can use indexes effectively.

Mastering Complex Filtering Patterns

The IN Operator: More Than List Membership

The IN operator seems straightforward, but it has nuanced behaviors that affect both correctness and performance:

-- Basic usage
SELECT * 
FROM products
WHERE category IN ('Electronics', 'Clothing', 'Books');

-- Equivalent to multiple OR conditions
SELECT *
FROM products  
WHERE category = 'Electronics' 
   OR category = 'Clothing'
   OR category = 'Books';

However, IN with subqueries introduces complexity around null handling and performance:

-- Find customers who have placed orders
SELECT customer_id, email
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id 
    FROM orders 
    WHERE order_status != 'refunded'
);

This subquery approach can be inefficient for large datasets. A JOIN often performs better:

-- More efficient equivalent
SELECT DISTINCT c.customer_id, c.email
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_status != 'refunded';

IN Operator Gotchas: Null Values and Performance

The IN operator has counterintuitive behavior with null values:

-- This will NOT match rows where category is NULL
SELECT * FROM products WHERE category IN ('Electronics', NULL);

-- This will also NOT match rows where category is NULL  
SELECT * FROM products WHERE category NOT IN ('Electronics', NULL);

When ANY value in the IN list is NULL, NOT IN returns no rows at all due to three-valued logic. This behavior has broken countless production queries:

-- Dangerous: if ANY customer_id in subquery is NULL, returns empty result
SELECT *
FROM products
WHERE product_id NOT IN (
    SELECT product_id FROM order_items WHERE customer_id = 123
);

-- Safe: explicitly handle nulls
SELECT *
FROM products  
WHERE product_id NOT IN (
    SELECT product_id 
    FROM order_items 
    WHERE customer_id = 123 
      AND product_id IS NOT NULL
);

BETWEEN: Range Queries and Boundary Behavior

BETWEEN provides inclusive range filtering, but its behavior with different data types requires careful consideration:

-- Date ranges: inclusive on both ends
SELECT *
FROM orders
WHERE order_date BETWEEN '2023-11-24' AND '2023-11-27';

With timestamps, BETWEEN can miss records:

-- This misses orders placed on 2023-11-27 after midnight
SELECT *
FROM orders
WHERE order_date BETWEEN '2023-11-24 00:00:00' AND '2023-11-27 00:00:00';

-- Better: use next day exclusive
SELECT *
FROM orders
WHERE order_date >= '2023-11-24 00:00:00' 
  AND order_date < '2023-11-28 00:00:00';

For numeric ranges, BETWEEN is inclusive, but floating-point precision can cause issues:

-- May miss values due to floating-point representation
SELECT *
FROM products
WHERE price BETWEEN 19.99 AND 29.99;

-- More precise for monetary calculations
SELECT *
FROM products
WHERE price >= 19.99 AND price <= 29.99;

Pattern Matching with LIKE and Regular Expressions

Basic LIKE patterns use % (zero or more characters) and _ (exactly one character):

-- Find products with "smart" in the name
SELECT *
FROM products
WHERE product_name LIKE '%smart%';

-- Find products with 4-character codes
SELECT *
FROM products
WHERE product_id LIKE 'PR__';

However, LIKE performance degrades significantly with leading wildcards:

-- Terrible performance: can't use index
SELECT * FROM products WHERE product_name LIKE '%phone%';

-- Much better: can use index
SELECT * FROM products WHERE product_name LIKE 'iPhone%';

For advanced pattern matching, many databases support regular expressions:

-- PostgreSQL regex: find products with model numbers
SELECT *
FROM products
WHERE product_name ~ '[A-Z]{2,3}-[0-9]{3,4}';

-- MySQL equivalent
SELECT *
FROM products  
WHERE product_name REGEXP '[A-Z]{2,3}-[0-9]{3,4}';

Advanced Logical Combinations and Query Optimization

Enterprise filtering often requires complex logical combinations that push SQL's capabilities:

-- Multi-dimensional customer segmentation
SELECT 
    customer_id,
    email,
    customer_tier,
    lifetime_value,
    CASE 
        WHEN customer_tier = 'platinum' AND lifetime_value > 10000 THEN 'VIP'
        WHEN customer_tier IN ('gold', 'platinum') AND lifetime_value > 5000 THEN 'Premium'
        WHEN is_active = true AND last_login_date > CURRENT_DATE - INTERVAL '30 days' THEN 'Active'
        ELSE 'Standard'
    END as segment
FROM customers
WHERE (
    -- High-value customers
    (customer_tier = 'platinum' AND lifetime_value > 5000)
    OR 
    -- Active gold customers
    (customer_tier = 'gold' AND is_active = true AND last_login_date > CURRENT_DATE - INTERVAL '60 days')
    OR
    -- Recent high-spending customers regardless of tier
    (lifetime_value > 8000 AND registration_date > CURRENT_DATE - INTERVAL '1 year')
)
AND timezone IN ('America/Los_Angeles', 'America/Denver', 'America/Chicago')
AND email NOT LIKE '%test%'
AND email NOT LIKE '%@example.com';

This query demonstrates several advanced concepts:

Nested logical groupings with parentheses for precise control
Mixed condition types (equality, ranges, patterns, lists)
Date arithmetic with proper interval handling
Exclusion patterns to filter test data

Query Plan Analysis and Index Utilization

Understanding how your database executes complex filters is essential for optimization. Let's analyze query execution:

-- PostgreSQL: View execution plan
EXPLAIN (ANALYZE, BUFFERS) 
SELECT customer_id, email, lifetime_value
FROM customers
WHERE customer_tier = 'gold' 
  AND is_active = true 
  AND lifetime_value > 1000
  AND timezone = 'America/New_York';

The execution plan reveals crucial information:

Seq Scan vs Index Scan: Sequential scans read entire tables; index scans are much faster
Filter conditions: Shows which WHERE conditions can use indexes
Estimated vs Actual rows: Indicates if statistics are accurate
Buffer usage: Shows I/O requirements

For optimal performance, create composite indexes matching your common filter patterns:

-- Composite index for the above query
CREATE INDEX idx_customers_filtering 
ON customers (customer_tier, is_active, timezone) 
INCLUDE (lifetime_value);

This index allows the database to quickly find gold, active customers in New York, then filter by lifetime value without additional table lookups.

Sorting Strategies: ORDER BY Performance and Behavior

Sorting appears simple but has complex performance characteristics, especially with large datasets. Let's explore sophisticated sorting patterns:

Basic Sorting and Column Selection

-- Single column ascending (default)
SELECT customer_id, email, lifetime_value
FROM customers
ORDER BY lifetime_value;

-- Multiple columns with mixed directions
SELECT customer_id, email, customer_tier, lifetime_value
FROM customers
ORDER BY customer_tier DESC, lifetime_value ASC;

The order of columns in ORDER BY matters significantly. The database sorts by the first column, then breaks ties using subsequent columns:

-- First by tier (platinum, gold, silver, bronze), then by value within each tier
SELECT customer_id, customer_tier, lifetime_value
FROM customers
ORDER BY 
    CASE customer_tier
        WHEN 'platinum' THEN 1
        WHEN 'gold' THEN 2  
        WHEN 'silver' THEN 3
        WHEN 'bronze' THEN 4
        ELSE 5
    END,
    lifetime_value DESC;

Null Handling in Sorting

Different databases handle NULL values differently in ORDER BY:

-- PostgreSQL: NULLs sort last by default
SELECT customer_id, last_login_date
FROM customers
ORDER BY last_login_date; -- NULLs at end

-- Force NULLs first
SELECT customer_id, last_login_date  
FROM customers
ORDER BY last_login_date NULLS FIRST;

-- SQL Server: NULLs sort first by default
-- MySQL: NULLs sort first by default
-- Oracle: NULLs sort last by default

For portable code, explicitly specify NULL behavior:

-- Consistent across databases
SELECT customer_id, last_login_date
FROM customers  
ORDER BY 
    CASE WHEN last_login_date IS NULL THEN 1 ELSE 0 END,
    last_login_date DESC;

Performance Optimization for Large Sorts

Sorting large datasets can consume enormous memory and processing time. Several strategies optimize sort performance:

1. Index-based sorting:

-- Create index matching sort order
CREATE INDEX idx_customers_tier_value 
ON customers (customer_tier, lifetime_value DESC);

-- This query can use index order, avoiding explicit sort
SELECT customer_id, customer_tier, lifetime_value
FROM customers
WHERE customer_tier IN ('gold', 'platinum')
ORDER BY customer_tier, lifetime_value DESC;

2. Limit early to reduce sort overhead:

-- Inefficient: sorts all rows, then limits
SELECT customer_id, lifetime_value
FROM customers
ORDER BY lifetime_value DESC
LIMIT 100;

-- Better with appropriate index
CREATE INDEX idx_customers_lifetime_value ON customers (lifetime_value DESC);

3. Partition sorting for massive datasets:

-- Use window functions to sort within partitions
SELECT 
    customer_id,
    customer_tier,
    lifetime_value,
    ROW_NUMBER() OVER (
        PARTITION BY customer_tier 
        ORDER BY lifetime_value DESC
    ) as tier_rank
FROM customers
WHERE customer_tier IN ('gold', 'platinum')
ORDER BY customer_tier, tier_rank;

Advanced Sorting Patterns

Conditional sorting adapts order based on data values:

-- Sort completed orders by date, pending orders by priority
SELECT 
    order_id,
    order_status,
    order_date,
    priority_score
FROM orders
ORDER BY
    CASE 
        WHEN order_status = 'pending' THEN priority_score
        ELSE EXTRACT(EPOCH FROM order_date)
    END DESC;

Custom sort orders handle business-specific sequences:

-- Sort by custom business priority
SELECT region, total_sales
FROM regional_sales
ORDER BY 
    CASE region
        WHEN 'North America' THEN 1
        WHEN 'Europe' THEN 2  
        WHEN 'Asia Pacific' THEN 3
        WHEN 'Latin America' THEN 4
        ELSE 5
    END,
    total_sales DESC;

Sorting expressions enable complex ordering logic:

-- Sort customers by engagement score (combination of factors)
SELECT 
    customer_id,
    customer_tier,
    lifetime_value,
    last_login_date,
    -- Complex engagement calculation
    (lifetime_value * 0.4 + 
     CASE customer_tier
         WHEN 'platinum' THEN 1000
         WHEN 'gold' THEN 500
         WHEN 'silver' THEN 200
         ELSE 0
     END +
     CASE 
         WHEN last_login_date > CURRENT_DATE - INTERVAL '7 days' THEN 300
         WHEN last_login_date > CURRENT_DATE - INTERVAL '30 days' THEN 100
         ELSE 0
     END
    ) as engagement_score
FROM customers
ORDER BY engagement_score DESC;

Memory Management and Sort Performance

Large sorts require careful memory management. Databases use various algorithms depending on data size and available memory:

In-memory sorts: Fast but limited by available RAM
External sorts: Use disk storage, much slower but handle unlimited data
Index-based retrieval: Avoid sorting entirely by using pre-sorted indexes

Monitor sort memory usage and tune database parameters:

-- PostgreSQL: Increase sort memory for session
SET work_mem = '256MB';

-- Check if sorts are spilling to disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM large_table ORDER BY some_column;
-- Look for "Disk:" in Sort node output

Combining Filtering and Sorting: Integrated Optimization

The most sophisticated queries combine complex filtering with intelligent sorting. The query optimizer must balance filter selectivity with sort efficiency:

-- Complex e-commerce customer analysis
WITH customer_metrics AS (
    SELECT 
        c.customer_id,
        c.email,
        c.customer_tier,
        c.registration_date,
        c.timezone,
        c.lifetime_value,
        COUNT(o.order_id) as order_count,
        AVG(o.order_total) as avg_order_value,
        MAX(o.order_date) as last_order_date,
        SUM(CASE WHEN o.order_status = 'refunded' THEN 1 ELSE 0 END) as refund_count
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    WHERE c.registration_date >= '2022-01-01'
      AND c.is_active = true
      AND c.timezone IN (
          'America/Los_Angeles', 'America/Denver', 
          'America/Chicago', 'America/New_York'
      )
    GROUP BY c.customer_id, c.email, c.customer_tier, 
             c.registration_date, c.timezone, c.lifetime_value
)
SELECT 
    customer_id,
    email,
    customer_tier,
    lifetime_value,
    order_count,
    avg_order_value,
    last_order_date,
    refund_count,
    -- Risk score calculation
    CASE 
        WHEN refund_count::FLOAT / NULLIF(order_count, 0) > 0.2 THEN 'High Risk'
        WHEN refund_count::FLOAT / NULLIF(order_count, 0) > 0.1 THEN 'Medium Risk'
        ELSE 'Low Risk'
    END as risk_category,
    -- Customer value segment
    CASE 
        WHEN lifetime_value > 10000 AND avg_order_value > 500 THEN 'VIP'
        WHEN lifetime_value > 5000 OR avg_order_value > 300 THEN 'Premium'  
        WHEN order_count >= 10 AND refund_count <= 1 THEN 'Loyal'
        ELSE 'Standard'
    END as value_segment
FROM customer_metrics
WHERE order_count >= 3  -- Customers with meaningful purchase history
  AND last_order_date > CURRENT_DATE - INTERVAL '6 months'  -- Recent activity
  AND (
      lifetime_value > 2000 OR  -- High value customers
      (order_count >= 5 AND avg_order_value > 100) OR  -- Frequent purchasers
      customer_tier IN ('gold', 'platinum')  -- Premium tiers
  )
ORDER BY 
    CASE value_segment
        WHEN 'VIP' THEN 1
        WHEN 'Premium' THEN 2
        WHEN 'Loyal' THEN 3
        ELSE 4
    END,
    lifetime_value DESC,
    last_order_date DESC;

This query demonstrates enterprise-level complexity:

CTE for complex aggregations before filtering
Multi-level filtering logic with business rules
Calculated fields for segmentation
Custom sort order by business priority
Risk assessment integrated into selection

Database-Specific Optimizations and Behaviors

Different database systems have unique optimization features and behaviors:

PostgreSQL Advanced Features

-- Partial indexes for specific filtering patterns
CREATE INDEX idx_active_premium_customers 
ON customers (customer_tier, lifetime_value)
WHERE is_active = true AND customer_tier IN ('gold', 'platinum');

-- Expression indexes for computed filters
CREATE INDEX idx_customers_year_registered 
ON customers (EXTRACT(YEAR FROM registration_date));

-- Using PostgreSQL arrays for efficient IN operations
SELECT * 
FROM products
WHERE category = ANY(ARRAY['Electronics', 'Clothing', 'Books']);

SQL Server Optimizations

-- Columnstore indexes for analytical queries
CREATE NONCLUSTERED COLUMNSTORE INDEX ix_orders_analytics
ON orders (customer_id, order_date, order_total, order_status);

-- Filtered indexes (similar to PostgreSQL partial indexes)
CREATE INDEX ix_active_customers
ON customers (customer_tier, lifetime_value)
WHERE is_active = 1;

-- Query hints for complex queries
SELECT customer_id, lifetime_value
FROM customers WITH (INDEX(ix_customers_tier_value))
WHERE customer_tier = 'gold'
ORDER BY lifetime_value DESC;

MySQL Performance Features

-- Composite indexes with proper column ordering
CREATE INDEX idx_orders_compound 
ON orders (order_status, order_date, customer_id);

-- Using MySQL's LIMIT optimization
SELECT customer_id, lifetime_value
FROM customers  
WHERE customer_tier = 'platinum'
ORDER BY lifetime_value DESC
LIMIT 100;

-- Force index usage when optimizer chooses poorly
SELECT customer_id, email
FROM customers FORCE INDEX (idx_customers_tier)
WHERE customer_tier = 'gold';

Hands-On Exercise

Let's apply these concepts in a comprehensive exercise. You're analyzing customer behavior for a subscription service with these requirements:

Find active customers in North American timezones
Who have subscribed in the last 18 months
With either high lifetime value (>$5000) or frequent usage (>50 logins)
Excluding test accounts and internal users
Sort by engagement score (custom calculation)
Handle edge cases like null values and inactive subscriptions

-- Sample data setup
CREATE TABLE subscribers (
    subscriber_id BIGINT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    subscription_date TIMESTAMP,
    last_login_date TIMESTAMP,  
    subscription_status VARCHAR(20), -- 'active', 'suspended', 'cancelled'
    lifetime_value DECIMAL(10,2),
    login_count INTEGER,
    timezone VARCHAR(50),
    account_type VARCHAR(20), -- 'standard', 'premium', 'enterprise', 'test', 'internal'
    referral_source VARCHAR(100)
);

-- Your task: Write the complete query
-- Include appropriate indexing strategy
-- Handle all edge cases mentioned
-- Optimize for performance with large datasets

-- Solution approach:
WITH north_american_timezones AS (
    SELECT timezone 
    FROM (VALUES 
        ('America/New_York'), ('America/Chicago'), 
        ('America/Denver'), ('America/Los_Angeles'),
        ('America/Phoenix'), ('America/Anchorage')
    ) AS t(timezone)
),
qualified_subscribers AS (
    SELECT 
        s.subscriber_id,
        s.email,
        s.subscription_date,
        s.last_login_date,
        s.lifetime_value,
        s.login_count,
        s.timezone,
        s.account_type,
        -- Engagement score calculation
        COALESCE(s.lifetime_value, 0) * 0.6 + 
        COALESCE(s.login_count, 0) * 10 +
        CASE 
            WHEN s.last_login_date > CURRENT_DATE - INTERVAL '7 days' THEN 500
            WHEN s.last_login_date > CURRENT_DATE - INTERVAL '30 days' THEN 200
            WHEN s.last_login_date > CURRENT_DATE - INTERVAL '90 days' THEN 50
            ELSE 0
        END as engagement_score
    FROM subscribers s
    WHERE s.subscription_status = 'active'
      AND s.timezone IN (SELECT timezone FROM north_american_timezones)
      AND s.subscription_date >= CURRENT_DATE - INTERVAL '18 months'
      AND s.subscription_date IS NOT NULL
      AND s.account_type NOT IN ('test', 'internal')
      AND s.email NOT LIKE '%test%'
      AND s.email NOT LIKE '%@company.com'  -- Exclude internal emails
      AND (
          COALESCE(s.lifetime_value, 0) > 5000 OR
          COALESCE(s.login_count, 0) > 50
      )
)
SELECT 
    subscriber_id,
    email,
    subscription_date,
    last_login_date,
    lifetime_value,
    login_count,
    timezone,
    account_type,
    engagement_score,
    -- Add percentile ranking
    PERCENT_RANK() OVER (ORDER BY engagement_score) as engagement_percentile
FROM qualified_subscribers
ORDER BY engagement_score DESC, subscriber_id;

-- Recommended indexes for this query
CREATE INDEX idx_subscribers_filtering 
ON subscribers (subscription_status, timezone, account_type, subscription_date)
INCLUDE (lifetime_value, login_count, last_login_date);

CREATE INDEX idx_subscribers_engagement 
ON subscribers (lifetime_value, login_count) 
WHERE subscription_status = 'active' 
  AND account_type NOT IN ('test', 'internal');

Work through this exercise, then modify it to:

Add geographic grouping by timezone
Include trend analysis (month-over-month engagement changes)
Add customer lifetime value predictions
Implement sophisticated risk scoring

Common Mistakes & Troubleshooting

Null Value Handling Errors

Mistake: Forgetting that comparisons with NULL always return NULL (unknown), not true or false:

-- Wrong: excludes rows where last_login_date is NULL
SELECT * FROM customers WHERE last_login_date != '2023-01-01';

-- Right: explicitly handle NULLs
SELECT * FROM customers 
WHERE last_login_date != '2023-01-01' OR last_login_date IS NULL;

Mistake: Using NOT IN with potential NULL values:

-- Dangerous: returns no rows if any order has NULL customer_id
SELECT * FROM customers 
WHERE customer_id NOT IN (SELECT customer_id FROM orders);

-- Safe: exclude NULLs explicitly
SELECT * FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);

Performance Anti-Patterns

Mistake: Leading wildcards in LIKE operations:

-- Can't use indexes
WHERE product_name LIKE '%phone%'

-- Better: full-text search or specialized indexing
WHERE MATCH(product_name) AGAINST ('phone' IN NATURAL LANGUAGE MODE)

Mistake: Functions in WHERE clauses prevent index usage:

-- Index can't be used
WHERE YEAR(order_date) = 2023

-- Index-friendly alternative
WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01'

Mistake: Inefficient OR conditions:

-- May require multiple index scans
WHERE customer_tier = 'gold' OR lifetime_value > 10000

-- Consider UNION for better performance
SELECT * FROM customers WHERE customer_tier = 'gold'
UNION
SELECT * FROM customers WHERE lifetime_value > 10000;

Data Type and Conversion Issues

Mistake: Implicit type conversions affect performance and correctness:

-- Implicit conversion prevents index usage
WHERE customer_id = '123'  -- customer_id is INTEGER

-- Explicit, correct version
WHERE customer_id = 123

Mistake: Timezone-naive date filtering:

-- Assumes server timezone
WHERE order_date BETWEEN '2023-11-24' AND '2023-11-25'

-- Timezone-aware filtering
WHERE order_date AT TIME ZONE 'UTC' 
      BETWEEN '2023-11-24 00:00:00+00' AND '2023-11-25 23:59:59+00'

Query Optimization Pitfalls

Mistake: Over-filtering in subqueries:

-- Subquery processes all orders unnecessarily
SELECT c.*
FROM customers c
WHERE c.customer_id IN (
    SELECT o.customer_id 
    FROM orders o 
    WHERE o.order_total > 1000
      AND o.order_date > '2023-01-01'
      AND EXISTS (
          SELECT 1 FROM products p 
          WHERE p.product_id = o.product_id 
            AND p.category = 'Electronics'
      )
);

-- Better: move filtering to main query when possible
SELECT DISTINCT c.*
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.order_total > 1000
  AND o.order_date > '2023-01-01'  
  AND p.category = 'Electronics';

Debugging Complex Filtering Logic

When complex WHERE clauses don't return expected results:

Break down conditions: Test each part separately
Check for NULL values: Add explicit NULL handling
Verify data types: Ensure comparisons use compatible types
Examine execution plans: Look for expensive operations
Test with smaller datasets: Isolate logic errors

-- Debugging approach: build query incrementally
-- Start with base query
SELECT COUNT(*) FROM customers;

-- Add each filter condition one by one
SELECT COUNT(*) FROM customers WHERE customer_tier = 'gold';
SELECT COUNT(*) FROM customers WHERE customer_tier = 'gold' AND is_active = true;
SELECT COUNT(*) FROM customers WHERE customer_tier = 'gold' AND is_active = true AND lifetime_value > 1000;

-- Check for unexpected NULLs
SELECT 
    COUNT(*) as total_rows,
    COUNT(customer_tier) as non_null_tier,
    COUNT(CASE WHEN is_active = true THEN 1 END) as active_true,
    COUNT(CASE WHEN is_active = false THEN 1 END) as active_false,
    COUNT(CASE WHEN is_active IS NULL THEN 1 END) as active_null
FROM customers;

Summary & Next Steps

You've mastered enterprise-level data filtering and sorting techniques that separate professional data practitioners from casual SQL users. The key insights from this deep dive:

Filtering Mastery:

Logical operator precedence and short-circuit evaluation affect both performance and correctness
NULL value handling in complex conditions requires explicit consideration
IN, BETWEEN, and pattern matching have subtle behaviors that can break production queries
Query optimization depends on understanding how databases evaluate compound conditions

Sorting Sophistication:

ORDER BY performance depends heavily on index design and memory management
Custom sort orders enable business-specific data presentation
Combining filtering and sorting requires integrated optimization strategies
Database-specific features can dramatically improve performance for complex queries

Production-Ready Practices:

Always consider NULL value implications in filter conditions
Design indexes to match your most common filtering and sorting patterns
Use execution plan analysis to verify optimization assumptions
Test edge cases and boundary conditions thoroughly

Next Steps for Advanced Mastery:

Window Functions: Learn PARTITION BY, ROW_NUMBER(), and RANK() for sophisticated analytical queries
Advanced Joins: Master complex join patterns including lateral joins, recursive CTEs, and set operations
Query Performance Tuning: Dive deeper into execution plans, statistics management, and database-specific optimization features
Data Architecture: Understand how filtering and sorting strategies fit into larger data pipeline and warehouse design

The techniques you've learned here form the foundation for virtually every advanced SQL operation. Whether you're building real-time analytics dashboards, processing massive data pipelines, or creating sophisticated business intelligence reports, these filtering and sorting patterns will be your daily tools.

Practice these concepts with your own datasets, paying particular attention to performance characteristics and edge case handling. The difference between competent and expert SQL practitioners lies not in knowing more functions, but in deeply understanding how these fundamental operations behave under real-world conditions.