You're reviewing your company's user metrics dashboard when your product manager walks over with a familiar look of confusion. "Our monthly active users are up 15%, but revenue per user is trending down. Are we attracting the wrong users, or are our existing users becoming less engaged?" This is exactly the kind of question that basic aggregation queries can't answer—you need to understand user behavior over time, track how different groups of users behave, and identify where users are dropping off in your product experience.
Cohort analysis, funnel analysis, and retention metrics are the powerhouse techniques that transform raw user event data into actionable insights about customer behavior. Instead of looking at snapshots of user activity, these approaches let you track how user behavior evolves, compare different user segments, and pinpoint exactly where your product is succeeding or failing.
By the end of this lesson, you'll be writing sophisticated SQL queries that segment users into meaningful cohorts, calculate retention rates that actually drive product decisions, and build conversion funnels that identify optimization opportunities. These aren't just theoretical concepts—they're the analytical tools that drive growth at data-driven companies.
What you'll learn:
You should be comfortable with intermediate SQL concepts including JOINs, subqueries, window functions (ROW_NUMBER(), LAG(), LEAD()), and date/time manipulation. We'll build on these foundations to create more sophisticated analytical queries.
Before diving into analysis techniques, let's establish the typical data structure you'll encounter. Most cohort and funnel analyses work with event-based data where each row represents a user action:
-- Typical user events table structure
CREATE TABLE user_events (
user_id INTEGER,
event_name VARCHAR(50),
event_timestamp TIMESTAMP,
session_id VARCHAR(100),
properties JSON
);
-- Sample data that mirrors real SaaS product usage
INSERT INTO user_events VALUES
(101, 'signup', '2024-01-15 10:30:00', 'sess_001', '{"source": "google", "plan": "free"}'),
(101, 'first_login', '2024-01-15 10:35:00', 'sess_001', '{"device": "desktop"}'),
(101, 'feature_used', '2024-01-15 11:00:00', 'sess_001', '{"feature": "dashboard"}'),
(101, 'feature_used', '2024-01-20 14:20:00', 'sess_002', '{"feature": "reports"}'),
(101, 'upgrade', '2024-02-10 09:15:00', 'sess_003', '{"from_plan": "free", "to_plan": "pro"}'),
(102, 'signup', '2024-01-16 15:45:00', 'sess_004', '{"source": "direct", "plan": "free"}'),
(102, 'first_login', '2024-01-18 08:20:00', 'sess_005', '{"device": "mobile"}'),
(103, 'signup', '2024-02-01 12:00:00', 'sess_006', '{"source": "facebook", "plan": "trial"}');
This structure captures the essential elements: who did what, when they did it, and contextual information about the action. Real production tables often include additional dimensions like geographic location, device type, or A/B test variants.
Cohort analysis groups users by a shared characteristic (usually their first interaction with your product) and tracks their behavior over time. This reveals patterns that aggregate metrics hide—like whether users who signed up in January behave differently than those who joined in March.
Let's start with a classic monthly retention cohort:
-- Step 1: Define user cohorts by signup month
WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', MIN(event_timestamp)) AS cohort_month
FROM user_events
WHERE event_name = 'signup'
GROUP BY user_id
),
-- Step 2: Find all user activity periods
user_periods AS (
SELECT DISTINCT
user_id,
DATE_TRUNC('month', event_timestamp) AS activity_month
FROM user_events
WHERE event_name IN ('login', 'feature_used', 'first_login')
),
-- Step 3: Combine cohort assignments with activity periods
cohort_data AS (
SELECT
c.cohort_month,
p.activity_month,
c.user_id,
EXTRACT(EPOCH FROM (p.activity_month - c.cohort_month)) / 2628000 AS month_number
FROM user_cohorts c
LEFT JOIN user_periods p ON c.user_id = p.user_id
)
-- Step 4: Calculate cohort sizes and retention rates
SELECT
cohort_month,
month_number,
COUNT(DISTINCT user_id) AS users_active,
FIRST_VALUE(COUNT(DISTINCT user_id))
OVER (PARTITION BY cohort_month ORDER BY month_number) AS cohort_size,
ROUND(
100.0 * COUNT(DISTINCT user_id) /
FIRST_VALUE(COUNT(DISTINCT user_id))
OVER (PARTITION BY cohort_month ORDER BY month_number),
2
) AS retention_rate
FROM cohort_data
WHERE activity_month IS NOT NULL
GROUP BY cohort_month, month_number
ORDER BY cohort_month, month_number;
This query creates a cohort table showing how users from each signup month behave over time. The month_number indicates periods after signup (0 = signup month, 1 = first month after signup, etc.).
Performance tip: For large datasets, consider creating indexes on
user_idandevent_timestamp. The multiple GROUP BY operations can be expensive, so test with realistic data volumes.
Let's enhance this with more sophisticated cohort analysis that considers user value:
-- Revenue cohort analysis - tracking how different cohorts monetize
WITH user_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', MIN(event_timestamp)) AS cohort_month,
JSON_EXTRACT_PATH_TEXT(properties, 'source') AS acquisition_source
FROM user_events
WHERE event_name = 'signup'
GROUP BY user_id, JSON_EXTRACT_PATH_TEXT(properties, 'source')
),
monthly_revenue AS (
SELECT
user_id,
DATE_TRUNC('month', event_timestamp) AS revenue_month,
CASE
WHEN JSON_EXTRACT_PATH_TEXT(properties, 'to_plan') = 'pro' THEN 99
WHEN JSON_EXTRACT_PATH_TEXT(properties, 'to_plan') = 'enterprise' THEN 299
ELSE 0
END AS revenue
FROM user_events
WHERE event_name = 'upgrade'
),
cohort_revenue AS (
SELECT
c.cohort_month,
c.acquisition_source,
r.revenue_month,
EXTRACT(EPOCH FROM (r.revenue_month - c.cohort_month)) / 2628000 AS month_number,
SUM(r.revenue) AS month_revenue,
COUNT(DISTINCT c.user_id) AS cohort_size
FROM user_cohorts c
LEFT JOIN monthly_revenue r ON c.user_id = r.user_id
GROUP BY c.cohort_month, c.acquisition_source, r.revenue_month
)
SELECT
cohort_month,
acquisition_source,
month_number,
cohort_size,
month_revenue,
SUM(month_revenue) OVER (
PARTITION BY cohort_month, acquisition_source
ORDER BY month_number
ROWS UNBOUNDED PRECEDING
) AS cumulative_revenue,
ROUND(month_revenue / cohort_size::DECIMAL, 2) AS revenue_per_user
FROM cohort_revenue
WHERE month_number IS NOT NULL
ORDER BY cohort_month, acquisition_source, month_number;
This enhanced version segments cohorts by acquisition source and tracks revenue generation, giving you insight into which marketing channels produce the most valuable users over time.
Basic retention rates only tell part of the story. Advanced retention analysis considers different types of user engagement and identifies patterns that drive long-term value.
Here's a comprehensive retention analysis that goes beyond simple "did they come back":
-- Multi-dimensional retention: engagement levels and feature adoption
WITH user_signup AS (
SELECT
user_id,
DATE_TRUNC('week', MIN(event_timestamp)) AS cohort_week
FROM user_events
WHERE event_name = 'signup'
GROUP BY user_id
),
weekly_engagement AS (
SELECT
user_id,
DATE_TRUNC('week', event_timestamp) AS activity_week,
COUNT(DISTINCT event_name) AS unique_events,
COUNT(*) AS total_events,
COUNT(DISTINCT session_id) AS sessions,
CASE
WHEN COUNT(*) >= 20 THEN 'high'
WHEN COUNT(*) >= 5 THEN 'medium'
ELSE 'low'
END AS engagement_level
FROM user_events
WHERE event_name NOT IN ('signup')
GROUP BY user_id, DATE_TRUNC('week', event_timestamp)
),
retention_data AS (
SELECT
s.cohort_week,
e.activity_week,
s.user_id,
FLOOR((e.activity_week - s.cohort_week) / INTERVAL '7 days') AS week_number,
e.engagement_level,
e.total_events,
e.sessions
FROM user_signup s
LEFT JOIN weekly_engagement e ON s.user_id = e.user_id
),
retention_summary AS (
SELECT
cohort_week,
week_number,
COUNT(DISTINCT user_id) AS total_users,
COUNT(DISTINCT CASE WHEN engagement_level = 'high' THEN user_id END) AS high_engagement_users,
COUNT(DISTINCT CASE WHEN engagement_level = 'medium' THEN user_id END) AS medium_engagement_users,
COUNT(DISTINCT CASE WHEN engagement_level = 'low' THEN user_id END) AS low_engagement_users,
AVG(total_events) AS avg_events_per_user,
AVG(sessions) AS avg_sessions_per_user
FROM retention_data
WHERE activity_week IS NOT NULL
GROUP BY cohort_week, week_number
)
SELECT
cohort_week,
week_number,
total_users,
FIRST_VALUE(total_users) OVER (
PARTITION BY cohort_week ORDER BY week_number
) AS cohort_size,
ROUND(100.0 * total_users / FIRST_VALUE(total_users) OVER (
PARTITION BY cohort_week ORDER BY week_number
), 2) AS overall_retention_rate,
ROUND(100.0 * high_engagement_users / total_users, 2) AS high_engagement_rate,
ROUND(avg_events_per_user, 2) AS avg_events,
ROUND(avg_sessions_per_user, 2) AS avg_sessions
FROM retention_summary
ORDER BY cohort_week, week_number;
This analysis reveals not just who comes back, but how engaged they are when they return. You might discover that while overall retention drops, the users who do return are becoming more engaged—a pattern that suggests successful user onboarding.
Conversion funnels track users through sequential steps to identify where they drop off. Unlike retention analysis that looks at behavior over time, funnels focus on specific user journeys.
Let's build a comprehensive signup-to-paid-user funnel:
-- Complete user journey funnel with time-to-convert tracking
WITH funnel_events AS (
SELECT
user_id,
event_name,
event_timestamp,
session_id,
ROW_NUMBER() OVER (PARTITION BY user_id, event_name ORDER BY event_timestamp) AS event_rank
FROM user_events
WHERE event_name IN ('signup', 'first_login', 'feature_used', 'upgrade')
),
-- Get first occurrence of each funnel step per user
user_funnel_steps AS (
SELECT
user_id,
MAX(CASE WHEN event_name = 'signup' THEN event_timestamp END) AS signup_time,
MAX(CASE WHEN event_name = 'first_login' THEN event_timestamp END) AS first_login_time,
MAX(CASE WHEN event_name = 'feature_used' THEN event_timestamp END) AS first_feature_time,
MAX(CASE WHEN event_name = 'upgrade' THEN event_timestamp END) AS upgrade_time
FROM funnel_events
WHERE event_rank = 1 -- Only first occurrence of each event
GROUP BY user_id
),
-- Calculate funnel progression with timing
funnel_analysis AS (
SELECT
user_id,
signup_time,
CASE WHEN first_login_time IS NOT NULL THEN 1 ELSE 0 END AS completed_login,
CASE WHEN first_feature_time IS NOT NULL THEN 1 ELSE 0 END AS completed_feature_use,
CASE WHEN upgrade_time IS NOT NULL THEN 1 ELSE 0 END AS completed_upgrade,
-- Time to complete each step (in hours)
CASE WHEN first_login_time IS NOT NULL
THEN EXTRACT(EPOCH FROM (first_login_time - signup_time)) / 3600
END AS hours_to_login,
CASE WHEN first_feature_time IS NOT NULL
THEN EXTRACT(EPOCH FROM (first_feature_time - signup_time)) / 3600
END AS hours_to_feature,
CASE WHEN upgrade_time IS NOT NULL
THEN EXTRACT(EPOCH FROM (upgrade_time - signup_time)) / 3600
END AS hours_to_upgrade
FROM user_funnel_steps
WHERE signup_time IS NOT NULL
)
-- Funnel conversion rates and timing analysis
SELECT
'Overall Funnel' AS segment,
COUNT(*) AS total_signups,
SUM(completed_login) AS completed_login,
SUM(completed_feature_use) AS completed_feature_use,
SUM(completed_upgrade) AS completed_upgrade,
-- Conversion rates
ROUND(100.0 * SUM(completed_login) / COUNT(*), 2) AS login_conversion_rate,
ROUND(100.0 * SUM(completed_feature_use) / COUNT(*), 2) AS feature_conversion_rate,
ROUND(100.0 * SUM(completed_upgrade) / COUNT(*), 2) AS upgrade_conversion_rate,
-- Step-to-step conversion rates
ROUND(100.0 * SUM(completed_feature_use) / NULLIF(SUM(completed_login), 0), 2) AS login_to_feature_rate,
ROUND(100.0 * SUM(completed_upgrade) / NULLIF(SUM(completed_feature_use), 0), 2) AS feature_to_upgrade_rate,
-- Timing analysis
ROUND(AVG(hours_to_login), 2) AS avg_hours_to_login,
ROUND(AVG(hours_to_feature), 2) AS avg_hours_to_feature,
ROUND(AVG(hours_to_upgrade), 2) AS avg_hours_to_upgrade,
-- Time distribution percentiles
ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY hours_to_upgrade), 2) AS median_hours_to_upgrade,
ROUND(PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY hours_to_upgrade), 2) AS p90_hours_to_upgrade
FROM funnel_analysis;
This comprehensive funnel analysis provides both conversion rates and timing insights. You can see not just where users drop off, but how quickly successful users move through each step.
Key insight: Pay special attention to step-to-step conversion rates. A 90% signup-to-login rate might look good, but if only 20% of users who log in actually use a feature, that's where you should focus optimization efforts.
Let's extend this to analyze funnels by user segment:
-- Segmented funnel analysis - comparing different user groups
WITH user_segments AS (
SELECT
user_id,
JSON_EXTRACT_PATH_TEXT(properties, 'source') AS acquisition_source,
JSON_EXTRACT_PATH_TEXT(properties, 'plan') AS initial_plan,
event_timestamp AS signup_time
FROM user_events
WHERE event_name = 'signup'
),
funnel_events AS (
SELECT
user_id,
event_name,
event_timestamp,
ROW_NUMBER() OVER (PARTITION BY user_id, event_name ORDER BY event_timestamp) AS event_rank
FROM user_events
WHERE event_name IN ('signup', 'first_login', 'feature_used', 'upgrade')
),
user_funnel_steps AS (
SELECT
user_id,
MAX(CASE WHEN event_name = 'first_login' THEN event_timestamp END) AS first_login_time,
MAX(CASE WHEN event_name = 'feature_used' THEN event_timestamp END) AS first_feature_time,
MAX(CASE WHEN event_name = 'upgrade' THEN event_timestamp END) AS upgrade_time
FROM funnel_events
WHERE event_rank = 1
GROUP BY user_id
),
segmented_funnel AS (
SELECT
s.acquisition_source,
s.initial_plan,
COUNT(*) AS total_signups,
SUM(CASE WHEN f.first_login_time IS NOT NULL THEN 1 ELSE 0 END) AS completed_login,
SUM(CASE WHEN f.first_feature_time IS NOT NULL THEN 1 ELSE 0 END) AS completed_feature_use,
SUM(CASE WHEN f.upgrade_time IS NOT NULL THEN 1 ELSE 0 END) AS completed_upgrade
FROM user_segments s
LEFT JOIN user_funnel_steps f ON s.user_id = f.user_id
GROUP BY s.acquisition_source, s.initial_plan
)
SELECT
acquisition_source,
initial_plan,
total_signups,
ROUND(100.0 * completed_login / total_signups, 2) AS login_conversion_rate,
ROUND(100.0 * completed_feature_use / total_signups, 2) AS feature_conversion_rate,
ROUND(100.0 * completed_upgrade / total_signups, 2) AS upgrade_conversion_rate,
ROUND(100.0 * completed_upgrade / NULLIF(completed_feature_use, 0), 2) AS feature_to_upgrade_rate
FROM segmented_funnel
WHERE total_signups >= 10 -- Only include segments with meaningful sample sizes
ORDER BY upgrade_conversion_rate DESC;
This segmented analysis reveals which acquisition channels and initial plans produce users with the highest conversion rates through your funnel.
The real power emerges when you combine cohort and funnel analysis to understand how different user groups convert over time. This hybrid approach answers questions like "Do users who sign up in December convert better than those who sign up in June?"
-- Time-based cohort funnel analysis
WITH signup_cohorts AS (
SELECT
user_id,
DATE_TRUNC('month', event_timestamp) AS signup_month,
JSON_EXTRACT_PATH_TEXT(properties, 'source') AS acquisition_source,
event_timestamp AS signup_time
FROM user_events
WHERE event_name = 'signup'
),
conversion_events AS (
SELECT
user_id,
event_name,
event_timestamp,
ROW_NUMBER() OVER (PARTITION BY user_id, event_name ORDER BY event_timestamp) AS event_rank
FROM user_events
WHERE event_name IN ('first_login', 'feature_used', 'upgrade')
),
user_conversions AS (
SELECT
user_id,
MAX(CASE WHEN event_name = 'first_login' AND event_rank = 1 THEN event_timestamp END) AS first_login_time,
MAX(CASE WHEN event_name = 'feature_used' AND event_rank = 1 THEN event_timestamp END) AS first_feature_time,
MAX(CASE WHEN event_name = 'upgrade' AND event_rank = 1 THEN event_timestamp END) AS first_upgrade_time
FROM conversion_events
GROUP BY user_id
),
cohort_conversion_analysis AS (
SELECT
c.signup_month,
c.acquisition_source,
COUNT(*) AS cohort_size,
-- Basic conversion counts
SUM(CASE WHEN u.first_login_time IS NOT NULL THEN 1 ELSE 0 END) AS login_converts,
SUM(CASE WHEN u.first_feature_time IS NOT NULL THEN 1 ELSE 0 END) AS feature_converts,
SUM(CASE WHEN u.first_upgrade_time IS NOT NULL THEN 1 ELSE 0 END) AS upgrade_converts,
-- Time-based conversions (within 7 days, 30 days)
SUM(CASE WHEN u.first_upgrade_time <= c.signup_time + INTERVAL '7 days' THEN 1 ELSE 0 END) AS upgrade_7d,
SUM(CASE WHEN u.first_upgrade_time <= c.signup_time + INTERVAL '30 days' THEN 1 ELSE 0 END) AS upgrade_30d,
-- Average time to convert (in days)
AVG(CASE WHEN u.first_upgrade_time IS NOT NULL
THEN EXTRACT(EPOCH FROM (u.first_upgrade_time - c.signup_time)) / 86400
END) AS avg_days_to_upgrade
FROM signup_cohorts c
LEFT JOIN user_conversions u ON c.user_id = u.user_id
GROUP BY c.signup_month, c.acquisition_source
)
SELECT
signup_month,
acquisition_source,
cohort_size,
-- Overall conversion rates
ROUND(100.0 * upgrade_converts / cohort_size, 2) AS overall_upgrade_rate,
ROUND(100.0 * upgrade_7d / cohort_size, 2) AS upgrade_rate_7d,
ROUND(100.0 * upgrade_30d / cohort_size, 2) AS upgrade_rate_30d,
-- Funnel efficiency
ROUND(100.0 * upgrade_converts / NULLIF(feature_converts, 0), 2) AS feature_to_upgrade_efficiency,
ROUND(avg_days_to_upgrade, 1) AS avg_days_to_upgrade,
-- Cohort performance vs overall average
ROUND(
100.0 * (upgrade_converts::DECIMAL / cohort_size) /
AVG(upgrade_converts::DECIMAL / cohort_size) OVER (),
2
) AS performance_vs_average
FROM cohort_conversion_analysis
WHERE cohort_size >= 20 -- Focus on statistically meaningful cohorts
ORDER BY signup_month, upgrade_rate_30d DESC;
This combined analysis shows you which months and acquisition sources produce the most valuable users, and how conversion patterns change over time.
Seasonal insight: You might discover that users who sign up in Q4 convert at higher rates but take longer to upgrade, while Q2 signups convert quickly but at lower rates. This suggests different onboarding strategies for different time periods.
Let's build a comprehensive retention and funnel dashboard for a SaaS product. You'll create a query that product managers can use to understand user behavior across multiple dimensions.
-- Comprehensive user behavior analysis dashboard
WITH base_events AS (
-- Normalize your event data structure
SELECT
user_id,
event_name,
event_timestamp,
DATE_TRUNC('week', event_timestamp) AS event_week,
JSON_EXTRACT_PATH_TEXT(properties, 'source') AS source,
JSON_EXTRACT_PATH_TEXT(properties, 'plan') AS plan,
session_id
FROM user_events
),
user_cohorts AS (
-- Define user cohorts by signup week and characteristics
SELECT
user_id,
MIN(event_timestamp) AS signup_time,
DATE_TRUNC('week', MIN(event_timestamp)) AS cohort_week,
FIRST_VALUE(source) OVER (PARTITION BY user_id ORDER BY event_timestamp) AS acquisition_source,
FIRST_VALUE(plan) OVER (PARTITION BY user_id ORDER BY event_timestamp) AS initial_plan
FROM base_events
WHERE event_name = 'signup'
GROUP BY user_id, source, plan
),
weekly_activity AS (
-- Track user activity by week
SELECT
user_id,
event_week,
COUNT(DISTINCT event_name) AS unique_events,
COUNT(*) AS total_events,
COUNT(DISTINCT session_id) AS sessions,
MAX(CASE WHEN event_name = 'upgrade' THEN 1 ELSE 0 END) AS upgraded_this_week
FROM base_events
WHERE event_name != 'signup'
GROUP BY user_id, event_week
),
-- Your task: Complete this analysis by adding:
-- 1. Retention calculation by cohort and acquisition source
-- 2. Funnel analysis from signup to first value event to upgrade
-- 3. Identify power users (top 10% by activity) and their characteristics
-- 4. Calculate weekly cohort performance metrics
retention_analysis AS (
SELECT
c.cohort_week,
c.acquisition_source,
a.event_week,
FLOOR((a.event_week - c.cohort_week) / INTERVAL '7 days') AS week_number,
COUNT(DISTINCT c.user_id) AS cohort_size,
COUNT(DISTINCT a.user_id) AS active_users,
SUM(a.upgraded_this_week) AS upgrades_this_week
FROM user_cohorts c
LEFT JOIN weekly_activity a ON c.user_id = a.user_id
GROUP BY c.cohort_week, c.acquisition_source, a.event_week
),
-- Complete the final summary query
final_dashboard AS (
-- Your implementation here
-- Should include retention rates, funnel conversion rates, and cohort comparisons
)
SELECT * FROM final_dashboard;
Your challenge: Complete this query to create a dashboard that shows:
Mistake #1: Ignoring Data Quality Issues
The most common error in cohort analysis is assuming your event data is clean. Always validate your assumptions:
-- Data quality check for cohort analysis
SELECT
'Duplicate signup events' AS issue,
COUNT(*) AS affected_users
FROM (
SELECT user_id, COUNT(*) as signup_count
FROM user_events
WHERE event_name = 'signup'
GROUP BY user_id
HAVING COUNT(*) > 1
) duplicates
UNION ALL
SELECT
'Users with events before signup' AS issue,
COUNT(*) AS affected_users
FROM (
SELECT DISTINCT e1.user_id
FROM user_events e1
JOIN user_events e2 ON e1.user_id = e2.user_id
WHERE e1.event_name != 'signup'
AND e2.event_name = 'signup'
AND e1.event_timestamp < e2.event_timestamp
) pre_signup_activity;
Mistake #2: Incorrect Time Window Calculations
Date arithmetic in SQL can be tricky, especially across month boundaries:
-- Wrong: This doesn't account for varying month lengths
DATEDIFF(month, signup_date, activity_date) AS month_number
-- Correct: Use proper interval arithmetic
EXTRACT(EPOCH FROM (activity_date - signup_date)) / 2628000 AS month_number
Mistake #3: Survivorship Bias in Retention Analysis
Only analyzing users who are still active introduces bias. Always include all users in your cohort:
-- Wrong: Only shows retained users
SELECT cohort_month, AVG(activity_days)
FROM user_activity
WHERE last_activity >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY cohort_month;
-- Correct: Shows all users, with NULL for inactive ones
SELECT
c.cohort_month,
COUNT(*) AS total_users,
COUNT(a.user_id) AS active_users,
AVG(a.activity_days) AS avg_activity_days
FROM cohorts c
LEFT JOIN user_activity a ON c.user_id = a.user_id
AND a.last_activity >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY c.cohort_month;
Performance Troubleshooting:
For large datasets, these queries can become slow. Key optimization strategies:
(user_id, event_timestamp, event_name)-- Index creation for better performance
CREATE INDEX idx_events_user_time_event
ON user_events (user_id, event_timestamp, event_name);
CREATE INDEX idx_events_event_time
ON user_events (event_name, event_timestamp);
You now have the SQL skills to build sophisticated user behavior analyses that drive product decisions. Cohort analysis reveals how different user segments behave over time, funnel analysis identifies conversion bottlenecks, and retention metrics track long-term user value.
The key insights from this lesson:
Next steps for your development:
Advanced topics to explore:
The analytical foundation you've built here scales to enterprise-level user research. Companies like Spotify, Netflix, and Airbnb use these exact techniques (with more sophisticated infrastructure) to understand and optimize user behavior at massive scale.
Learning Path: Advanced SQL Queries