A/B Testing Analysis with SQL
A/B Testing Analysis with SQL
A/B testing (also called split testing or controlled experiments) is the gold standard for making data-driven product decisions. Rather than guessing whether a new checkout flow, email subject line, or pricing page performs better, you randomly assign users to variants and measure the difference.
SQL is central to A/B test analysis: it extracts experiment data, computes conversion rates, calculates statistical significance, and identifies whether results hold across customer segments.
The Statistical Foundation
An A/B test measures whether an observed difference between two groups is real or just random chance.
Key concepts:
- Control (A): The existing version (baseline)
- Treatment (B): The new version being tested
- Conversion Rate: The percentage of users who complete the target action
- Statistical Significance: Are the results unlikely to be due to random chance? Typically we want p-value < 0.05 (95% confidence)
- Minimum Detectable Effect (MDE): The smallest improvement that would be meaningful to the business (e.g., 1% lift in conversion)
SQL cannot directly compute p-values for most statistical tests, but it can compute the Z-score for a two-proportion test, which maps directly to statistical significance.
Step 1 — Extract Experiment Groups and Conversions
The first step identifies which users were in each variant and whether they converted. This requires joining the ec_events table (which records the experiment assignment) to the ec_orders table (which records conversions).
-- Template — adapt to your own ec_events schema (this course's
-- ec_events table has no `properties` column)
SELECT 'see template below' AS info;
/*AB_TEMPLATE*/
/*
-- Template — adapt to your own ec_events schema (this course's
-- ec_events table has no `properties` column)
SELECT 'see template below' AS info;
/*AB_TEMPLATE
WITH experiment_users AS (
SELECT
customer_id,
properties->>'variant' AS variant,
MIN(created_at) AS assignment_time
FROM ec_events
WHERE event_type = 'experiment_assigned'
AND properties->>'experiment_id' = 'checkout_redesign_v2'
GROUP BY customer_id, properties->>'variant'
),
conversions AS (
SELECT DISTINCT o.customer_id AS customer_id
FROM ec_orders o
JOIN experiment_users eu ON eu.customer_id = o.customer_id
WHERE o.created_at > eu.assignment_time -- only ec_orders AFTER assignment
AND o.status NOT IN ('cancelled', 'refunded')
)
SELECT
eu.variant,
COUNT(DISTINCT eu.customer_id) AS users,
COUNT(DISTINCT c.customer_id) AS conversions,
ROUND(COUNT(DISTINCT c.customer_id)::numeric / NULLIF(COUNT(DISTINCT eu.customer_id), 0) * 100, 2) AS conversion_rate_pct
FROM experiment_users eu
LEFT JOIN conversions c ON c.customer_id = eu.customer_id
GROUP BY eu.variant
ORDER BY eu.variant;
*/
*/
What This Returns
The fundamental A/B result: users in each variant, their conversion counts, and conversion rates. If variant B shows 4.2% vs variant A's 3.8%, that's a 10.5% relative lift — but is it statistically significant? Step 3 answers that.
Critical detail: WHERE o.created_at > eu.assignment_time ensures we only count ec_orders that happened after the experiment assignment. Including pre-experiment ec_orders would bias results toward users with pre-existing purchase intent.