A/B Testing Analysis with SQL

A/B Testing Analysis with SQL

A/B testing (also called split testing or controlled experiments) is the gold standard for making data-driven product decisions. Rather than guessing whether a new checkout flow, email subject line, or pricing page performs better, you randomly assign users to variants and measure the difference.

SQL is central to A/B test analysis: it extracts experiment data, computes conversion rates, calculates statistical significance, and identifies whether results hold across customer segments.

The Statistical Foundation

An A/B test measures whether an observed difference between two groups is real or just random chance.

Key concepts:

  • Control (A): The existing version (baseline)
  • Treatment (B): The new version being tested
  • Conversion Rate: The percentage of users who complete the target action
  • Statistical Significance: Are the results unlikely to be due to random chance? Typically we want p-value < 0.05 (95% confidence)
  • Minimum Detectable Effect (MDE): The smallest improvement that would be meaningful to the business (e.g., 1% lift in conversion)

SQL cannot directly compute p-values for most statistical tests, but it can compute the Z-score for a two-proportion test, which maps directly to statistical significance.

Step 1 — Extract Experiment Groups and Conversions

The first step identifies which users were in each variant and whether they converted. This requires joining the ec_events table (which records the experiment assignment) to the ec_orders table (which records conversions).

-- Template — adapt to your own ec_events schema (this course's
-- ec_events table has no `properties` column)
SELECT 'see template below' AS info;
/*AB_TEMPLATE*/
/*
-- Template — adapt to your own ec_events schema (this course's
-- ec_events table has no `properties` column)
SELECT 'see template below' AS info;
/*AB_TEMPLATE
WITH experiment_users AS (
    SELECT
        customer_id,
        properties->>'variant' AS variant,
        MIN(created_at) AS assignment_time
    FROM ec_events
    WHERE event_type = 'experiment_assigned'
      AND properties->>'experiment_id' = 'checkout_redesign_v2'
    GROUP BY customer_id, properties->>'variant'
),
conversions AS (
    SELECT DISTINCT o.customer_id AS customer_id
    FROM ec_orders o
    JOIN experiment_users eu ON eu.customer_id = o.customer_id
    WHERE o.created_at > eu.assignment_time  -- only ec_orders AFTER assignment
      AND o.status NOT IN ('cancelled', 'refunded')
)
SELECT
    eu.variant,
    COUNT(DISTINCT eu.customer_id) AS users,
    COUNT(DISTINCT c.customer_id) AS conversions,
    ROUND(COUNT(DISTINCT c.customer_id)::numeric / NULLIF(COUNT(DISTINCT eu.customer_id), 0) * 100, 2) AS conversion_rate_pct
FROM experiment_users eu
LEFT JOIN conversions c ON c.customer_id = eu.customer_id
GROUP BY eu.variant
ORDER BY eu.variant;
*/
*/

What This Returns

The fundamental A/B result: users in each variant, their conversion counts, and conversion rates. If variant B shows 4.2% vs variant A's 3.8%, that's a 10.5% relative lift — but is it statistically significant? Step 3 answers that.

Critical detail: WHERE o.created_at > eu.assignment_time ensures we only count ec_orders that happened after the experiment assignment. Including pre-experiment ec_orders would bias results toward users with pre-existing purchase intent.

Purchase this course to unlock the full lesson.

Sign up