Retention Curve and Churn Prediction
Retention Curve and Churn Prediction
A retention curve shows what percentage of users remain active N days after their first purchase. Churn prediction identifies which ec_customers are showing early warning signs of leaving — before they actually go. Together, these analyses form the foundation of customer success and retention strategy.
Retention is often called the "silent metric" because declining retention can be hidden for months by new customer acquisition. A business can look like it's growing (more new ec_customers) while actually losing ground (existing ec_customers churning faster). The retention curve makes this visible.
Why Retention Is the Most Important Metric
A 5% improvement in retention can increase profits by 25-95% (Bain & Company research), because:
- Existing ec_customers cost less to serve (no acquisition cost)
- Existing ec_customers buy more over time (LTV increases with tenure)
- Retained ec_customers become word-of-mouth advocates
The retention curve reveals your natural churn rate and lets you identify whether retention is improving or declining over time.
Dataset
-- ec_orders: customer_id, created_at, total_amount
-- ec_events: customer_id, event_type, created_at
Step 1 — Classic Retention Curve (Day 1, 7, 30, 90)
The simplest retention curve asks: "Of everyone who made their first purchase, what percentage came back within 7 days? 30 days? 90 days?"
This uses MIN(created_at) to find each customer's first purchase (their "acquisition date"), then LEFT JOIN to find repeat ec_orders after that date. The key insight is WHERE o.created_at > fo.first_purchase — we only count purchases after the first one.
COUNT(DISTINCT CASE WHEN days_since_first <= N THEN ro.customer_id END) counts how many ec_customers from the cohort made at least one additional purchase within N days. Dividing by COUNT(DISTINCT fo.customer_id) gives the retention rate.
WITH first_order AS (
SELECT
customer_id,
MIN(created_at) AS first_purchase
FROM ec_orders
GROUP BY customer_id
),
repeat_orders AS (
SELECT
o.customer_id,
fo.first_purchase,
o.created_at AS order_date,
(o.created_at::date - fo.first_purchase::date) AS days_since_first
FROM ec_orders o
JOIN first_order fo ON fo.customer_id = o.customer_id
WHERE o.created_at > fo.first_purchase -- only purchases AFTER the first
)
SELECT
COUNT(DISTINCT fo.customer_id) AS cohort_size,
COUNT(DISTINCT CASE WHEN days_since_first <= 7 THEN ro.customer_id END) AS retained_7d,
COUNT(DISTINCT CASE WHEN days_since_first <= 30 THEN ro.customer_id END) AS retained_30d,
COUNT(DISTINCT CASE WHEN days_since_first <= 90 THEN ro.customer_id END) AS retained_90d,
ROUND(COUNT(DISTINCT CASE WHEN days_since_first <= 7 THEN ro.customer_id END)::numeric
/ COUNT(DISTINCT fo.customer_id) * 100, 1) AS day7_retention_pct,
ROUND(COUNT(DISTINCT CASE WHEN days_since_first <= 30 THEN ro.customer_id END)::numeric
/ COUNT(DISTINCT fo.customer_id) * 100, 1) AS day30_retention_pct,
ROUND(COUNT(DISTINCT CASE WHEN days_since_first <= 90 THEN ro.customer_id END)::numeric
/ COUNT(DISTINCT fo.customer_id) * 100, 1) AS day90_retention_pct
FROM first_order fo
LEFT JOIN repeat_orders ro ON ro.customer_id = fo.customer_id;
What This Returns
A single-row retention summary for your entire customer base. Benchmark rates vary by industry:
| Metric | Poor | Good | Excellent |
|---|---|---|---|
| Day 7 | <5% | 10-20% | >25% |
| Day 30 | <10% | 20-35% | >40% |
| Day 90 | <15% | 30-50% | >60% |
If day7_retention_pct is below 5%, ec_customers aren't returning for a second purchase within a week — this suggests the product or onboarding experience needs improvement. If day90_retention_pct is high but day7 is low, ec_customers are loyal but on a long repurchase cycle (typical for furniture, electronics, etc.).