Advanced Aggregation Patterns — Combining Multiple Techniques
Advanced Aggregation Patterns — Combining Multiple Techniques
The previous lessons covered individual aggregation features. This lesson demonstrates how to combine them — GROUPING SETS with FILTER, statistical aggregates with ROLLUP, ordered-set aggregates with window functions — to build the kind of complex analytical queries that appear in production BI systems.
These patterns are not academic exercises. Every query in this lesson solves a real problem that arises in data engineering, financial analysis, or product analytics.
Pattern 1: Executive KPI dashboard — everything in one query
This query produces a complete monthly executive dashboard in a single pass: revenue at multiple rollup levels, collection metrics, growth rates, and percentile benchmarks all together.
WITH monthly_detail AS (
SELECT
DATE_TRUNC('month', i.issue_date)::DATE AS month,
c.industry,
SUM(ii.quantity * ii.unit_price) AS billed,
SUM(ii.quantity * ii.unit_price) FILTER (WHERE i.status = 'paid') AS collected,
COUNT(DISTINCT i.invoice_id) AS invoices,
COUNT(DISTINCT c.client_id) AS active_clients,
PERCENTILE_CONT(0.5) WITHIN GROUP (
ORDER BY ii.quantity * ii.unit_price
) AS industry_median_invoice
FROM clients c
JOIN invoices i ON i.client_id = c.client_id
JOIN invoice_items ii ON ii.invoice_id = i.invoice_id
GROUP BY DATE_TRUNC('month', i.issue_date), c.industry
),
rollup_data AS (
SELECT
CASE WHEN GROUPING(month) = 1 THEN NULL ELSE month END AS month,
CASE WHEN GROUPING(industry) = 1 THEN 'ALL' ELSE industry END AS industry,
ROUND(SUM(billed), 0) AS total_billed,
ROUND(SUM(collected), 0) AS total_collected,
SUM(invoices) AS invoice_count,
SUM(active_clients) AS client_count,
ROUND(100.0 * SUM(collected) / NULLIF(SUM(billed), 0), 1) AS collection_rate,
GROUPING(month, industry) AS level_code
FROM monthly_detail
GROUP BY ROLLUP(month, industry)
)
SELECT
month,
industry,
total_billed,
total_collected,
collection_rate,
invoice_count,
-- Month-over-month growth (only meaningful for detail rows)
CASE WHEN level_code = 0 THEN
ROUND(100.0 * (total_billed - LAG(total_billed) OVER (
PARTITION BY industry ORDER BY month
)) / NULLIF(LAG(total_billed) OVER (PARTITION BY industry ORDER BY month), 0), 1)
END AS mom_growth_pct,
level_code
FROM rollup_data
ORDER BY level_code, month NULLS LAST, industry;
This query uses: CTE for pre-aggregation, window function inside CTE for cross-industry median, ROLLUP for multi-level totals, GROUPING() for level detection, FILTER for conditional aggregation, and LAG() for period-over-period comparison — all composing cleanly.
Pattern 2: Quartile segmentation with rollup
Segment customers into LTV quartiles, then compute revenue and count at each quartile level with industry rollup:
WITH customer_ltv AS (
SELECT
c.customer_id,
c.acquisition_channel,
SUM(o.total_amount) AS lifetime_value,
NTILE(4) OVER (ORDER BY SUM(o.total_amount)) AS ltv_quartile
FROM ec_customers c
JOIN ec_orders o ON o.customer_id = c.customer_id AND o.status = 'completed'
GROUP BY c.customer_id, c.acquisition_channel
),
quartile_labels AS (
SELECT *,
CASE ltv_quartile
WHEN 1 THEN 'Q1 (Bottom 25%)'
WHEN 2 THEN 'Q2 (25–50%)'
WHEN 3 THEN 'Q3 (50–75%)'
WHEN 4 THEN 'Q4 (Top 25%)'
END AS quartile_label
FROM customer_ltv
)
SELECT
COALESCE(CASE WHEN GROUPING(acquisition_channel) = 1 THEN 'ALL CHANNELS' ELSE acquisition_channel END,
'ALL CHANNELS') AS channel,
COALESCE(CASE WHEN GROUPING(quartile_label) = 1 THEN 'ALL SEGMENTS' ELSE quartile_label END,
'ALL SEGMENTS') AS segment,
COUNT(*) AS customers,
ROUND(SUM(lifetime_value), 0) AS total_ltv,
ROUND((AVG(lifetime_value))::NUMERIC, 2) AS avg_ltv,
ROUND(SUM(lifetime_value) * 100.0 / SUM(SUM(lifetime_value)) OVER (), 1) AS pct_of_total_ltv,
GROUPING(acquisition_channel, quartile_label) AS level
FROM quartile_labels
GROUP BY GROUPING SETS (
(acquisition_channel, quartile_label),
(acquisition_channel),
(quartile_label),
()
)
ORDER BY level, channel, quartile_label;
Pattern 3: Time-series anomaly detection with statistical aggregates
Detect months where revenue deviated more than 2 standard deviations from the rolling 6-month mean — automated anomaly detection in production revenue:
WITH monthly_revenue AS (
SELECT
DATE_TRUNC('month', DATE_TRUNC('day', o.created_at)::DATE)::DATE AS month,
ROUND(SUM(o.total_amount), 2) AS revenue
FROM ec_orders o WHERE o.status = 'completed'
GROUP BY 1
),
rolling_stats AS (
SELECT
month,
revenue,
ROUND(AVG(revenue) OVER (
ORDER BY month ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING
), 2) AS rolling_6m_mean,
ROUND(STDDEV_SAMP(revenue) OVER (
ORDER BY month ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING
), 2) AS rolling_6m_stddev
FROM monthly_revenue
),
with_zscore AS (
SELECT *,
ROUND((revenue - rolling_6m_mean) / NULLIF(rolling_6m_stddev, 0), 2) AS z_score
FROM rolling_stats
WHERE rolling_6m_stddev IS NOT NULL
)
SELECT
month,
revenue,
rolling_6m_mean,
rolling_6m_stddev,
z_score,
CASE
WHEN ABS(z_score) > 2 THEN '⚠ ANOMALY'
WHEN ABS(z_score) > 1 THEN 'Unusual'
ELSE 'Normal'
END AS classification,
ROUND(rolling_6m_mean + 2 * rolling_6m_stddev, 2) AS upper_control_limit,
ROUND(rolling_6m_mean - 2 * rolling_6m_stddev, 2) AS lower_control_limit
FROM with_zscore
ORDER BY month;
This implements a Statistical Process Control (SPC) chart in SQL — the same methodology used in manufacturing quality control, applied to revenue monitoring. Any month outside the upper or lower control limits triggers investigation.
Pattern 4: Multi-dimensional basket analysis
Which product category combinations appear most frequently in the same order? The basket analysis query uses STRING_AGG with ARRAY_AGG to build category sets, then aggregates:
WITH order_categories AS (
SELECT
o.order_id,
STRING_AGG(DISTINCT p.category, ',' ORDER BY p.category) AS category_combo,
ARRAY_AGG(DISTINCT p.category ORDER BY p.category) AS category_array,
COUNT(DISTINCT p.category) AS unique_categories
FROM ec_orders o
JOIN ec_order_items oi ON oi.order_id = o.order_id
JOIN ec_products p ON p.product_id = oi.product_id
WHERE o.status = 'completed'
GROUP BY o.order_id
HAVING COUNT(DISTINCT p.category) >= 2 -- multi-category orders only
)
SELECT
category_combo,
COUNT(*) AS order_count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_multi_cat_orders,
ROUND((AVG(o.total_amount))::NUMERIC, 2) AS avg_order_value
FROM order_categories oc
JOIN ec_orders o ON o.order_id = oc.order_id
GROUP BY category_combo
ORDER BY order_count DESC
LIMIT 20;
Key Takeaway
Advanced SQL is not about mastering one feature — it's about composing multiple features correctly. CTEs provide staging layers. ROLLUP and GROUPING SETS handle multi-level totals. FILTER handles conditional metrics. Statistical aggregates handle distributions. Window functions handle period comparisons and rankings. When all of these compose cleanly in a single query, you get production-grade analytical SQL that would take dozens of lines of application code to replicate.