Advanced Aggregation Patterns — Combining Multiple Techniques

Advanced Aggregation Patterns — Combining Multiple Techniques

The previous lessons covered individual aggregation features. This lesson demonstrates how to combine them — GROUPING SETS with FILTER, statistical aggregates with ROLLUP, ordered-set aggregates with window functions — to build the kind of complex analytical queries that appear in production BI systems.

These patterns are not academic exercises. Every query in this lesson solves a real problem that arises in data engineering, financial analysis, or product analytics.


Pattern 1: Executive KPI dashboard — everything in one query

This query produces a complete monthly executive dashboard in a single pass: revenue at multiple rollup levels, collection metrics, growth rates, and percentile benchmarks all together.

WITH monthly_detail AS (
    SELECT
        DATE_TRUNC('month', i.issue_date)::DATE AS month,
        c.industry,
        SUM(ii.quantity * ii.unit_price)                            AS billed,
        SUM(ii.quantity * ii.unit_price) FILTER (WHERE i.status = 'paid')   AS collected,
        COUNT(DISTINCT i.invoice_id)                                AS invoices,
        COUNT(DISTINCT c.client_id)                                 AS active_clients,
        PERCENTILE_CONT(0.5) WITHIN GROUP (
            ORDER BY ii.quantity * ii.unit_price
        )                                                           AS industry_median_invoice
    FROM clients c
    JOIN invoices i ON i.client_id = c.client_id
    JOIN invoice_items ii ON ii.invoice_id = i.invoice_id
    GROUP BY DATE_TRUNC('month', i.issue_date), c.industry
),
rollup_data AS (
    SELECT
        CASE WHEN GROUPING(month)     = 1 THEN NULL ELSE month    END AS month,
        CASE WHEN GROUPING(industry)  = 1 THEN 'ALL' ELSE industry END AS industry,
        ROUND(SUM(billed), 0)    AS total_billed,
        ROUND(SUM(collected), 0) AS total_collected,
        SUM(invoices)            AS invoice_count,
        SUM(active_clients)      AS client_count,
        ROUND(100.0 * SUM(collected) / NULLIF(SUM(billed), 0), 1) AS collection_rate,
        GROUPING(month, industry) AS level_code
    FROM monthly_detail
    GROUP BY ROLLUP(month, industry)
)
SELECT
    month,
    industry,
    total_billed,
    total_collected,
    collection_rate,
    invoice_count,
    -- Month-over-month growth (only meaningful for detail rows)
    CASE WHEN level_code = 0 THEN
        ROUND(100.0 * (total_billed - LAG(total_billed) OVER (
            PARTITION BY industry ORDER BY month
        )) / NULLIF(LAG(total_billed) OVER (PARTITION BY industry ORDER BY month), 0), 1)
    END AS mom_growth_pct,
    level_code
FROM rollup_data
ORDER BY level_code, month NULLS LAST, industry;

This query uses: CTE for pre-aggregation, window function inside CTE for cross-industry median, ROLLUP for multi-level totals, GROUPING() for level detection, FILTER for conditional aggregation, and LAG() for period-over-period comparison — all composing cleanly.


Pattern 2: Quartile segmentation with rollup

Segment customers into LTV quartiles, then compute revenue and count at each quartile level with industry rollup:

WITH customer_ltv AS (
    SELECT
        c.customer_id,
        c.acquisition_channel,
        SUM(o.total_amount) AS lifetime_value,
        NTILE(4) OVER (ORDER BY SUM(o.total_amount)) AS ltv_quartile
    FROM ec_customers c
    JOIN ec_orders o ON o.customer_id = c.customer_id AND o.status = 'completed'
    GROUP BY c.customer_id, c.acquisition_channel
),
quartile_labels AS (
    SELECT *,
        CASE ltv_quartile
            WHEN 1 THEN 'Q1 (Bottom 25%)'
            WHEN 2 THEN 'Q2 (25–50%)'
            WHEN 3 THEN 'Q3 (50–75%)'
            WHEN 4 THEN 'Q4 (Top 25%)'
        END AS quartile_label
    FROM customer_ltv
)
SELECT
    COALESCE(CASE WHEN GROUPING(acquisition_channel) = 1 THEN 'ALL CHANNELS' ELSE acquisition_channel END,
             'ALL CHANNELS') AS channel,
    COALESCE(CASE WHEN GROUPING(quartile_label) = 1 THEN 'ALL SEGMENTS' ELSE quartile_label END,
             'ALL SEGMENTS') AS segment,
    COUNT(*) AS customers,
    ROUND(SUM(lifetime_value), 0) AS total_ltv,
    ROUND((AVG(lifetime_value))::NUMERIC, 2) AS avg_ltv,
    ROUND(SUM(lifetime_value) * 100.0 / SUM(SUM(lifetime_value)) OVER (), 1) AS pct_of_total_ltv,
    GROUPING(acquisition_channel, quartile_label) AS level
FROM quartile_labels
GROUP BY GROUPING SETS (
    (acquisition_channel, quartile_label),
    (acquisition_channel),
    (quartile_label),
    ()
)
ORDER BY level, channel, quartile_label;

Pattern 3: Time-series anomaly detection with statistical aggregates

Detect months where revenue deviated more than 2 standard deviations from the rolling 6-month mean — automated anomaly detection in production revenue:

WITH monthly_revenue AS (
    SELECT
        DATE_TRUNC('month', DATE_TRUNC('day', o.created_at)::DATE)::DATE AS month,
        ROUND(SUM(o.total_amount), 2) AS revenue
    FROM ec_orders o WHERE o.status = 'completed'
    GROUP BY 1
),
rolling_stats AS (
    SELECT
        month,
        revenue,
        ROUND(AVG(revenue) OVER (
            ORDER BY month ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING
        ), 2) AS rolling_6m_mean,
        ROUND(STDDEV_SAMP(revenue) OVER (
            ORDER BY month ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING
        ), 2) AS rolling_6m_stddev
    FROM monthly_revenue
),
with_zscore AS (
    SELECT *,
        ROUND((revenue - rolling_6m_mean) / NULLIF(rolling_6m_stddev, 0), 2) AS z_score
    FROM rolling_stats
    WHERE rolling_6m_stddev IS NOT NULL
)
SELECT
    month,
    revenue,
    rolling_6m_mean,
    rolling_6m_stddev,
    z_score,
    CASE
        WHEN ABS(z_score) > 2 THEN '⚠ ANOMALY'
        WHEN ABS(z_score) > 1 THEN 'Unusual'
        ELSE 'Normal'
    END AS classification,
    ROUND(rolling_6m_mean + 2 * rolling_6m_stddev, 2) AS upper_control_limit,
    ROUND(rolling_6m_mean - 2 * rolling_6m_stddev, 2) AS lower_control_limit
FROM with_zscore
ORDER BY month;

This implements a Statistical Process Control (SPC) chart in SQL — the same methodology used in manufacturing quality control, applied to revenue monitoring. Any month outside the upper or lower control limits triggers investigation.


Pattern 4: Multi-dimensional basket analysis

Which product category combinations appear most frequently in the same order? The basket analysis query uses STRING_AGG with ARRAY_AGG to build category sets, then aggregates:

WITH order_categories AS (
    SELECT
        o.order_id,
        STRING_AGG(DISTINCT p.category, ',' ORDER BY p.category) AS category_combo,
        ARRAY_AGG(DISTINCT p.category ORDER BY p.category) AS category_array,
        COUNT(DISTINCT p.category) AS unique_categories
    FROM ec_orders o
    JOIN ec_order_items oi ON oi.order_id = o.order_id
    JOIN ec_products p ON p.product_id = oi.product_id
    WHERE o.status = 'completed'
    GROUP BY o.order_id
    HAVING COUNT(DISTINCT p.category) >= 2  -- multi-category orders only
)
SELECT
    category_combo,
    COUNT(*) AS order_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_multi_cat_orders,
    ROUND((AVG(o.total_amount))::NUMERIC, 2) AS avg_order_value
FROM order_categories oc
JOIN ec_orders o ON o.order_id = oc.order_id
GROUP BY category_combo
ORDER BY order_count DESC
LIMIT 20;

Key Takeaway

Advanced SQL is not about mastering one feature — it's about composing multiple features correctly. CTEs provide staging layers. ROLLUP and GROUPING SETS handle multi-level totals. FILTER handles conditional metrics. Statistical aggregates handle distributions. Window functions handle period comparisons and rankings. When all of these compose cleanly in a single query, you get production-grade analytical SQL that would take dozens of lines of application code to replicate.