Advanced HAVING and Post-Aggregation Filtering

Advanced HAVING and Post-Aggregation Filtering

HAVING is the WHERE clause for aggregate results. It filters groups after aggregation, while WHERE filters rows before aggregation. Understanding the distinction — and knowing when to use each — is fundamental to writing correct aggregation queries.

The key rule: WHERE filters rows, HAVING filters groups. Trying to filter on an aggregate expression in WHERE is a syntax error; it must go in HAVING.

The execution order (critical)

SQL executes clauses in this order:

  1. FROM + JOIN — build the row set
  2. WHERE — filter individual rows
  3. GROUP BY — form groups
  4. Aggregate functions — compute per-group values
  5. HAVING — filter groups based on aggregate results
  6. SELECT — project columns
  7. ORDER BY — sort
  8. LIMIT / OFFSET — slice

This means you cannot reference a SELECT alias in HAVING — the alias doesn't exist yet when HAVING runs. You must repeat the aggregate expression in HAVING.


Example 1: Multi-condition HAVING with aggregate comparisons

Find industries that have both high total revenue AND a low collection rate — meaning they're billing a lot but collecting poorly:

SELECT
    c.industry,
    COUNT(DISTINCT c.client_id) AS clients,
    COUNT(DISTINCT i.invoice_id) AS invoices,
    ROUND(SUM(ii.quantity * ii.unit_price), 0) AS total_billed,
    ROUND(SUM(ii.quantity * ii.unit_price)
        FILTER (WHERE i.status = 'paid'), 0) AS total_collected,
    ROUND(100.0 * SUM(ii.quantity * ii.unit_price) FILTER (WHERE i.status = 'paid')
        / NULLIF(SUM(ii.quantity * ii.unit_price), 0), 1) AS collection_rate_pct
FROM clients c
JOIN invoices i ON i.client_id = c.client_id
JOIN invoice_items ii ON ii.invoice_id = i.invoice_id
GROUP BY c.industry
HAVING
    SUM(ii.quantity * ii.unit_price) > 100000        -- enough volume to matter
    AND SUM(ii.quantity * ii.unit_price) FILTER (WHERE i.status = 'paid')
        / NULLIF(SUM(ii.quantity * ii.unit_price), 0) < 0.80   -- poor collection rate
ORDER BY total_billed DESC;

Note that the HAVING clause repeats the aggregate expressions rather than using the SELECT aliases total_billed and collection_rate_pct. This is required — aliases are not yet defined at the HAVING evaluation step.


Example 2: HAVING with subqueries — comparing groups to overall stats

HAVING can contain a subquery that provides a comparison value — useful for finding groups that are above or below the overall average:

SELECT
    p.category,
    COUNT(DISTINCT o.order_id) AS orders,
    ROUND((AVG(oi.quantity * oi.unit_price))::NUMERIC, 2) AS avg_line_value,
    ROUND(SUM(oi.quantity * oi.unit_price), 2) AS total_revenue
FROM products p
JOIN ec_order_items oi ON oi.product_id = p.product_id
JOIN ec_orders o ON o.order_id = oi.order_id
WHERE o.status = 'completed'
GROUP BY p.category
HAVING AVG(oi.quantity * oi.unit_price) > (
    -- Subquery: overall average line item value across all categories
    SELECT AVG(oi2.quantity * oi2.unit_price)
    FROM order_items oi2
    JOIN orders o2 ON o2.order_id = oi2.order_id
    WHERE o2.status = 'completed'
)
ORDER BY avg_line_value DESC;

This returns only categories where the average line value exceeds the overall average. The subquery in HAVING is evaluated once and its result used as the comparison threshold for every group.


Example 3: Chained aggregation — HAVING inside a CTE, then filtered again

When HAVING becomes complex, using CTEs to stage the aggregation allows cleaner filtering at each step:

-- Step 1: Find clients with significant order activity
WITH active_clients AS (
    SELECT
        c.customer_id,
        c.acquisition_channel,
        COUNT(DISTINCT o.order_id) AS order_count,
        SUM(o.total_amount) AS lifetime_value,
        MAX(DATE_TRUNC('day', o.created_at)::DATE) AS last_order_date
    FROM ec_customers c
    JOIN ec_orders o ON o.customer_id = c.customer_id
    WHERE o.status = 'completed'
    GROUP BY c.customer_id, c.acquisition_channel
    HAVING COUNT(DISTINCT o.order_id) >= 3           -- at least 3 orders
       AND SUM(o.total_amount) >= 500                -- at least £500 LTV
       AND MAX(DATE_TRUNC('day', o.created_at)::DATE) >= CURRENT_DATE - 365   -- active in last year
),
-- Step 2: Aggregate active clients by channel
channel_summary AS (
    SELECT
        acquisition_channel,
        COUNT(*) AS active_customers,
        ROUND((AVG(lifetime_value))::NUMERIC, 2) AS avg_ltv,
        ROUND((AVG(order_count))::NUMERIC, 1) AS avg_orders
    FROM active_clients
    GROUP BY acquisition_channel
    HAVING COUNT(*) >= 5  -- channel must have enough active customers to be meaningful
)
-- Step 3: Only show channels performing above median LTV
SELECT *
FROM channel_summary
WHERE avg_ltv > (SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY avg_ltv) FROM channel_summary)
ORDER BY avg_ltv DESC;

The three-step CTE chain applies a HAVING at the first aggregation level, a second HAVING at the channel summary level, and a WHERE on the final CTE. Each step narrows the result set logically before the next step.


Example 4: HAVING with window functions via CTE

You cannot use a window function directly in a HAVING clause, but wrapping in a CTE or subquery solves this:

-- Find stocks whose average volume is in the top 25% of all stocks
WITH stock_volumes AS (
    SELECT
        c.ticker,
        c.sector,
        ROUND(AVG(sp.volume)) AS avg_daily_volume,
        ROUND(STDDEV_SAMP(sp.volume)) AS volume_stddev
    FROM stock_prices sp
    JOIN companies c ON c.company_id = sp.company_id
    GROUP BY c.ticker, c.sector
),
with_percentile AS (
    SELECT *,
        NTILE(4) OVER (ORDER BY avg_daily_volume) AS volume_quartile
    FROM stock_volumes
)
-- Now filter on the window function result
SELECT ticker, sector, avg_daily_volume, volume_stddev, volume_quartile
FROM with_percentile
WHERE volume_quartile = 4   -- top 25% by average volume
ORDER BY avg_daily_volume DESC;

The pattern: compute aggregates in a CTE, apply window functions in a second CTE, then use WHERE in the outer query to filter on the window result. This is the standard workaround for "I want to HAVING based on a window function" — which is not directly supported.

Key Takeaway

HAVING is WHERE for groups. Key rules: you cannot use SELECT aliases in HAVING (the alias doesn't exist yet); subqueries in HAVING provide dynamic thresholds; window functions require a CTE wrapper to be usable as filters. The multi-CTE pattern — aggregate → window → filter — is the clean PostgreSQL idiom for complex post-aggregation filtering.