GROUP BY and HAVING

How GROUP BY works

GROUP BY collapses many rows into one row per group, then lets aggregate functions compute a value across each group.

SELECT
    status,
    COUNT(*) AS order_count
FROM orders
GROUP BY status
ORDER BY order_count DESC;
   status    | order_count
-------------+-------------
 completed   |         134
 refunded    |          33
 pending     |          33

GROUP BY status collapses all rows with the same status into a single output row. Every column in SELECT must either appear in GROUP BY or be wrapped in an aggregate function. PostgreSQL enforces this strictly — unlike MySQL, which silently picks an arbitrary value for ungrouped columns.

-- ❌ This errors in PostgreSQL
SELECT status, customer_id, COUNT(*)
FROM orders
GROUP BY status;
-- ERROR: column "orders.customer_id" must appear in the GROUP BY clause

Fix: aggregate it, or add it to GROUP BY

SELECT status, COUNT(DISTINCT customer_id) AS unique_customers, COUNT(*) AS orders
FROM orders
GROUP BY status;

Grouping by multiple columns

Each unique combination of the grouped column values becomes its own group.

-- Order count and revenue broken down by country and status
SELECT
    c.country,
    o.status,
    COUNT(DISTINCT o.order_id)           AS orders,
    SUM(oi.quantity * oi.unit_price)     AS revenue
FROM orders o
JOIN customers   c  ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id   = o.order_id
GROUP BY c.country, o.status
ORDER BY c.country, o.status;
 country |   status    | orders |  revenue
---------+-------------+--------+-----------
 AU      | completed   |     22 |  5882.25
 AU      | pending     |      3 |   420.50
 AU      | refunded    |      4 |   891.00
 CA      | completed   |     28 |  7104.75
 DE      | completed   |     19 |  4931.00
 ...

Grouping by expressions

You can GROUP BY any expression — including function calls, arithmetic, or type casts:

-- Revenue per calendar month (completed orders only)
SELECT
    DATE_TRUNC('month', o.created_at)    AS month,
    COUNT(DISTINCT o.order_id)            AS orders,
    SUM(oi.quantity * oi.unit_price)      AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'completed'
GROUP BY DATE_TRUNC('month', o.created_at)
ORDER BY month;
         month          | orders | revenue
------------------------+--------+---------
 2025-09-01 00:00:00+00 |     21 | 5412.00
 2025-10-01 00:00:00+00 |     34 | 8931.50
 2025-11-01 00:00:00+00 |     47 | 12843.50
 2025-12-01 00:00:00+00 |     32 | 8107.25

You'll also see GROUP BY 1 (grouping by the first SELECT expression by position). It's convenient but fragile — reorder your SELECT columns and the grouping changes silently.


HAVING — filtering groups

WHERE filters rows before grouping. HAVING filters groups after aggregation. This is the most important distinction in GROUP BY queries.

-- Countries with more than 5 completed orders AND over $3,000 revenue
SELECT
    c.country,
    COUNT(DISTINCT o.order_id)           AS orders,
    SUM(oi.quantity * oi.unit_price)     AS revenue
FROM orders o
JOIN customers   c  ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id   = o.order_id
WHERE o.status = 'completed'                          -- ← filters rows before grouping
GROUP BY c.country
HAVING COUNT(DISTINCT o.order_id) > 5                -- ← filters groups after aggregation
   AND SUM(oi.quantity * oi.unit_price) > 3000
ORDER BY revenue DESC;

The logical order of operations:

FROM + JOIN   → full row set
WHERE         → filter rows (before grouping)
GROUP BY      → collapse into groups
Aggregates    → compute per group
HAVING        → filter groups (after aggregation)
SELECT        → project columns
ORDER BY      → sort
LIMIT         → truncate

The WHERE vs HAVING trap

-- ❌ Wrong: HAVING is for aggregate conditions, not row-level filters
SELECT customer_id, COUNT(*) AS orders
FROM orders
GROUP BY customer_id
HAVING status = 'completed';   -- status is a row-level column, not an aggregate
-- PostgreSQL errors: column must appear in GROUP BY or aggregate function

-- ✅ Correct: row-level filters always go in WHERE
SELECT customer_id, COUNT(*) AS orders
FROM orders
WHERE status = 'completed'
GROUP BY customer_id;

Use WHERE for any filter that doesn't involve an aggregate. It's evaluated earlier, which means fewer rows reach the GROUP BY — a genuine performance win on large tables.


ROLLUP — automatic subtotals

ROLLUP is a GROUP BY extension that generates subtotal and grand-total rows in a single pass:

SELECT
    c.country,
    o.status,
    COUNT(DISTINCT o.order_id) AS orders
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
GROUP BY ROLLUP(c.country, o.status)
ORDER BY c.country NULLS LAST, o.status NULLS LAST;
 country |   status    | orders
---------+-------------+--------
 AU      | completed   |     22
 AU      | pending     |      3
 AU      | refunded    |      4
 AU      | NULL        |     29    ← subtotal for AU (all statuses)
 CA      | completed   |     28
 CA      | NULL        |     28
 ...
 NULL    | NULL        |    200    ← grand total

The NULL values in the grouping columns indicate "all values at this level." Use the GROUPING(col) function to distinguish a ROLLUP-generated NULL from a real NULL value in the data.


A note on GROUP BY and NULL

NULL is treated as its own group — all NULL values for a column are grouped together:

-- How many orders placed each hour of the day?
SELECT
    EXTRACT(HOUR FROM created_at) AS hour_of_day,
    COUNT(*)                       AS orders
FROM orders
GROUP BY EXTRACT(HOUR FROM created_at)
ORDER BY hour_of_day;

If created_at were nullable, all NULL timestamps would form their own group. Be aware of this when your grouping column is nullable.


Practice

1. Show the top 5 customers by total spend on completed orders. Include their name, country, number of completed orders, and total spend. Break ties alphabetically by name.

2. Find every calendar month where total revenue from completed orders exceeded $8,000. Show the month, number of distinct orders, and revenue.

3. Show each product category with: total units sold, total revenue, and average revenue per order that included a product from that category. Only include categories where total units sold is at least 20.


Solutions

1. Top 5 customers by total spend

SELECT
    c.name,
    c.country,
    COUNT(DISTINCT o.order_id)           AS completed_orders,
    SUM(oi.quantity * oi.unit_price)     AS total_spend
FROM customers c
JOIN orders      o  ON o.customer_id = c.customer_id
JOIN order_items oi ON oi.order_id   = o.order_id
WHERE o.status = 'completed'
GROUP BY c.customer_id, c.name, c.country
ORDER BY total_spend DESC, c.name ASC
LIMIT 5;

    name     | country | completed_orders | total_spend
-------------+---------+------------------+-------------
 Customer 12 | US      |                8 |     2841.50
 Customer 7  | US      |                6 |     2203.00
 Customer 23 | UK      |                6 |     1987.25
 Customer 3  | UK      |                5 |     1654.00
 Customer 18 | CA      |                5 |     1521.75

Two things worth noting: c.customer_id is in the GROUP BY even though it's not in SELECT — this is valid and ensures you're grouping on the primary key, not just the name (two customers could share a name). The tie-break c.name ASC in ORDER BY only activates when total_spend values are equal.


2. Months where revenue exceeded 1,000

SELECT
    DATE_TRUNC('month', o.created_at)    AS month,
    COUNT(DISTINCT o.order_id)            AS orders,
    SUM(oi.quantity * oi.unit_price)      AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'completed'
GROUP BY DATE_TRUNC('month', o.created_at)
HAVING SUM(oi.quantity * oi.unit_price) > 1000
ORDER BY month;

The WHERE o.status = 'completed' filters rows before grouping — only completed order line items reach the GROUP BY. The HAVING then filters the resulting monthly groups. You cannot put SUM(...) > 8000 in WHERE because aggregates are not yet computed at that stage.


3. Product categories with at least 20 units sold

SELECT
    p.category,
    SUM(oi.quantity)                                        AS total_units_sold,
    SUM(oi.quantity * oi.unit_price)                        AS total_revenue,
    ROUND(
        SUM(oi.quantity * oi.unit_price)::NUMERIC
        / COUNT(DISTINCT oi.order_id),
        2
    )                                                       AS avg_revenue_per_order
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.category
HAVING SUM(oi.quantity) >= 20
ORDER BY total_revenue DESC;

 category | total_units_sold | total_revenue | avg_revenue_per_order
----------+-----------------+---------------+-----------------------
 Books    |               48 |      3891.04  |                 97.28
 Hardware |               35 |     14821.00  |                494.03
 Courses  |               22 |      9614.00  |                320.47

The avg_revenue_per_order divides category revenue by the count of distinct orders that contained at least one product from that category — not by total orders overall. The ::NUMERIC cast ensures PostgreSQL performs decimal division rather than integer division.