GROUPING SETS — Multiple GROUP BY Levels in One Query

GROUPING SETS — Multiple GROUP BY Levels in One Query

GROUPING SETS is one of the most powerful and underused features in PostgreSQL. It lets you compute multiple different GROUP BY combinations in a single query pass — producing subtotals, totals, and granular breakdowns all in one result set.

Without GROUPING SETS, building a report that shows revenue by year and quarter, year only, quarter only, and a grand total would require four separate queries joined with UNION ALL. With GROUPING SETS, it is one query. The database makes a single pass through the data and returns every grouping level simultaneously — which is both faster and more readable.

The problem GROUPING SETS solves

Imagine you need a sales report with three levels:

  1. Revenue per (region, product_category) — the detailed rows
  2. Revenue per region only — regional subtotals
  3. A grand total across everything

The naive approach:

-- Three separate queries — slow and repetitive
WITH sales(region, category, revenue) AS (
    VALUES
        ('North', 'Electronics', 45200),
        ('North', 'Clothing',    28100),
        ('North', 'Electronics', 12000),
        ('South', 'Electronics', 31500),
        ('South', 'Clothing',    19800),
        ('South', 'Electronics',  8700),
        ('West',  'Electronics', 27300),
        ('West',  'Clothing',    15600),
        ('East',  'Electronics', 22100),
        ('East',  'Clothing',    11900)
)
SELECT region, category, SUM(revenue) FROM sales GROUP BY region, category
UNION ALL
SELECT region, NULL, SUM(revenue) FROM sales GROUP BY region
UNION ALL
SELECT NULL, NULL, SUM(revenue) FROM sales;

This is verbose, scans the sales table three times, and becomes unmaintainable if you add more dimensions. GROUPING SETS collapses this into one pass.


Example 1: Basic GROUPING SETS syntax

This query calculates revenue subtotals at three levels simultaneously: by region+category, by region alone, and a grand total. The NULL values in the result indicate that the column was "collapsed" for that grouping level — a region-level row has NULL for category because category was not part of that grouping.

WITH sales(region, category, revenue) AS (
    VALUES
        ('North', 'Electronics', 45200),
        ('North', 'Clothing',    28100),
        ('North', 'Electronics', 12000),
        ('South', 'Electronics', 31500),
        ('South', 'Clothing',    19800),
        ('South', 'Electronics',  8700),
        ('West',  'Electronics', 27300),
        ('West',  'Clothing',    15600),
        ('East',  'Electronics', 22100),
        ('East',  'Clothing',    11900)
)
SELECT
    region,
    category,
    ROUND(SUM(revenue), 2) AS total_revenue,
    COUNT(*) AS transactions
FROM sales
GROUP BY GROUPING SETS (
    (region, category),   -- level 1: full detail
    (region),             -- level 2: regional subtotals
    ()                    -- level 3: grand total (empty set)
)
ORDER BY region NULLS LAST, category NULLS LAST;

What This Returns

Rows appear in three tiers:

  • Detail rows: both region and category are populated — e.g. North | Electronics | 45,200
  • Subtotal rows: category is NULL — e.g. North | NULL | 112,800 (total for North across all categories)
  • Grand total row: both region and category are NULL — e.g. NULL | NULL | 398,400

The NULLS LAST in ORDER BY pushes subtotal and grand total rows to the bottom within each group, which is the standard report format.


Example 2: The GROUPING() function — distinguishing NULLs

A critical problem: what if the region column itself can be NULL in the source data (unknown region)? You cannot tell whether a NULL in the output means "subtotal row" or "data was actually NULL". The GROUPING() function solves this — it returns 1 when a column was excluded from the current grouping, and 0 when it was included.

WITH sales(region, category, revenue) AS (
    VALUES
        ('North', 'Electronics', 45200),
        ('North', 'Clothing',    28100),
        ('North', 'Electronics', 12000),
        ('South', 'Electronics', 31500),
        ('South', 'Clothing',    19800),
        ('South', 'Electronics',  8700),
        ('West',  'Electronics', 27300),
        ('West',  'Clothing',    15600),
        ('East',  'Electronics', 22100),
        ('East',  'Clothing',    11900)
)
SELECT
    CASE WHEN GROUPING(region) = 1 THEN '(All Regions)' ELSE region END AS region,
    CASE WHEN GROUPING(category) = 1 THEN '(All Categories)' ELSE category END AS category,
    ROUND(SUM(revenue), 2) AS total_revenue,
    GROUPING(region) AS is_region_subtotal,
    GROUPING(category) AS is_category_subtotal,
    GROUPING(region) + GROUPING(category) AS grouping_level
FROM sales
GROUP BY GROUPING SETS (
    (region, category),
    (region),
    ()
)
ORDER BY grouping_level, region, category;

What This Returns

region category total_revenue is_region_subtotal is_category_subtotal grouping_level
North Electronics 45,200 0 0 0
North Clothing 28,100 0 0 0
North (All Categories) 112,800 0 1 1
(All Regions) (All Categories) 398,400 1 1 2

grouping_level = 0 means a fully detailed row. grouping_level = 2 means the grand total. This column is useful for ordering or filtering to specific levels of the report.


Example 3: GROUPING SETS with a real analytics table

Using the invoicing dataset — produce a single query that shows revenue broken down by (industry, country), by industry, by country, and as a grand total:

WITH invoice_revenue AS (
    SELECT
        c.industry,
        c.country,
        DATE_PART('year', i.issue_date) AS year,
        SUM(ii.quantity * ii.unit_price) AS revenue
    FROM clients c
    JOIN invoices i ON i.client_id = c.client_id
    JOIN invoice_items ii ON ii.invoice_id = i.invoice_id
    GROUP BY c.industry, c.country, 3
)
SELECT
    COALESCE(CASE WHEN GROUPING(industry) = 1 THEN '(ALL)' ELSE industry END, industry) AS industry,
    COALESCE(CASE WHEN GROUPING(country)  = 1 THEN '(ALL)' ELSE country  END, country)  AS country,
    COALESCE(CASE WHEN GROUPING(year)     = 1 THEN 0 ELSE year END, 0)                  AS year,
    ROUND(SUM(revenue), 0) AS total_revenue,
    GROUPING(industry, country, year) AS grouping_key
FROM invoice_revenue
GROUP BY GROUPING SETS (
    (industry, country, year),  -- full detail
    (industry, year),           -- per industry per year
    (country, year),            -- per country per year
    (year),                     -- annual totals
    ()                          -- grand total
)
ORDER BY grouping_key, year, industry, country;

What This Returns

This single query produces five levels of aggregation simultaneously. GROUPING(industry, country, year) encodes all three grouping flags into a single integer using bit positions — it equals 0 for full-detail rows and 7 for the grand total. You can filter on it to extract only the level you need:

-- Example: Pull only the annual totals level by filtering on grouping key
-- (This WHERE clause must be inside a subquery or CTE, not at top level)
WITH invoice_revenue (industry, country, year, revenue) AS (
    VALUES
        ('Technology', 'US',  2023, 450000),
        ('Technology', 'UK',  2023, 280000),
        ('Finance',    'US',  2023, 310000),
        ('Finance',    'UK',  2024, 195000),
        ('Technology', 'US',  2024, 520000)
)
SELECT * FROM (
    SELECT
        industry, country, year::INT AS year,
        ROUND(SUM(revenue), 0) AS total_revenue,
        GROUPING(industry, country, year) AS grouping_key
    FROM invoice_revenue
    GROUP BY GROUPING SETS (
        (industry, country, year),
        (industry, year),
        (country, year),
        (year),
        ()
    )
) sub
WHERE grouping_key = 6  -- industry=1, country=1, year=0 in binary (annual totals)

Example 4: Combining GROUPING SETS with window functions

This advanced pattern adds running totals within each grouping level, allowing you to see both the absolute amount and the cumulative total at each detail row:

WITH gs_result AS (
    SELECT
        c.industry,
        DATE_TRUNC('quarter', i.issue_date)::DATE AS quarter,
        SUM(ii.quantity * ii.unit_price) AS revenue,
        GROUPING(c.industry) AS g_industry,
        GROUPING(DATE_TRUNC('quarter', i.issue_date)::DATE) AS g_quarter
    FROM clients c
    JOIN invoices i ON i.client_id = c.client_id
    JOIN invoice_items ii ON ii.invoice_id = i.invoice_id
    GROUP BY GROUPING SETS (
        (c.industry, DATE_TRUNC('quarter', i.issue_date)::DATE),
        (c.industry),
        ()
    )
)
SELECT
    CASE WHEN g_industry = 1 THEN 'GRAND TOTAL' ELSE industry END AS industry,
    CASE WHEN g_quarter  = 1 THEN 'ALL QUARTERS' ELSE quarter::TEXT END AS quarter,
    ROUND(revenue, 2) AS revenue,
    ROUND(SUM(revenue) OVER (
        PARTITION BY industry, g_industry
        ORDER BY quarter NULLS LAST
    ), 2) AS running_total
FROM gs_result
ORDER BY g_industry, g_quarter, industry, quarter;

The window function's PARTITION BY industry, g_industry ensures that the running total restarts for each industry group, and doesn't bleed across grouping levels.

Key Takeaway

GROUPING SETS eliminates the UNION ALL anti-pattern for multi-level reports. Use the GROUPING() function to distinguish aggregated NULLs from data NULLs, and GROUPING(col1, col2, ...) as a bit-encoded level indicator for filtering. Every BI tool output (subtotals, grand totals, cross-tabs) is ultimately producing what GROUPING SETS computes natively.