Aggregate Functions

Aggregate functions collapse a set of rows into a single value. You use them inside SELECT with GROUP BY, in HAVING clauses, and — as you'll see in the next section — inside window functions. This lesson covers every aggregate you'll actually reach for, including powerful ones that most tutorials skip entirely.


COUNT — three variations

The most common aggregate, with three meaningfully different forms:

SELECT
    COUNT(*)                    AS total_rows,          -- all rows, including NULLs
    COUNT(customer_id)          AS rows_with_customer,  -- non-NULL values only
    COUNT(DISTINCT customer_id) AS unique_customers      -- distinct non-NULL values
FROM orders;
 total_rows | rows_with_customer | unique_customers
------------+--------------------+------------------
        200 |                200 |               50

Rule: COUNT(*) never ignores NULLs. COUNT(col) and COUNT(DISTINCT col) both ignore NULLs.

The distinction matters most when your column is nullable:

-- orders with at least one item vs. total orders
SELECT
    COUNT(DISTINCT o.order_id)   AS total_orders,
    COUNT(DISTINCT oi.order_id)  AS orders_with_items
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.order_id;

SUM and AVG

Revenue in ShopMetrics is always computed from order_items. To get per-order totals, aggregate at the item level:

-- Overall revenue stats for completed orders
SELECT
    COUNT(DISTINCT o.order_id)           AS completed_orders,
    SUM(oi.quantity * oi.unit_price)     AS total_revenue,
    AVG(oi.quantity * oi.unit_price)     AS avg_line_item_value
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'completed';

For average order value (not average line item), compute order totals first, then average them:

-- Average, min, and max order value
SELECT
    AVG(order_total)   AS avg_order_value,
    MIN(order_total)   AS smallest_order,
    MAX(order_total)   AS largest_order
FROM (
    SELECT
        o.order_id,
        SUM(oi.quantity * oi.unit_price) AS order_total
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    WHERE o.status = 'completed'
    GROUP BY o.order_id
) AS order_totals;
 avg_order_value | smallest_order | largest_order
-----------------+----------------+---------------
          274.29 |          12.00 |        997.00

The NULL trap in AVG

AVG ignores NULLs — its denominator is the count of non-NULL values, not the total row count. This matters when your column is nullable:

-- If 40 of 200 products have NULL price (hypothetically):
AVG(price)               -- averages the 160 non-NULL rows only
SUM(price) / COUNT(*)    -- divides by all 200 rows (treats NULLs as 0 effectively)

Neither is wrong — they measure different things. Choose deliberately.


MIN and MAX

Works on any orderable type: numbers, dates, timestamps, text (alphabetical order).

-- Date range of all orders, and the price range of products
SELECT
    MIN(o.created_at)                           AS first_order,
    MAX(o.created_at)                           AS latest_order,
    MAX(o.created_at) - MIN(o.created_at)       AS date_span
FROM orders o
WHERE o.status = 'completed';
        first_order         |        latest_order        |   date_span
----------------------------+----------------------------+----------------
 2025-09-03 14:22:11+00     | 2025-12-29 18:47:03+00     | 117 days ...
-- Most and least expensive product in each category
SELECT
    category,
    MIN(price) AS cheapest,
    MAX(price) AS most_expensive
FROM products
GROUP BY category
ORDER BY category;

STRING_AGG — concatenating into one string

Collapses multiple string values into a single delimited string. Essential for building comma-separated lists from grouped data without application-side loops.

-- All products in each completed order as a readable string
SELECT
    o.order_id,
    STRING_AGG(p.name, ', ' ORDER BY p.name) AS products
FROM orders o
JOIN order_items oi ON oi.order_id  = o.order_id
JOIN products    p  ON p.product_id = oi.product_id
WHERE o.status = 'completed'
GROUP BY o.order_id
ORDER BY o.order_id
LIMIT 5;
 order_id |                    products
----------+------------------------------------------------
       12 | Mechanical Keyboard, SQL Mastery Book
       15 | Python Crash Course
       18 | Dev Sticker Pack, SQL Mastery Book, USB-C Hub
       22 | Wireless Mouse
       25 | 4K Monitor, Bash Scripting Guide

The ORDER BY p.name inside STRING_AGG controls the ordering of concatenated values, independently of the outer ORDER BY.


ARRAY_AGG — collecting into an array

Similar to STRING_AGG but produces a PostgreSQL array. Use this when the consuming code (Python, JavaScript, etc.) expects an array rather than a string.

-- What order statuses has each customer experienced?
SELECT
    c.name,
    ARRAY_AGG(DISTINCT o.status ORDER BY o.status) AS statuses_seen
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.name
HAVING COUNT(*) > 3
ORDER BY c.name
LIMIT 5;
    name     |      statuses_seen
-------------+--------------------------
 Customer 12 | {completed,refunded}
 Customer 23 | {completed,pending}
 Customer 7  | {completed,pending,refunded}

FILTER — conditional aggregation in one pass

FILTER (WHERE condition) lets you apply multiple different conditions within a single query — no subqueries, no UNION. One table scan, multiple breakdowns.

-- Order count and item count broken down by status, in a single query
SELECT
    COUNT(*)                                              AS total_orders,
    COUNT(*) FILTER (WHERE o.status = 'completed')        AS completed,
    COUNT(*) FILTER (WHERE o.status = 'refunded')         AS refunded,
    COUNT(*) FILTER (WHERE o.status = 'pending')          AS pending,
    SUM(oi.quantity) FILTER (WHERE o.status = 'completed') AS items_sold,
    SUM(oi.quantity * oi.unit_price)
        FILTER (WHERE o.status = 'completed')              AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id;
 total_orders | completed | refunded | pending | items_sold |   revenue
--------------+-----------+----------+---------+------------+-----------
          200 |       134 |       33 |      33 |        412 |  36764.50

This is cleaner and often faster than the older CASE WHEN pattern:

-- Older equivalent — still works, just harder to read
SUM(CASE WHEN o.status = 'completed' THEN oi.quantity * oi.unit_price ELSE 0 END)

Prefer FILTER in new code.


Aggregating expressions, not just columns

You can aggregate any expression:

-- Per-product sales summary
SELECT
    p.name                              AS product,
    p.category,
    COUNT(DISTINCT oi.order_id)         AS orders_containing_product,
    SUM(oi.quantity)                    AS total_units_sold,
    SUM(oi.quantity * oi.unit_price)    AS total_revenue,
    ROUND(AVG(oi.unit_price), 2)        AS avg_selling_price,
    MAX(oi.quantity)                    AS max_qty_in_one_order
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name, p.category
ORDER BY total_revenue DESC;

NULL handling — the complete picture

Every aggregate ignores NULLs except COUNT(*). This is almost always correct behaviour, but it has one common trap:

-- If all values in a group are NULL, the aggregate returns NULL — not zero
MIN(o.created_at)   -- → NULL if all rows in the group have NULL created_at

-- Wrap with COALESCE when the result is used in arithmetic
COALESCE(SUM(oi.quantity * oi.unit_price), 0)  -- → 0 instead of NULL

This matters most when you subtract or divide aggregate results:

-- Per-product: revenue and share of total revenue
SELECT
    p.name,
    SUM(oi.quantity * oi.unit_price)    AS product_revenue,
    ROUND(
        100.0 * SUM(oi.quantity * oi.unit_price)
              / NULLIF(SUM(SUM(oi.quantity * oi.unit_price)) OVER (), 0),
        2
    )                                   AS pct_of_total
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
JOIN orders   o ON o.order_id   = oi.order_id
WHERE o.status = 'completed'
GROUP BY p.product_id, p.name
ORDER BY product_revenue DESC;

NULLIF(..., 0) prevents a division-by-zero error when total revenue is 0.


Aggregates with window functions — a preview

One of the most powerful SQL patterns: compare each individual row against its group aggregate, without collapsing the rows.

-- Each order's share of total revenue for completed orders
WITH order_totals AS (
    SELECT
        o.order_id,
        c.name                              AS customer,
        SUM(oi.quantity * oi.unit_price)    AS order_total
    FROM orders o
    JOIN customers   c  ON c.customer_id = o.customer_id
    JOIN order_items oi ON oi.order_id   = o.order_id
    WHERE o.status = 'completed'
    GROUP BY o.order_id, c.name
)
SELECT
    order_id,
    customer,
    order_total,
    ROUND(
        order_total * 100.0 / SUM(order_total) OVER (),
        2
    ) AS pct_of_total
FROM order_totals
ORDER BY order_total DESC
LIMIT 10;

SUM(...) OVER () is a window function — you'll master these in the very next section. The key insight here is that regular aggregates (GROUP BY) and window functions often work together in the same query.


Practice

1. For each product category: show total units sold, total revenue, average selling price per unit, and the single most expensive unit price recorded in any sale. Sort by revenue descending.

2. Write a single query (no subqueries, no UNION) that shows: total orders, completed orders, refunded orders, pending orders, revenue from completed orders, and revenue from refunded orders — all from one scan.

3. For each customer with at least 3 completed orders: show their name, total spend, their largest single order total, and what percentage of their total spend that largest order represents. Round the percentage to one decimal place. (Hint: you'll need a subquery or CTE to first compute order totals.)


Solutions

1. Per-category sales summary

SELECT
    p.category,
    SUM(oi.quantity)                     AS total_units_sold,
    SUM(oi.quantity * oi.unit_price)     AS total_revenue,
    ROUND(AVG(oi.unit_price), 2)         AS avg_selling_price,
    MAX(oi.unit_price)                   AS most_expensive_unit_price
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.category
ORDER BY total_revenue DESC;

 category | total_units_sold | total_revenue | avg_selling_price | most_expensive_unit_price
----------+-----------------+---------------+-------------------+---------------------------
 Hardware |               35 |     14821.00  |            198.40 |                    499.00
 Courses  |               22 |      9614.00  |            145.67 |                    199.00
 Books    |               48 |      3891.04  |             44.99 |                     59.99
 Software |               18 |      1134.00  |             39.67 |                     99.00
 Merch    |               31 |       683.69  |             14.99 |                     19.99

AVG(oi.unit_price) averages the recorded selling price per line item — not the product's catalogue price. These can differ if prices changed over time or discounts were applied at checkout.


2. Full order breakdown in one scan

SELECT
    COUNT(DISTINCT o.order_id)                                                         AS total_orders,
    COUNT(DISTINCT o.order_id) FILTER (WHERE o.status = 'completed')                   AS completed,
    COUNT(DISTINCT o.order_id) FILTER (WHERE o.status = 'refunded')                    AS refunded,
    COUNT(DISTINCT o.order_id) FILTER (WHERE o.status = 'pending')                     AS pending,
    COALESCE(SUM(oi.quantity * oi.unit_price) FILTER (WHERE o.status = 'completed'), 0) AS completed_revenue,
    COALESCE(SUM(oi.quantity * oi.unit_price) FILTER (WHERE o.status = 'refunded'),  0) AS refunded_revenue
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.order_id;
WITH order_totals AS (
    SELECT
        o.customer_id,
        o.order_id,
        SUM(oi.quantity * oi.unit_price) AS order_total
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    WHERE o.status = 'completed'
    GROUP BY o.customer_id, o.order_id
)
SELECT
    c.name,
    SUM(ot.order_total)                                     AS total_spend,
    MAX(ot.order_total)                                     AS largest_order,
    ROUND(
        MAX(ot.order_total) * 100.0 / SUM(ot.order_total),
        1
    )                                                       AS largest_order_pct
FROM customers c
JOIN order_totals ot ON ot.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
HAVING COUNT(*) >= 3
ORDER BY total_spend DESC;

The CTE computes one row per completed order with its total. The outer query then groups by customer — COUNT(*) counts their completed orders, MAX(order_total) finds their biggest, and SUM(order_total) is their overall spend. Dividing MAX by SUM gives the concentration ratio: a high percentage means one big order dominates their history.