Aggregate Functions
Aggregate functions collapse a set of rows into a single value. You use them inside SELECT with GROUP BY, in HAVING clauses, and — as you'll see in the next section — inside window functions. This lesson covers every aggregate you'll actually reach for, including powerful ones that most tutorials skip entirely.
COUNT — three variations
The most common aggregate, with three meaningfully different forms:
SELECT
COUNT(*) AS total_rows, -- all rows, including NULLs
COUNT(customer_id) AS rows_with_customer, -- non-NULL values only
COUNT(DISTINCT customer_id) AS unique_customers -- distinct non-NULL values
FROM orders;
total_rows | rows_with_customer | unique_customers
------------+--------------------+------------------
200 | 200 | 50
Rule: COUNT(*) never ignores NULLs. COUNT(col) and COUNT(DISTINCT col) both ignore NULLs.
The distinction matters most when your column is nullable:
-- orders with at least one item vs. total orders
SELECT
COUNT(DISTINCT o.order_id) AS total_orders,
COUNT(DISTINCT oi.order_id) AS orders_with_items
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.order_id;
SUM and AVG
Revenue in ShopMetrics is always computed from order_items. To get per-order totals, aggregate at the item level:
-- Overall revenue stats for completed orders
SELECT
COUNT(DISTINCT o.order_id) AS completed_orders,
SUM(oi.quantity * oi.unit_price) AS total_revenue,
AVG(oi.quantity * oi.unit_price) AS avg_line_item_value
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'completed';
For average order value (not average line item), compute order totals first, then average them:
-- Average, min, and max order value
SELECT
AVG(order_total) AS avg_order_value,
MIN(order_total) AS smallest_order,
MAX(order_total) AS largest_order
FROM (
SELECT
o.order_id,
SUM(oi.quantity * oi.unit_price) AS order_total
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'completed'
GROUP BY o.order_id
) AS order_totals;
avg_order_value | smallest_order | largest_order
-----------------+----------------+---------------
274.29 | 12.00 | 997.00
The NULL trap in AVG
AVG ignores NULLs — its denominator is the count of non-NULL values, not the total row count. This matters when your column is nullable:
-- If 40 of 200 products have NULL price (hypothetically):
AVG(price) -- averages the 160 non-NULL rows only
SUM(price) / COUNT(*) -- divides by all 200 rows (treats NULLs as 0 effectively)
Neither is wrong — they measure different things. Choose deliberately.
MIN and MAX
Works on any orderable type: numbers, dates, timestamps, text (alphabetical order).
-- Date range of all orders, and the price range of products
SELECT
MIN(o.created_at) AS first_order,
MAX(o.created_at) AS latest_order,
MAX(o.created_at) - MIN(o.created_at) AS date_span
FROM orders o
WHERE o.status = 'completed';
first_order | latest_order | date_span
----------------------------+----------------------------+----------------
2025-09-03 14:22:11+00 | 2025-12-29 18:47:03+00 | 117 days ...
-- Most and least expensive product in each category
SELECT
category,
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products
GROUP BY category
ORDER BY category;
STRING_AGG — concatenating into one string
Collapses multiple string values into a single delimited string. Essential for building comma-separated lists from grouped data without application-side loops.
-- All products in each completed order as a readable string
SELECT
o.order_id,
STRING_AGG(p.name, ', ' ORDER BY p.name) AS products
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p ON p.product_id = oi.product_id
WHERE o.status = 'completed'
GROUP BY o.order_id
ORDER BY o.order_id
LIMIT 5;
order_id | products
----------+------------------------------------------------
12 | Mechanical Keyboard, SQL Mastery Book
15 | Python Crash Course
18 | Dev Sticker Pack, SQL Mastery Book, USB-C Hub
22 | Wireless Mouse
25 | 4K Monitor, Bash Scripting Guide
The ORDER BY p.name inside STRING_AGG controls the ordering of concatenated values, independently of the outer ORDER BY.
ARRAY_AGG — collecting into an array
Similar to STRING_AGG but produces a PostgreSQL array. Use this when the consuming code (Python, JavaScript, etc.) expects an array rather than a string.
-- What order statuses has each customer experienced?
SELECT
c.name,
ARRAY_AGG(DISTINCT o.status ORDER BY o.status) AS statuses_seen
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.name
HAVING COUNT(*) > 3
ORDER BY c.name
LIMIT 5;
name | statuses_seen
-------------+--------------------------
Customer 12 | {completed,refunded}
Customer 23 | {completed,pending}
Customer 7 | {completed,pending,refunded}
FILTER — conditional aggregation in one pass
FILTER (WHERE condition) lets you apply multiple different conditions within a single query — no subqueries, no UNION. One table scan, multiple breakdowns.
-- Order count and item count broken down by status, in a single query
SELECT
COUNT(*) AS total_orders,
COUNT(*) FILTER (WHERE o.status = 'completed') AS completed,
COUNT(*) FILTER (WHERE o.status = 'refunded') AS refunded,
COUNT(*) FILTER (WHERE o.status = 'pending') AS pending,
SUM(oi.quantity) FILTER (WHERE o.status = 'completed') AS items_sold,
SUM(oi.quantity * oi.unit_price)
FILTER (WHERE o.status = 'completed') AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id;
total_orders | completed | refunded | pending | items_sold | revenue
--------------+-----------+----------+---------+------------+-----------
200 | 134 | 33 | 33 | 412 | 36764.50
This is cleaner and often faster than the older CASE WHEN pattern:
-- Older equivalent — still works, just harder to read
SUM(CASE WHEN o.status = 'completed' THEN oi.quantity * oi.unit_price ELSE 0 END)
Prefer FILTER in new code.
Aggregating expressions, not just columns
You can aggregate any expression:
-- Per-product sales summary
SELECT
p.name AS product,
p.category,
COUNT(DISTINCT oi.order_id) AS orders_containing_product,
SUM(oi.quantity) AS total_units_sold,
SUM(oi.quantity * oi.unit_price) AS total_revenue,
ROUND(AVG(oi.unit_price), 2) AS avg_selling_price,
MAX(oi.quantity) AS max_qty_in_one_order
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name, p.category
ORDER BY total_revenue DESC;
NULL handling — the complete picture
Every aggregate ignores NULLs except COUNT(*). This is almost always correct behaviour, but it has one common trap:
-- If all values in a group are NULL, the aggregate returns NULL — not zero
MIN(o.created_at) -- → NULL if all rows in the group have NULL created_at
-- Wrap with COALESCE when the result is used in arithmetic
COALESCE(SUM(oi.quantity * oi.unit_price), 0) -- → 0 instead of NULL
This matters most when you subtract or divide aggregate results:
-- Per-product: revenue and share of total revenue
SELECT
p.name,
SUM(oi.quantity * oi.unit_price) AS product_revenue,
ROUND(
100.0 * SUM(oi.quantity * oi.unit_price)
/ NULLIF(SUM(SUM(oi.quantity * oi.unit_price)) OVER (), 0),
2
) AS pct_of_total
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
JOIN orders o ON o.order_id = oi.order_id
WHERE o.status = 'completed'
GROUP BY p.product_id, p.name
ORDER BY product_revenue DESC;
NULLIF(..., 0) prevents a division-by-zero error when total revenue is 0.
Aggregates with window functions — a preview
One of the most powerful SQL patterns: compare each individual row against its group aggregate, without collapsing the rows.
-- Each order's share of total revenue for completed orders
WITH order_totals AS (
SELECT
o.order_id,
c.name AS customer,
SUM(oi.quantity * oi.unit_price) AS order_total
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'completed'
GROUP BY o.order_id, c.name
)
SELECT
order_id,
customer,
order_total,
ROUND(
order_total * 100.0 / SUM(order_total) OVER (),
2
) AS pct_of_total
FROM order_totals
ORDER BY order_total DESC
LIMIT 10;
SUM(...) OVER () is a window function — you'll master these in the very next section. The key insight here is that regular aggregates (GROUP BY) and window functions often work together in the same query.
Practice
1. For each product category: show total units sold, total revenue, average selling price per unit, and the single most expensive unit price recorded in any sale. Sort by revenue descending.
2. Write a single query (no subqueries, no UNION) that shows: total orders, completed orders, refunded orders, pending orders, revenue from completed orders, and revenue from refunded orders — all from one scan.
3. For each customer with at least 3 completed orders: show their name, total spend, their largest single order total, and what percentage of their total spend that largest order represents. Round the percentage to one decimal place. (Hint: you'll need a subquery or CTE to first compute order totals.)
Solutions
1. Per-category sales summary
SELECT
p.category,
SUM(oi.quantity) AS total_units_sold,
SUM(oi.quantity * oi.unit_price) AS total_revenue,
ROUND(AVG(oi.unit_price), 2) AS avg_selling_price,
MAX(oi.unit_price) AS most_expensive_unit_price
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.category
ORDER BY total_revenue DESC;
category | total_units_sold | total_revenue | avg_selling_price | most_expensive_unit_price
----------+-----------------+---------------+-------------------+---------------------------
Hardware | 35 | 14821.00 | 198.40 | 499.00
Courses | 22 | 9614.00 | 145.67 | 199.00
Books | 48 | 3891.04 | 44.99 | 59.99
Software | 18 | 1134.00 | 39.67 | 99.00
Merch | 31 | 683.69 | 14.99 | 19.99
AVG(oi.unit_price) averages the recorded selling price per line item — not the product's catalogue price. These can differ if prices changed over time or discounts were applied at checkout.
2. Full order breakdown in one scan
SELECT
COUNT(DISTINCT o.order_id) AS total_orders,
COUNT(DISTINCT o.order_id) FILTER (WHERE o.status = 'completed') AS completed,
COUNT(DISTINCT o.order_id) FILTER (WHERE o.status = 'refunded') AS refunded,
COUNT(DISTINCT o.order_id) FILTER (WHERE o.status = 'pending') AS pending,
COALESCE(SUM(oi.quantity * oi.unit_price) FILTER (WHERE o.status = 'completed'), 0) AS completed_revenue,
COALESCE(SUM(oi.quantity * oi.unit_price) FILTER (WHERE o.status = 'refunded'), 0) AS refunded_revenue
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.order_id;
WITH order_totals AS (
SELECT
o.customer_id,
o.order_id,
SUM(oi.quantity * oi.unit_price) AS order_total
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'completed'
GROUP BY o.customer_id, o.order_id
)
SELECT
c.name,
SUM(ot.order_total) AS total_spend,
MAX(ot.order_total) AS largest_order,
ROUND(
MAX(ot.order_total) * 100.0 / SUM(ot.order_total),
1
) AS largest_order_pct
FROM customers c
JOIN order_totals ot ON ot.customer_id = c.customer_id
GROUP BY c.customer_id, c.name
HAVING COUNT(*) >= 3
ORDER BY total_spend DESC;
The CTE computes one row per completed order with its total. The outer query then groups by customer — COUNT(*) counts their completed orders, MAX(order_total) finds their biggest, and SUM(order_total) is their overall spend. Dividing MAX by SUM gives the concentration ratio: a high percentage means one big order dominates their history.