Top-N Per Group — Patterns and Performance
Retrieving the top N rows per group is one of the most frequently needed analytical queries — top 3 products per category, top 5 customers per country, the latest 2 orders per customer. There are several ways to solve it in PostgreSQL. Each has different readability and performance characteristics.
Method 1: Window function + CTE (most common)
-- Top 2 products by revenue in each category
WITH product_revenue AS (
SELECT
p.product_id,
p.name,
p.category,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name, p.category
),
ranked AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
FROM product_revenue
)
SELECT name, category, revenue
FROM ranked
WHERE rn <= 2
ORDER BY category, revenue DESC;
Use ROW_NUMBER() when you want exactly N rows even when there are ties — the tiebreaker is arbitrary. Use RANK() when ties should all be included (you might get more than N rows).
Method 2: DISTINCT ON (PostgreSQL-specific)
PostgreSQL's DISTINCT ON is a concise alternative for top-1 per group:
-- Highest-revenue product per category (exactly one per category)
SELECT DISTINCT ON (p.category)
p.name,
p.category,
SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p ON p.product_id = oi.product_id
GROUP BY p.product_id, p.name, p.category
ORDER BY p.category, SUM(oi.quantity * oi.unit_price) DESC;
DISTINCT ON (category) keeps one row per category — the first row after the ORDER BY within each category group. It's clean for top-1 but doesn't extend to top-N.