PERCENT_RANK() and CUME_DIST() — Statistical Distribution

PERCENT_RANK() and CUME_DIST() measure where a row stands relative to all other rows in the window — returning a value between 0.0 and 1.0. They're essential for distribution analysis, scoring systems, and identifying statistical outliers.

PERCENT_RANK() — relative rank as a fraction

PERCENT_RANK() returns the percentile rank of each row: what fraction of rows in the window have a lower value.

PERCENT_RANK = (rank - 1) / (total rows - 1)

The first row is always 0.0. The last row is always 1.0.

WITH product_revenue AS (
    SELECT
        p.product_id,
        p.name,
        p.category,
        COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS revenue
    FROM products p
    LEFT JOIN order_items oi ON oi.product_id = p.product_id
    GROUP BY p.product_id, p.name, p.category
)
SELECT
    name,
    category,
    revenue,
    ROUND(PERCENT_RANK() OVER (ORDER BY revenue)::NUMERIC, 4)                   AS pct_rank_global,
    ROUND(PERCENT_RANK() OVER (PARTITION BY category ORDER BY revenue)::NUMERIC, 4) AS pct_rank_in_cat
FROM product_revenue
ORDER BY revenue DESC;

A pct_rank_global of 0.90 means: 90% of all products have lower revenue than this one. The product is in the 90th percentile.

CUME_DIST() — cumulative distribution

CUME_DIST() returns the fraction of rows with a value less than or equal to the current row's value:

CUME_DIST = (number of rows with value <= current row's value) / total rows

The key difference from PERCENT_RANK:

PERCENT_RANK for the top row is always 1.0
CUME_DIST for the top row is also 1.0
PERCENT_RANK for the bottom row is always 0.0
CUME_DIST for the bottom row is 1 / total_rows (never 0)

WITH product_revenue AS (
    SELECT
        p.name,
        COALESCE(SUM(oi.quantity * oi.unit_price), 0) AS revenue
    FROM products p
    LEFT JOIN order_items oi ON oi.product_id = p.product_id
    GROUP BY p.product_id, p.name
)
SELECT
    name,
    revenue,
    ROUND(PERCENT_RANK() OVER (ORDER BY revenue)::NUMERIC, 4) AS pct_rank,
    ROUND(CUME_DIST()   OVER (ORDER BY revenue)::NUMERIC, 4) AS cume_dist
FROM product_revenue
ORDER BY revenue DESC;

CUME_DIST is often more intuitive for answering "what fraction of values are at or below this level?" — useful for threshold-based classification.

Purchase this course to unlock the full lesson.

← Previous Next →