Recursive CTEs — Hierarchical Data

Recursive CTEs — Hierarchical Data

Hierarchical data — organization charts, product categories, comment threads, folder structures — is everywhere. Recursive CTEs are the standard SQL tool for traversing these tree structures.

What Makes Data Hierarchical?

A self-referential table: each row has a reference to another row in the same table, forming parent-child relationships.

-- Classic org chart structure -- run this once to set up the lesson data
CREATE TABLE IF NOT EXISTS employees (
    employee_id  INT PRIMARY KEY,
    name         TEXT,
    title        TEXT,
    manager_id   INT REFERENCES employees(employee_id)
);

INSERT INTO employees VALUES
  (1, 'Alice',  'CEO',               NULL),
  (2, 'Bob',    'VP Engineering',    1),
  (3, 'Carol',  'VP Sales',          1),
  (4, 'Dave',   'Engineering Mgr',   2),
  (5, 'Eve',    'Senior Engineer',   4),
  (6, 'Frank',  'Engineer',          4),
  (7, 'Grace',  'Sales Mgr',         3),
  (8, 'Henry',  'Account Executive', 7)
ON CONFLICT (employee_id) DO NOTHING;

The CEO has manager_id = NULL (root). Everyone else points to their manager.


Pattern: Walking Down a Tree (Top → Leaves)

WITH RECURSIVE org_tree AS (
    -- Anchor: start at the root (CEO)
    SELECT
        employee_id,
        name,
        title,
        manager_id,
        0               AS depth,
        name::TEXT      AS path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: find direct reports of current level
    SELECT
        e.employee_id,
        e.name,
        e.title,
        e.manager_id,
        ot.depth + 1,
        ot.path || ' → ' || e.name
    FROM employees e
    JOIN org_tree ot ON ot.employee_id = e.manager_id
)
SELECT
    REPEAT('    ', depth) || name AS indented_name,
    title,
    depth,
    path
FROM org_tree
ORDER BY path;

Key columns to add:

  • depth — how many levels from the root (useful for indentation, filtering)
  • path — full ancestry chain (useful for cycle detection and display)

Pattern: Walking Up a Tree (Leaf → Root)

Find all ancestors of a specific employee:

WITH RECURSIVE ancestors AS (
    -- Anchor: start at a specific employee
    SELECT employee_id, name, manager_id, 0 AS depth
    FROM employees
    WHERE employee_id = 5    -- the employee we're looking up (Eve)

    UNION ALL

    -- Recursive: go up to the manager
    SELECT e.employee_id, e.name, e.manager_id, a.depth + 1
    FROM employees e
    JOIN ancestors a ON a.manager_id = e.employee_id
)
SELECT * FROM ancestors ORDER BY depth;

Example 1: Employee Count per Manager (All Levels)

How many direct and indirect reports does each manager have?

WITH RECURSIVE org_tree AS (
    SELECT
        employee_id,
        manager_id,
        employee_id AS root_manager_id
    FROM employees

    UNION ALL

    SELECT
        e.employee_id,
        e.manager_id,
        ot.root_manager_id
    FROM employees e
    JOIN org_tree ot ON ot.employee_id = e.manager_id
)
SELECT
    m.name AS manager,
    COUNT(*) - 1 AS total_reports   -- subtract self
FROM org_tree ot
JOIN employees m ON m.employee_id = ot.root_manager_id
GROUP BY m.name
ORDER BY total_reports DESC;

Example 2: Using ShopMetrics Product Hierarchy

Suppose ec_products had a category hierarchy (parent_category → sub_category). Let's simulate with our data:

-- Using ec_events to find product view chains
-- (demonstration of path building with real data)
WITH RECURSIVE category_rollup AS (
    SELECT
        category        AS leaf_category,
        category        AS root_category,
        0               AS depth,
        ARRAY[category] AS category_path
    FROM ec_products
    GROUP BY category

    UNION ALL

    -- In a real hierarchy you'd join parent_id here
    -- For demonstration, we stop at depth 0
    SELECT cr.leaf_category, cr.root_category, cr.depth, cr.category_path
    FROM category_rollup cr
    WHERE false   -- no actual hierarchy in this schema
)
SELECT * FROM category_rollup;

For a real example, let's use a proper hierarchy from our Sports dataset — tournament brackets:

-- Simulate a knock-out bracket with inline sample data
WITH RECURSIVE
match_data (match_id, round_num, home_team, away_team, home_goals, away_goals) AS (
    VALUES
        (1, 1, 'Arsenal',   'Chelsea',    2, 1),
        (2, 1, 'Liverpool', 'Man City',   0, 2),
        (3, 1, 'Spurs',     'Everton',    3, 1),
        (4, 1, 'Newcastle', 'West Ham',   1, 2),
        (5, 2, 'Arsenal',   'Man City',   1, 2),
        (6, 2, 'Spurs',     'West Ham',   2, 0),
        (7, 3, 'Man City',  'Spurs',      3, 1)
),
tournament AS (
    -- Quarter-finals (round 1)
    SELECT
        match_id,
        home_team,
        away_team,
        home_goals,
        away_goals,
        round_num,
        CASE
            WHEN home_goals > away_goals THEN home_team
            ELSE away_team
        END AS winner
    FROM match_data
    WHERE round_num = 1

    UNION ALL

    -- Subsequent rounds: winner advances
    SELECT
        m.match_id,
        m.home_team,
        m.away_team,
        m.home_goals,
        m.away_goals,
        m.round_num,
        CASE
            WHEN m.home_goals > m.away_goals THEN m.home_team
            ELSE m.away_team
        END
    FROM match_data m
    JOIN tournament t
        ON (m.home_team = t.winner OR m.away_team = t.winner)
        AND m.round_num = t.round_num + 1
)
SELECT
    round_num  AS round,
    match_id,
    home_team,
    home_goals,
    away_goals,
    away_team,
    winner
FROM tournament
ORDER BY round_num, match_id;

Example 3: Bill of Materials (BOM) — Parts Explosion

A classic manufacturing use case — a product is made of components, which are themselves made of sub-components:

WITH RECURSIVE
components (component_id, parent_id, component_name, quantity_needed) AS (
    VALUES
        (1, NULL::INT, 'Bicycle',        1),
        (2, 1,         'Frame',          1),
        (3, 1,         'Wheel',          2),
        (4, 3,         'Tyre',           1),
        (5, 3,         'Spoke',         32),
        (6, 3,         'Hub',            1),
        (7, 1,         'Handlebar',      1)
),
bom AS (
    SELECT
        component_id,
        parent_id,
        component_name,
        quantity_needed,
        1.0 AS total_quantity,
        0   AS depth
    FROM components
    WHERE parent_id IS NULL

    UNION ALL

    SELECT
        c.component_id,
        c.parent_id,
        c.component_name,
        c.quantity_needed,
        bom.total_quantity * c.quantity_needed,
        bom.depth + 1
    FROM components c
    JOIN bom ON bom.component_id = c.parent_id
)
SELECT
    REPEAT('  ', depth) || component_name AS component,
    quantity_needed,
    total_quantity,
    depth
FROM bom
ORDER BY depth, component_name;

The key insight: total_quantity multiplies through each level so leaf nodes show their total required quantity, not just the quantity per immediate parent.

Purchase this course to unlock the full lesson.

Sign up