JOINs — Every Type, Every Gotcha

The mental model

Think of a JOIN as an instruction to the database: "Find me rows from table A and table B that are related according to this condition."

The type of JOIN you choose determines what happens when a row in A has no match in B — or vice versa. Get this mental model solid and JOIN bugs become obvious.

We're using the ShopMetrics database throughout. If you haven't set it up yet, go back to the previous lesson.

The schema has five tables: customers, orders, order_items, products, and events. Revenue is never a single column — it's always computed as SUM(quantity * unit_price) from order_items. Keep that in mind as you read the examples.

INNER JOIN

Returns only rows where the condition is met in both tables. JOIN and INNER JOIN are identical — use whichever reads more naturally.

-- All completed orders with their customer name
SELECT
    o.order_id,
    c.name       AS customer,
    c.country,
    o.created_at,
    o.status
FROM orders o
INNER JOIN customers c ON c.customer_id = o.customer_id
WHERE o.status = 'completed'
ORDER BY o.created_at DESC
LIMIT 10;

 order_id |   customer    | country |         created_at         |  status
----------+---------------+---------+----------------------------+-----------
      187 | Customer 23   | UK      | 2025-11-14 09:12:44+00     | completed
      203 | Customer 7    | US      | 2025-11-12 17:33:01+00     | completed
      ...

What happens to unmatched rows? If an order has a customer_id that doesn't exist in customers (an orphaned record), that order is silently excluded. This is usually correct behaviour — but knowing it prevents unexpected missing-row bugs.

LEFT JOIN

Returns all rows from the left table, plus matched rows from the right. Where there's no match on the right, you get NULLs.

-- All customers and how many orders they've placed (including customers with zero orders)
SELECT
    c.customer_id,
    c.name,
    c.country,
    COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, c.country
ORDER BY order_count DESC;

 customer_id |    name     | country | order_count
-------------+-------------+---------+-------------
          12 | Customer 12 | US      |           8
          23 | Customer 23 | UK      |           6
          ...
          31 | Customer 31 | DE      |           0   ← no orders, still appears
          44 | Customer 44 | AU      |           0

Customers 31 and 44 have no orders. With an INNER JOIN they'd disappear from the result entirely — silently and incorrectly.

The #1 LEFT JOIN mistake

Filtering on the right table inside WHERE silently converts your LEFT JOIN into an INNER JOIN:

-- ❌ WRONG — customers with no completed orders vanish entirely
SELECT c.name, o.order_id, o.status
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.status = 'completed';   -- NULLs fail this check → customer disappears

-- ✅ CORRECT — move the filter into the ON clause
SELECT c.name, o.order_id, o.status
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
                  AND o.status = 'completed';
-- customers with no completed orders still appear, with NULL order columns

When o.status is NULL (no match), the AND o.status = 'completed' condition simply doesn't exclude the left-side row — it just leaves the right-side columns as NULL.

Anti-join: finding rows with no match

LEFT JOIN + WHERE right.id IS NULL is the standard pattern for finding unmatched rows:

-- Customers who have never placed a single order
SELECT c.customer_id, c.name, c.email
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;

RIGHT JOIN

Mirrors LEFT JOIN — returns all rows from the right table, NULLs for unmatched left rows.

-- All orders, with customer info where available (same result as a LEFT JOIN with tables swapped)
SELECT
    o.order_id,
    o.created_at,
    c.name AS customer
FROM customers c
RIGHT JOIN orders o ON o.customer_id = c.customer_id
ORDER BY o.created_at DESC
LIMIT 5;

In practice, most developers prefer to flip the table order and use LEFT JOIN instead — the logic is identical, but LEFT JOIN reads more naturally. Stick to LEFT JOIN for consistency across your codebase.

FULL OUTER JOIN

Returns all rows from both tables, NULLs where there's no match on either side. Rarely needed in day-to-day queries, but invaluable when reconciling two datasets.

-- Identify data integrity issues: orders with no customer, customers with no orders
SELECT
    c.customer_id,
    c.name,
    o.order_id,
    o.status
FROM customers c
FULL OUTER JOIN orders o ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL   -- orders with no matching customer
   OR o.order_id IS NULL;     -- customers with no matching order

Common real-world use case: data integrity audits, comparing two extracts of the same dataset, merging tables from different source systems.

CROSS JOIN

Produces the Cartesian product — every row from table A paired with every row from table B. 50 customers × 20 products = 1,000 result rows.

-- Generate all customer × product combinations (recommendation engine scaffold)
SELECT
    c.customer_id,
    c.name       AS customer,
    p.product_id,
    p.name       AS product,
    p.category
FROM customers c
CROSS JOIN products p
ORDER BY c.customer_id, p.category, p.name;

CROSS JOIN has no ON clause — that's intentional. Accidentally writing a JOIN and forgetting the ON condition produces a cross join (PostgreSQL will error, but watch for it in dynamically built queries).

Practical uses:

Generating a complete date × category grid for reports (so missing days show 0 instead of being absent)
Building pivot table skeletons
Creating test data permutations

Self JOIN

Joining a table to itself. Common for hierarchical data, or for comparing rows within the same table.

-- Find pairs of customers from the same country
SELECT
    a.name    AS customer_a,
    b.name    AS customer_b,
    a.country
FROM customers a
JOIN customers b ON b.country = a.country
               AND b.customer_id > a.customer_id  -- avoid (A,B) and (B,A) duplicates
ORDER BY a.country, a.name;

You must alias both instances of the table (a and b). The b.customer_id > a.customer_id condition ensures each pair appears once rather than twice.

Joining more than two tables

Chain JOINs sequentially. Each JOIN adds another table to the working result set:

-- Full order breakdown: customer → order → line items → product
SELECT
    c.name                          AS customer,
    o.order_id,
    o.created_at,
    p.name                          AS product,
    p.category,
    oi.quantity,
    oi.unit_price,
    oi.quantity * oi.unit_price     AS line_total
FROM orders o
JOIN customers   c  ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id   = o.order_id
JOIN products    p  ON p.product_id  = oi.product_id
WHERE o.status = 'completed'
ORDER BY o.order_id, p.name;

Tip: Write the FROM table first, then JOIN in the direction data naturally flows. Here: order → who placed it → what was in it → what each item is. Reading the JOINs top to bottom tells the story.

To get an order-level total, add a GROUP BY and wrap the line detail in SUM:

SELECT
    c.name                              AS customer,
    o.order_id,
    o.created_at,
    COUNT(DISTINCT p.product_id)        AS distinct_products,
    SUM(oi.quantity * oi.unit_price)    AS order_total
FROM orders o
JOIN customers   c  ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id   = o.order_id
JOIN products    p  ON p.product_id  = oi.product_id
WHERE o.status = 'completed'
GROUP BY c.name, o.order_id, o.created_at
ORDER BY order_total DESC;

JOIN performance basics

JOINs on indexed columns are fast. Foreign key columns (customer_id, order_id, etc.) are indexed automatically.
JOINs on unindexed columns — like c.name = o.some_text_field — trigger a sequential scan. Fine for small tables, painful at scale.
You can inspect which index a JOIN uses with EXPLAIN ANALYZE. You'll master this in the Query Optimization section.

Practice

Run these against ShopMetrics before moving on:

1. List all products that have never appeared in any order. Show the product name and category.

2. For each completed order: show the order ID, customer name, number of distinct products, and the order total (sum of quantity × unit_price). Sort by order total descending.

3. Find pairs of customers from the same country who have each placed more than 3 orders. Show both customer names and their shared country. Each pair should appear only once (not as A,B and B,A).

Work through all three before moving on — the friction is where the learning happens.

Solutions

1. Products never in any order

The anti-join pattern: LEFT JOIN + WHERE right side IS NULL.

SELECT
    p.name,
    p.category
FROM products p
LEFT JOIN order_items oi ON oi.product_id = p.product_id
WHERE oi.item_id IS NULL
ORDER BY p.category, p.name;

The LEFT JOIN keeps all products. For products with no order items, every order_items column is NULL — so WHERE oi.item_id IS NULL isolates exactly those products. This is consistently faster than NOT IN or NOT EXISTS for this shape of query.

2. Completed orders with totals

SELECT
    o.order_id,
    c.name                              AS customer,
    COUNT(DISTINCT oi.product_id)       AS distinct_products,
    SUM(oi.quantity * oi.unit_price)    AS order_total
FROM orders o
JOIN customers   c  ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id   = o.order_id
WHERE o.status = 'completed'
GROUP BY o.order_id, c.name
ORDER BY order_total DESC;

The key decision: GROUP BY o.order_id, c.name. You must include c.name in GROUP BY because it appears in SELECT but is not aggregated. Since name is functionally dependent on customer_id, and customer_id is already in the GROUP BY via the JOIN, this is logically sound.

3. Customer pairs from the same country with more than 3 orders each

WITH prolific AS (
    SELECT customer_id
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(*) > 3
)
SELECT
    a.name    AS customer_a,
    b.name    AS customer_b,
    a.country
FROM customers a
JOIN customers   b  ON b.country     = a.country
                   AND b.customer_id > a.customer_id
JOIN prolific   pa  ON pa.customer_id = a.customer_id
JOIN prolific   pb  ON pb.customer_id = b.customer_id
ORDER BY a.country, a.name;

Two things make this work cleanly:

The CTE prolific pre-filters to customers with more than 3 orders. Without it, the self-join would produce all same-country pairs, and you'd need a messier HAVING clause.
b.customer_id > a.customer_id ensures each pair appears exactly once. If you used <> instead, you'd get both (Customer 2, Customer 7) and (Customer 7, Customer 2).

← Previous Next →