Query Rewriting — CTEs, Subqueries, and Lateral Joins Compared
Query Rewriting — CTEs, Subqueries, and Lateral Joins Compared
The same logical query can often be expressed in multiple syntactically different ways: correlated subquery, uncorrelated subquery, CTE, lateral join, or window function. Each has different performance characteristics and different readability trade-offs. Knowing which form the planner handles best — and when to override the planner's choice — is an advanced optimisation skill.
Example 1: Correlated vs uncorrelated subqueries
A correlated subquery re-executes for each outer row. An uncorrelated subquery executes once. The difference in performance can be enormous:
-- CORRELATED: executes once per client row (N executions total)
-- This is O(N×M) — extremely slow for large tables
SELECT c.company_name AS name,
(SELECT COUNT(*) FROM invoices WHERE client_id = c.client_id) AS invoice_count
FROM clients c;
-- UNCORRELATED via JOIN: executes once (O(N+M))
SELECT c.company_name AS name, COALESCE(cnt.invoice_count, 0) AS invoice_count
FROM clients c
LEFT JOIN (
SELECT client_id, COUNT(*) AS invoice_count
FROM invoices GROUP BY client_id
) cnt ON cnt.client_id = c.client_id;
-- WINDOW FUNCTION: most elegant for "aggregate + keep detail"
SELECT DISTINCT c.company_name AS name,
COUNT(i.invoice_id) OVER (PARTITION BY c.client_id) AS invoice_count
FROM clients c
LEFT JOIN invoices i ON i.client_id = c.client_id;
When correlated subqueries are acceptable:
EXISTS/NOT EXISTSchecks (often optimised to a semi-join by the planner)LATERALsubqueries that reference outer columns (legitimate correlated pattern)- Subqueries in
SELECTreturning a single scalar for a small outer result set
Example 2: CTE optimisation fence — when CTEs hurt performance
In PostgreSQL 12+, non-recursive CTEs are "inlined" by default — the planner can push predicates through them, making them equivalent to subqueries. In PostgreSQL 11 and earlier, every CTE was an "optimisation fence" that prevented predicate pushdown.
-- In PostgreSQL 12+: these are functionally identical in performance
WITH recent_orders AS (
SELECT * FROM orders WHERE created_at >= '2025-01-01'
)
SELECT * FROM recent_orders WHERE customer_id = 5;
-- Equivalent plan to:
SELECT * FROM orders WHERE created_at >= '2025-01-01' AND customer_id = 5;
-- But MATERIALIZED forces the old fence behaviour (useful sometimes):
WITH expensive_computation AS MATERIALIZED (
SELECT customer_id,
CORR(EXTRACT(EPOCH FROM created_at)::float, customer_id::float) AS corr
FROM orders
GROUP BY customer_id
)
SELECT c.name AS customer_name, ec.corr
FROM customers c JOIN expensive_computation ec ON ec.customer_id = c.customer_id
WHERE ec.corr > 0.5;
-- Without MATERIALIZED, PostgreSQL 12+ might re-compute ec for each outer row join
-- When MATERIALIZED is good: when the CTE is referenced multiple times in the query
-- Without it, it might be re-evaluated each time it's referenced
Example 3: LATERAL joins — correlated subqueries done right
LATERAL allows a subquery in the FROM clause to reference columns from earlier in the FROM clause. Unlike a correlated subquery in SELECT, a LATERAL join can return multiple rows per outer row and the planner can optimise it more effectively:
-- Top 3 most recent orders per customer (classic top-N per group problem)
-- Old way: correlated subquery (O(N) subquery executions)
SELECT c.customer_id, c.email,
(SELECT order_id FROM orders o WHERE o.customer_id = c.customer_id
ORDER BY created_at DESC LIMIT 1) AS latest_order_id
FROM customers c;
-- LATERAL way: same semantics, one reference per outer row, multi-row capable
SELECT c.customer_id, c.email, recent.*
FROM customers c
CROSS JOIN LATERAL (
SELECT order_id, created_at, status
FROM orders o
WHERE o.customer_id = c.customer_id
ORDER BY created_at DESC
LIMIT 3
) recent;
-- Returns up to 3 rows per customer
-- LEFT JOIN LATERAL (include customers with no orders as NULL row)
SELECT c.customer_id, c.email, recent.order_id, recent.status
FROM customers c
LEFT JOIN LATERAL (
SELECT order_id, created_at, status
FROM orders o
WHERE o.customer_id = c.customer_id
ORDER BY created_at DESC
LIMIT 1
) recent ON TRUE;
The ON TRUE clause is required for LEFT JOIN LATERAL — it means "always join regardless of condition" (the WHERE inside the LATERAL already handles filtering).
Example 4: EXISTS vs IN vs JOIN — choosing the right set membership test
-- Finding clients that have at least one overdue invoice
-- IN: simple but can be slow if subquery returns many rows (duplicates issue)
SELECT company_name FROM clients
WHERE client_id IN (SELECT client_id FROM invoices WHERE status = 'overdue');
-- EXISTS: terminates as soon as the first match is found — fastest for existence tests
SELECT company_name FROM clients c
WHERE EXISTS (
SELECT 1 FROM invoices i WHERE i.client_id = c.client_id AND i.status = 'overdue'
);
-- SEMI-JOIN via JOIN + DISTINCT: explicit but verbose
SELECT DISTINCT c.company_name AS name FROM clients c
JOIN invoices i ON i.client_id = c.client_id AND i.status = 'overdue';
-- NOT EXISTS vs LEFT JOIN IS NULL (finding clients with NO overdue invoices)
-- NOT EXISTS: clearest intent
SELECT company_name FROM clients c
WHERE NOT EXISTS (
SELECT 1 FROM invoices i WHERE i.client_id = c.client_id AND i.status = 'overdue'
);
-- LEFT JOIN IS NULL: equivalent, sometimes chosen by planner anyway
SELECT DISTINCT c.company_name AS name FROM clients c
LEFT JOIN invoices i ON i.client_id = c.client_id AND i.status = 'overdue'
WHERE i.invoice_id IS NULL;
Performance rule of thumb:
- The PostgreSQL planner converts
IN (subquery)andEXISTSto semi-joins internally — they're usually identical in performance NOT INwith NULLs is dangerous:NOT IN (1, 2, NULL)returns no rows becausex NOT IN (NULL)is unknownNOT EXISTSis always safe with NULLs and is the preferred form
Key Takeaway
Avoid correlated subqueries in SELECT for large outer sets — use LEFT JOIN aggregation instead. In PostgreSQL 12+, CTEs are inlined by default; use MATERIALIZED when you need caching. LATERAL joins are the correct tool for top-N-per-group and correlated-but-set-returning patterns. EXISTS/NOT EXISTS are the safest set membership tests and typically the fastest due to early termination.