Index Types and Strategies — B-tree, GIN, GiST, BRIN, and Hash

Index Types and Strategies — B-tree, GIN, GiST, BRIN, and Hash

PostgreSQL has five main index types, each optimised for a different access pattern. Choosing the wrong index type is one of the most common performance mistakes — and choosing the right one can reduce query time from minutes to milliseconds.

The five index types

Type Best for Supports
B-tree Equality, range, ordering on scalar values =, <, >, <=, >=, BETWEEN, ORDER BY
GIN Multi-valued types (arrays, JSONB, tsvector) @>, ?, &&, @@
GiST Geometric types, full-text, ranges, nearest-neighbour Overlap, containment, distance
BRIN Very large tables with physical ordering correlation Range queries on sequential data
Hash Equality only (rarely useful; B-tree covers it with more) = only

Example 1: B-tree index strategies

B-tree is the default and handles the vast majority of indexing needs:

-- Setup: create a small `documents_with_search` table for the
-- search-vector index examples below.
CREATE TABLE IF NOT EXISTS documents_with_search (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT,
    body        TEXT,
    search_vector TSVECTOR GENERATED ALWAYS AS
        (to_tsvector('english', COALESCE(name,'') || ' ' || COALESCE(body,''))) STORED
);
-- Basic single-column index
CREATE INDEX IF NOT EXISTS idx_orders_date ON orders (created_at);
-- Speeds up: WHERE created_at > '2025-01-01' ORDER BY created_at

-- Descending index (useful when queries ORDER BY col DESC)
CREATE INDEX IF NOT EXISTS idx_orders_date_desc ON orders (created_at DESC);

-- Composite index — column order matters enormously
CREATE INDEX IF NOT EXISTS idx_orders_status_date ON orders (status, created_at);
-- Good for: WHERE status = 'completed' AND created_at > '2025-01-01'
-- Also usable for: WHERE status = 'completed' (index scan on first column)
-- NOT useful for: WHERE created_at > '2025-01-01' without status filter

-- Covering index — INCLUDE additional columns to avoid table heap access
CREATE INDEX IF NOT EXISTS idx_ec_orders_covering ON ec_orders (customer_id, created_at)
    INCLUDE (total_amount, status);
-- For queries like:
-- SELECT total_amount, status FROM ec_orders WHERE customer_id = 5 ORDER BY created_at
-- This is an index-only scan — no heap access at all

-- Partial index — index only a subset of rows
CREATE INDEX IF NOT EXISTS idx_active_orders ON orders (customer_id, created_at)
    WHERE status = 'completed';
-- Smaller index, faster for completed-order queries
-- Planner will use it when WHERE status = 'completed' is in the query

-- Expression/functional index
-- ❌ Note: DATE_TRUNC on TIMESTAMPTZ is NOT IMMUTABLE in PostgreSQL — 
-- you'd need to cast at TIME ZONE 'UTC' first or use a generated column.
-- CREATE INDEX IF NOT EXISTS idx_order_month ON orders (DATE_TRUNC('month', created_at));
-- Required for: WHERE DATE_TRUNC('month', created_at) = '2025-01-01'
-- Without this, the function prevents index usage on created_at

Example 2: GIN indexes for arrays and JSONB

-- ❌ template (uses articles/events_log/product_tags created in your own setup)
-- GIN for native arrays
CREATE INDEX IF NOT EXISTS idx_tags_gin ON product_tags USING GIN (tags);
-- Supports: WHERE tags @> ARRAY['laptop','premium']
-- Supports: WHERE 'laptop' = ANY(tags)  [planner rewrites to @>]
-- Supports: WHERE tags && ARRAY['apple','lenovo']

-- GIN for JSONB
CREATE INDEX IF NOT EXISTS idx_payload_gin ON events_log USING GIN (payload);
-- Supports: WHERE payload @> '{"event_type":"purchase"}'
-- Supports: WHERE payload ? 'session_id'

-- GIN for full-text search
CREATE INDEX IF NOT EXISTS idx_articles_fts ON articles USING GIN (search_vector);
-- Supports: WHERE search_vector @@ to_tsquery('english', 'postgresql')

-- GIN jsonb_path_ops opclass — smaller, faster for @> only (no ? support)
CREATE INDEX IF NOT EXISTS idx_payload_path ON events_log USING GIN (payload jsonb_path_ops);
-- Only supports @> but with smaller index footprint than default GIN

Example 3: BRIN indexes for time-series tables

BRIN (Block Range INdex) stores min/max values per block range rather than individual values. The index is tiny (kilobytes for billion-row tables) but only useful when the table is physically sorted by the indexed column — which is natural for append-only time-series data:

-- BRIN for a large time-series events table
-- Works because rows are inserted in roughly chronological order
CREATE INDEX IF NOT EXISTS idx_events_brin ON events USING BRIN (occurred_at)
    WITH (pages_per_range = 128);
-- Each block range covers 128 pages
-- The index stores the min/max occurred_at for each block range

-- BRIN is useful for:
-- WHERE occurred_at > NOW() - INTERVAL '7 days'  -- eliminates old block ranges
-- WHERE occurred_at BETWEEN '2025-01-01' AND '2025-02-01'

-- NOT useful for:
-- WHERE customer_id = 5  -- customer_id has no correlation with physical storage order

-- Check correlation to determine BRIN suitability
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'events'
ORDER BY ABS(correlation) DESC;
-- correlation near 1.0 (or -1.0) = good BRIN candidate
-- correlation near 0.0 = BRIN will be useless

Example 4: Index maintenance and monitoring

-- Find unused indexes (candidates for removal)
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE '%_pkey'  -- keep primary keys
  AND indexname NOT LIKE '%_unique'  -- keep unique constraints
ORDER BY pg_relation_size(indexrelid) DESC;

-- Find indexes that are rarely used but large
SELECT
    tablename,
    indexname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size,
    pg_size_pretty(pg_total_relation_size(tablename::regclass)) AS table_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;

-- Check for duplicate indexes (same column set, different names)
SELECT
    a.tablename, a.indexname AS idx_a, b.indexname AS idx_b,
    a.indexdef
FROM pg_indexes a JOIN pg_indexes b
    ON a.tablename = b.tablename
    AND a.indexdef = b.indexdef
    AND a.indexname < b.indexname
ORDER BY a.tablename;

-- Index bloat — indexes grow over time with updates/deletes
-- REINDEX CONCURRENTLY rebuilds without locking
REINDEX INDEX CONCURRENTLY idx_orders_date;

-- Or VACUUM to reclaim dead tuples (run automatically by autovacuum)
VACUUM ANALYZE orders;

Key Takeaway

B-tree covers most needs (equality, range, ORDER BY). GIN covers multi-valued types (arrays, JSONB, tsvector). BRIN covers append-only time-series tables with minimal overhead. The composite index column order matters — most selective filter first. Covering indexes with INCLUDE eliminate heap access. Partial indexes reduce size and maintenance overhead. Regularly audit with pg_stat_user_indexes to drop unused indexes — every index costs write performance.