Vacuuming, Bloat, and Autovacuum Tuning

Vacuuming, Bloat, and Autovacuum Tuning

PostgreSQL uses Multi-Version Concurrency Control (MVCC) to handle concurrent reads and writes. This means updates and deletes don't overwrite rows in place — they mark old versions as "dead" and write new versions elsewhere. Over time, dead rows accumulate, causing table bloat: the table file on disk becomes larger than the live data, and queries must scan more pages than necessary.

VACUUM is the process that reclaims dead row space and updates visibility information. Understanding it is essential for any production PostgreSQL installation.

Why MVCC creates bloat

Timeline:
1. INSERT row → live row version 1
2. UPDATE row → live row version 2, dead row version 1
3. DELETE row → dead row version 2
4. VACUUM runs → marks versions 1 and 2 as reusable space

Without regular vacuuming:

  • Dead rows accumulate → sequential scans read more pages → slower queries
  • Transaction ID wraparound can occur → critical: forces aggressive vacuum or database shutdown
  • Index bloat occurs separately (indexes also retain dead entries)

Example 1: Detecting table and index bloat

-- Check table bloat using pg_stat_user_tables
SELECT
    relname AS table_name,
    n_live_tup AS live_rows,
    n_dead_tup AS dead_rows,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY dead_pct DESC;

-- Physical table size vs estimated live data size
SELECT
    relname AS table,
    pg_size_pretty(pg_relation_size(relid)) AS disk_size,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_with_indexes,
    n_live_tup,
    pg_size_pretty(pg_relation_size(relid) / NULLIF(n_live_tup, 0)) AS bytes_per_live_row
FROM pg_stat_user_tables
ORDER BY pg_relation_size(relid) DESC
LIMIT 10;

Example 2: Manual VACUUM and VACUUM FULL

-- VACUUM: marks dead space as reusable (does NOT shrink the file)
VACUUM orders;  -- non-blocking
VACUUM VERBOSE orders;  -- verbose output

-- VACUUM ANALYZE: vacuum + update statistics in one pass
VACUUM ANALYZE orders;

-- VACUUM FULL: rewrites the entire table — shrinks the file but takes an exclusive lock
-- Use only during maintenance windows; kills concurrent queries
VACUUM FULL orders;  -- LOCKS TABLE - use carefully

-- VACUUM FREEZE: marks all rows as frozen (prevents transaction ID wraparound)
-- Run when: SELECT age(relfrozenxid) FROM pg_class WHERE relname = 'orders' is large
SELECT relname, age(relfrozenxid) AS xid_age, pg_size_pretty(pg_relation_size(oid))
FROM pg_class WHERE relkind = 'r' AND age(relfrozenxid) > 100000000
ORDER BY age(relfrozenxid) DESC;
-- If xid_age approaches 2 billion, PostgreSQL will force a freeze

Example 3: Autovacuum tuning

Autovacuum runs automatically in the background. The default settings are conservative — they trigger late and run slowly to minimize I/O impact. For high-write tables, aggressive autovacuum tuning dramatically reduces bloat:

-- Check current autovacuum settings for the whole cluster
SHOW autovacuum_vacuum_scale_factor;  -- default 0.2 (20% dead rows before triggering)
SHOW autovacuum_vacuum_threshold;     -- default 50 (minimum dead rows before triggering)

-- High-write tables benefit from per-table overrides
-- Table-level storage parameters override global autovacuum settings:
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.01,   -- vacuum when 1% rows are dead (vs 20% default)
    autovacuum_vacuum_threshold = 100,        -- or 100 dead rows minimum
    autovacuum_vacuum_cost_delay = 2          -- ms delay between cost units (lower = faster vacuum)
);

ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_vacuum_threshold = 500
);

-- Check autovacuum's current activity
SELECT pid, query, state, now() - pg_stat_activity.query_start AS duration
FROM pg_stat_activity
WHERE query LIKE '%autovacuum%'
ORDER BY duration DESC;

Example 4: Index bloat and REINDEX

-- Index bloat check using pgstattuple (requires extension)
CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT
    indexrelid::regclass AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    (pgstattuple(indexrelid::regclass)).dead_tuple_percent AS bloat_pct
FROM pg_index
WHERE indrelid = 'orders'::regclass
ORDER BY bloat_pct DESC;

-- REINDEX CONCURRENTLY — rebuild index without locking (PostgreSQL 12+)
-- Old index remains active during rebuild; switched atomically when complete
REINDEX INDEX CONCURRENTLY idx_orders_date;
REINDEX TABLE CONCURRENTLY orders;  -- rebuilds all indexes on the table

-- Monitor bloat trend over time (store results in a monitoring table)
CREATE TABLE IF NOT EXISTS bloat_history (
    captured_at TIMESTAMPTZ DEFAULT NOW(),
    table_name TEXT,
    live_rows BIGINT,
    dead_rows BIGINT,
    dead_pct NUMERIC
);

INSERT INTO bloat_history (table_name, live_rows, dead_rows, dead_pct)
SELECT relname, n_live_tup, n_dead_tup,
    ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1)
FROM pg_stat_user_tables
WHERE n_live_tup > 1000;

Key Takeaway

MVCC means every update creates a dead row. VACUUM reclaims space; VACUUM FULL compacts but requires an exclusive lock. Autovacuum runs automatically but with conservative defaults — tune per-table with autovacuum_vacuum_scale_factor for high-write tables. REINDEX CONCURRENTLY rebuilds indexes without downtime. Monitor n_dead_tup in pg_stat_user_tables regularly; a consistently high dead percentage means autovacuum isn't keeping up.