Table Design Patterns — Normalisation Trade-offs and Schema Evolution

Relational table design involves constant trade-offs: normalisation reduces redundancy but requires more joins; denormalisation improves read performance but complicates writes. In analytics, the right design depends on your access patterns — a schema optimised for OLTP (many small writes) is often terrible for OLAP (large analytical reads).

Example 1: OLAP vs OLTP schema design

-- OLTP-style (highly normalised): minimises redundancy, good for writes
-- Requires many joins for analytics
CREATE TABLE IF NOT EXISTS customers_oltp (
    customer_id  SERIAL PRIMARY KEY,
    email        TEXT UNIQUE NOT NULL,
    country      TEXT  -- was: country_id INT REFERENCES countries
    -- tier_id removed (customer_tiers not in schema)
);

-- OLAP-style (denormalised): copies frequently needed columns into the fact table
-- Avoids joins, faster for analytics
CREATE TABLE IF NOT EXISTS orders_analytics (
    order_id          SERIAL PRIMARY KEY,
    customer_id       INT,
    -- Denormalised customer fields (snapshot at order time)
    customer_country  TEXT NOT NULL,
    customer_channel  TEXT NOT NULL,
    customer_tier     TEXT NOT NULL,
    -- Denormalised product fields
    product_category  TEXT,
    -- Fact measurements
    created_at        DATE NOT NULL,
    total_amount      NUMERIC(12,2) NOT NULL
);

-- The trade-off: if a customer's country changes, historical orders reflect the
-- old country (which is usually CORRECT for analytics — "country at time of purchase")

Example 2: Generated columns for computed analytics fields

Generated columns are computed automatically by PostgreSQL on every INSERT/UPDATE. They eliminate the need to compute derived values in every query:

CREATE TABLE IF NOT EXISTS orders_with_generated (
    order_id        SERIAL PRIMARY KEY,
    customer_id     INT NOT NULL,
    created_at      DATE NOT NULL,
    total_amount    NUMERIC(12,2) NOT NULL,
    discount_amount NUMERIC(12,2) NOT NULL DEFAULT 0,
    -- Generated: net amount after discount
    net_amount NUMERIC(12,2) GENERATED ALWAYS AS (total_amount - discount_amount) STORED,
    -- Generated: year for partitioning/grouping
    order_year  INT GENERATED ALWAYS AS (EXTRACT(YEAR FROM created_at)::INT) STORED,
    -- Generated: quarter label
    order_quarter TEXT GENERATED ALWAYS AS (
        'Q' || EXTRACT(QUARTER FROM created_at)::TEXT || '-' || EXTRACT(YEAR FROM created_at)::TEXT
    ) STORED
);

-- Index generated columns directly
CREATE INDEX IF NOT EXISTS idx_order_year ON orders_with_generated (order_year);
CREATE INDEX IF NOT EXISTS idx_order_quarter ON orders_with_generated (order_quarter);

-- Queries can use these without any computation:
SELECT order_quarter, SUM(net_amount) FROM orders_with_generated
GROUP BY order_quarter ORDER BY order_quarter;
-- No DATE_TRUNC or arithmetic needed — the computed value is stored

Example 3: Schema evolution — adding columns safely

-- Adding a nullable column: instant (no table rewrite needed)
ALTER TABLE orders ADD COLUMN IF NOT EXISTS source_platform TEXT DEFAULT 'web';

-- Adding a column with default: in PostgreSQL 11+, also instant for static defaults
-- (PostgreSQL stores the default in the catalog, not in each row)
ALTER TABLE orders ADD COLUMN IF NOT EXISTS api_version INT DEFAULT 2;

-- Backfilling the new column in batches (don't update all rows at once)
UPDATE orders SET source_platform = 'web'
WHERE source_platform IS NULL;

-- Adding NOT NULL constraint after backfill (validate without locking in PG 12+)
-- Step 1: Add constraint as NOT VALID (no scan, no lock)
ALTER TABLE orders ADD CONSTRAINT chk_platform_not_null
    CHECK (source_platform IS NOT NULL) NOT VALID;
-- Step 2: Validate in background (shares lock, existing queries not blocked)
ALTER TABLE orders VALIDATE CONSTRAINT chk_platform_not_null;

-- Renaming columns: instant in PostgreSQL 10+
ALTER TABLE orders RENAME COLUMN source_platform TO acquisition_platform;

-- Removing a column: marks as dropped but doesn't rewrite table immediately
ALTER TABLE orders DROP COLUMN acquisition_platform;
-- Physical space is reclaimed by next VACUUM FULL or pg_repack

Example 4: Temporal tables — tracking changes over time

A temporal table keeps a history of every version of every row. The pattern: a current table for live data, a history table for all past states, and a trigger to automatically move rows to history on update/delete:

CREATE TABLE IF NOT EXISTS prices (
    price_id     SERIAL PRIMARY KEY,
    product_id   INT NOT NULL,
    price        NUMERIC(10,2) NOT NULL,
    effective_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    expires_at   TIMESTAMPTZ  -- NULL means currently active
);

-- Current active prices: only rows where expires_at IS NULL
CREATE OR REPLACE VIEW current_prices AS
    SELECT * FROM prices WHERE expires_at IS NULL;

-- Trigger to create history records on update
CREATE OR REPLACE FUNCTION prices_temporal_trigger()
RETURNS TRIGGER AS $$
BEGIN
    -- Expire the old row
    UPDATE prices SET expires_at = NOW()
    WHERE price_id = OLD.price_id AND expires_at IS NULL;
    -- The new row is inserted by the UPDATE (actually INSERT + DELETE pattern below)
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Point-in-time query: what was the price on a specific date?
SELECT p.price, p.effective_at, p.expires_at
FROM prices p
WHERE p.product_id = 42
  AND p.effective_at <= '2024-06-01'
  AND (p.expires_at IS NULL OR p.expires_at > '2024-06-01')
ORDER BY p.effective_at DESC
LIMIT 1;

Key Takeaway

Design schemas for your dominant access pattern: OLTP normalises to reduce write conflicts; OLAP denormalises to reduce join overhead. Generated columns eliminate repetitive computed expressions. Schema evolution is safest in small steps with NOT VALID constraint validation. Temporal tables (with effective_at / expires_at) are the relational pattern for answering "what was true at time T?" — essential for pricing, compliance, and audit.

← Previous Next →