Table Design Patterns — Normalisation Trade-offs and Schema Evolution
Table Design Patterns — Normalisation Trade-offs and Schema Evolution
Relational table design involves constant trade-offs: normalisation reduces redundancy but requires more joins; denormalisation improves read performance but complicates writes. In analytics, the right design depends on your access patterns — a schema optimised for OLTP (many small writes) is often terrible for OLAP (large analytical reads).
Example 1: OLAP vs OLTP schema design
-- OLTP-style (highly normalised): minimises redundancy, good for writes
-- Requires many joins for analytics
CREATE TABLE IF NOT EXISTS customers_oltp (
customer_id SERIAL PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
country TEXT -- was: country_id INT REFERENCES countries
-- tier_id removed (customer_tiers not in schema)
);
-- OLAP-style (denormalised): copies frequently needed columns into the fact table
-- Avoids joins, faster for analytics
CREATE TABLE IF NOT EXISTS orders_analytics (
order_id SERIAL PRIMARY KEY,
customer_id INT,
-- Denormalised customer fields (snapshot at order time)
customer_country TEXT NOT NULL,
customer_channel TEXT NOT NULL,
customer_tier TEXT NOT NULL,
-- Denormalised product fields
product_category TEXT,
-- Fact measurements
created_at DATE NOT NULL,
total_amount NUMERIC(12,2) NOT NULL
);
-- The trade-off: if a customer's country changes, historical orders reflect the
-- old country (which is usually CORRECT for analytics — "country at time of purchase")
Example 2: Generated columns for computed analytics fields
Generated columns are computed automatically by PostgreSQL on every INSERT/UPDATE. They eliminate the need to compute derived values in every query:
CREATE TABLE IF NOT EXISTS orders_with_generated (
order_id SERIAL PRIMARY KEY,
customer_id INT NOT NULL,
created_at DATE NOT NULL,
total_amount NUMERIC(12,2) NOT NULL,
discount_amount NUMERIC(12,2) NOT NULL DEFAULT 0,
-- Generated: net amount after discount
net_amount NUMERIC(12,2) GENERATED ALWAYS AS (total_amount - discount_amount) STORED,
-- Generated: year for partitioning/grouping
order_year INT GENERATED ALWAYS AS (EXTRACT(YEAR FROM created_at)::INT) STORED,
-- Generated: quarter label
order_quarter TEXT GENERATED ALWAYS AS (
'Q' || EXTRACT(QUARTER FROM created_at)::TEXT || '-' || EXTRACT(YEAR FROM created_at)::TEXT
) STORED
);
-- Index generated columns directly
CREATE INDEX IF NOT EXISTS idx_order_year ON orders_with_generated (order_year);
CREATE INDEX IF NOT EXISTS idx_order_quarter ON orders_with_generated (order_quarter);
-- Queries can use these without any computation:
SELECT order_quarter, SUM(net_amount) FROM orders_with_generated
GROUP BY order_quarter ORDER BY order_quarter;
-- No DATE_TRUNC or arithmetic needed — the computed value is stored
Example 3: Schema evolution — adding columns safely
-- Adding a nullable column: instant (no table rewrite needed)
ALTER TABLE orders ADD COLUMN IF NOT EXISTS source_platform TEXT DEFAULT 'web';
-- Adding a column with default: in PostgreSQL 11+, also instant for static defaults
-- (PostgreSQL stores the default in the catalog, not in each row)
ALTER TABLE orders ADD COLUMN IF NOT EXISTS api_version INT DEFAULT 2;
-- Backfilling the new column in batches (don't update all rows at once)
UPDATE orders SET source_platform = 'web'
WHERE source_platform IS NULL;
-- Adding NOT NULL constraint after backfill (validate without locking in PG 12+)
-- Step 1: Add constraint as NOT VALID (no scan, no lock)
ALTER TABLE orders ADD CONSTRAINT chk_platform_not_null
CHECK (source_platform IS NOT NULL) NOT VALID;
-- Step 2: Validate in background (shares lock, existing queries not blocked)
ALTER TABLE orders VALIDATE CONSTRAINT chk_platform_not_null;
-- Renaming columns: instant in PostgreSQL 10+
ALTER TABLE orders RENAME COLUMN source_platform TO acquisition_platform;
-- Removing a column: marks as dropped but doesn't rewrite table immediately
ALTER TABLE orders DROP COLUMN acquisition_platform;
-- Physical space is reclaimed by next VACUUM FULL or pg_repack
Example 4: Temporal tables — tracking changes over time
A temporal table keeps a history of every version of every row. The pattern: a current table for live data, a history table for all past states, and a trigger to automatically move rows to history on update/delete:
CREATE TABLE IF NOT EXISTS prices (
price_id SERIAL PRIMARY KEY,
product_id INT NOT NULL,
price NUMERIC(10,2) NOT NULL,
effective_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ -- NULL means currently active
);
-- Current active prices: only rows where expires_at IS NULL
CREATE OR REPLACE VIEW current_prices AS
SELECT * FROM prices WHERE expires_at IS NULL;
-- Trigger to create history records on update
CREATE OR REPLACE FUNCTION prices_temporal_trigger()
RETURNS TRIGGER AS $$
BEGIN
-- Expire the old row
UPDATE prices SET expires_at = NOW()
WHERE price_id = OLD.price_id AND expires_at IS NULL;
-- The new row is inserted by the UPDATE (actually INSERT + DELETE pattern below)
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
-- Point-in-time query: what was the price on a specific date?
SELECT p.price, p.effective_at, p.expires_at
FROM prices p
WHERE p.product_id = 42
AND p.effective_at <= '2024-06-01'
AND (p.expires_at IS NULL OR p.expires_at > '2024-06-01')
ORDER BY p.effective_at DESC
LIMIT 1;
Key Takeaway
Design schemas for your dominant access pattern: OLTP normalises to reduce write conflicts; OLAP denormalises to reduce join overhead. Generated columns eliminate repetitive computed expressions. Schema evolution is safest in small steps with NOT VALID constraint validation. Temporal tables (with effective_at / expires_at) are the relational pattern for answering "what was true at time T?" — essential for pricing, compliance, and audit.