Advanced JSONB Patterns — Schema Validation, Transformation, and Migration
Advanced JSONB Patterns — Schema Validation, Transformation, and Migration
JSONB's flexibility is its strength and its risk. Without discipline, JSONB columns become a dumping ground for inconsistently structured data that becomes increasingly difficult to query over time. This lesson covers patterns for maintaining JSONB quality: schema validation with CHECK constraints, transforming JSONB to relational form, and migrating JSONB columns to structured schemas.
Example 1: JSONB CHECK constraints for schema validation
PostgreSQL CHECK constraints run on INSERT and UPDATE — you can use them to enforce a minimum structure on JSONB columns:
ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attrs_valid; ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attributes_required;
-- Enforce that 'attributes' always has required keys and correct value types
ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attributes_required;
ALTER TABLE product_catalog ADD CONSTRAINT chk_attributes_required
CHECK (
-- Must have a 'brand' key with a string value
jsonb_typeof(attributes -> 'brand') = 'string'
-- Electronics must have 'ram_gb' as a number
AND (category != 'Electronics' OR jsonb_typeof(attributes -> 'ram_gb') = 'number')
-- No null values allowed at top level (use key omission instead)
AND NOT (attributes @> '{"brand": null}'::JSONB)
) NOT VALID;
-- A more comprehensive constraint using a function
CREATE OR REPLACE FUNCTION validate_product_attributes(
cat TEXT, attrs JSONB
) RETURNS BOOLEAN AS $$
BEGIN
-- All products must have a brand
IF attrs ->> 'brand' IS NULL THEN RETURN FALSE; END IF;
-- Category-specific requirements
IF cat = 'Electronics' THEN
IF (attrs ->> 'ram_gb') IS NULL AND (attrs ->> 'storage_gb') IS NULL THEN
RETURN FALSE;
END IF;
END IF;
IF cat = 'Books' THEN
IF attrs ->> 'author' IS NULL THEN RETURN FALSE; END IF;
END IF;
RETURN TRUE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attrs_valid;
ALTER TABLE product_catalog ADD CONSTRAINT chk_attrs_valid
CHECK (validate_product_attributes(category, attributes)) NOT VALID;
Example 2: Transforming JSONB to relational columns
A common migration pattern: a JSONB column was used during prototyping, and now the data needs to be promoted to proper columns. The key is that JSONB-to-relational migration can be done entirely in SQL:
-- Drop the validation constraint added in the previous block — the
-- migration we are about to perform restructures the data and would
-- otherwise trip the existing CHECK while in flight.
ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attrs_valid;
ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attributes_required;
-- Step 1: Audit what data exists in the JSONB column
SELECT
key,
COUNT(*) AS rows_with_key,
(SELECT jsonb_typeof(attributes -> key)
FROM product_catalog
WHERE attributes ? key
LIMIT 1) AS sample_type
FROM product_catalog,
LATERAL jsonb_object_keys(attributes) AS key
GROUP BY key
ORDER BY rows_with_key DESC;
-- Step 2: Add the new columns
ALTER TABLE product_catalog
ADD COLUMN brand TEXT,
ADD COLUMN weight_kg NUMERIC(5,2),
ADD COLUMN color TEXT;
-- Step 3: Populate from JSONB
UPDATE product_catalog SET
brand = attributes ->> 'brand',
weight_kg = (attributes ->> 'weight_kg')::NUMERIC,
color = attributes ->> 'color';
-- Step 4: Verify migration (should be 0 discrepancies)
SELECT COUNT(*) AS discrepancies
FROM product_catalog
WHERE brand IS DISTINCT FROM (attributes ->> 'brand')
OR weight_kg IS DISTINCT FROM (attributes ->> 'weight_kg')::NUMERIC;
-- Step 5: Remove migrated keys from JSONB (clean up)
UPDATE product_catalog
SET attributes = attributes - ARRAY['brand', 'weight_kg', 'color'];
-- Step 6: Add NOT NULL once every row has a brand value (omitted here:
-- the demo dataset has rows without a brand attribute, so we skip the
-- ALTER COLUMN brand SET NOT NULL step. Run it after backfilling missing rows.)
Example 3: JSONB normalisation — splitting a heterogeneous JSONB column
When a JSONB column stores fundamentally different structures depending on a type discriminator, consider splitting it into a proper polymorphic schema:
-- ❌
-- Original: one events table with heterogeneous payload
-- After analysis, split into type-specific tables
-- Extract purchase events to their own table
CREATE TABLE purchase_events AS
SELECT
event_id,
customer_id,
session_id,
occurred_at,
(payload ->> 'order_id')::INT AS order_id,
(payload ->> 'amount')::NUMERIC AS amount,
payload ->> 'method' AS method,
payload ->> 'promo_code' AS promo_code
FROM events_log
WHERE event_type = 'purchase';
-- Extract search events
CREATE TABLE search_events AS
SELECT
event_id,
customer_id,
session_id,
occurred_at,
payload ->> 'query' AS search_query,
(payload ->> 'results_count')::INT AS results_count,
payload -> 'filters' AS filters_jsonb -- still JSONB - filters vary
FROM events
WHERE event_type = 'search';
-- Create a view that reassembles the original interface
CREATE VIEW all_events AS
SELECT event_id, customer_id, session_id, 'purchase' AS event_type, occurred_at
FROM purchase_events
UNION ALL
SELECT event_id, customer_id, session_id, 'search', occurred_at
FROM search_events;
Example 4: JSONB for configuration and feature flags
A production pattern: storing per-customer configuration and feature flags in JSONB, with a merge hierarchy (default → plan-level → customer-level overrides):
CREATE TABLE feature_config (
config_id SERIAL PRIMARY KEY,
scope TEXT NOT NULL, -- 'default', 'plan:basic', 'plan:pro', 'customer:123'
config JSONB NOT NULL
);
INSERT INTO feature_config (scope, config) VALUES
('default', '{"max_exports":10,"api_calls_per_day":1000,"analytics":false,"priority_support":false}'),
('plan:basic', '{"max_exports":50,"api_calls_per_day":5000}'),
('plan:pro', '{"max_exports":500,"api_calls_per_day":50000,"analytics":true,"priority_support":true}'),
('customer:1', '{"max_exports":1000}'); -- customer 1 has a custom export limit
-- Resolve effective config for customer 1 on plan 'pro' (merge hierarchy)
WITH configs AS (
SELECT config, CASE scope
WHEN 'default' THEN 1
WHEN 'plan:basic' THEN 2
WHEN 'plan:pro' THEN 2
ELSE 3 -- customer-specific overrides
END AS priority
FROM feature_config
WHERE scope IN ('default', 'plan:pro', 'customer:1')
),
merged AS (
SELECT config FROM configs ORDER BY priority
)
SELECT
-- Fold all configs together: each level overrides the previous
(SELECT jsonb_object_agg(key, value)
FROM (
SELECT key, value
FROM configs c, jsonb_each(c.config)
ORDER BY c.priority DESC -- higher priority wins (last write wins in jsonb_object_agg)
) final
) AS effective_config;
Key Takeaway
JSONB quality requires active management: CHECK constraints enforce structure, jsonb_object_keys audits what's actually stored, and the JSONB-to-relational migration pattern (add columns → populate from JSONB → verify → remove JSONB keys) is the clean path for promoting prototype data to a real schema. The configuration merge hierarchy pattern (default → plan → customer using ||) is a very common real-world application of JSONB's flexibility.