Advanced JSONB Patterns — Schema Validation, Transformation, and Migration

Advanced JSONB Patterns — Schema Validation, Transformation, and Migration

JSONB's flexibility is its strength and its risk. Without discipline, JSONB columns become a dumping ground for inconsistently structured data that becomes increasingly difficult to query over time. This lesson covers patterns for maintaining JSONB quality: schema validation with CHECK constraints, transforming JSONB to relational form, and migrating JSONB columns to structured schemas.


Example 1: JSONB CHECK constraints for schema validation

PostgreSQL CHECK constraints run on INSERT and UPDATE — you can use them to enforce a minimum structure on JSONB columns:

ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attrs_valid; ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attributes_required;

-- Enforce that 'attributes' always has required keys and correct value types
ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attributes_required;
ALTER TABLE product_catalog ADD CONSTRAINT chk_attributes_required
    CHECK (
        -- Must have a 'brand' key with a string value
        jsonb_typeof(attributes -> 'brand') = 'string'
        -- Electronics must have 'ram_gb' as a number
        AND (category != 'Electronics' OR jsonb_typeof(attributes -> 'ram_gb') = 'number')
        -- No null values allowed at top level (use key omission instead)
        AND NOT (attributes @> '{"brand": null}'::JSONB)
    ) NOT VALID;

-- A more comprehensive constraint using a function
CREATE OR REPLACE FUNCTION validate_product_attributes(
    cat TEXT, attrs JSONB
) RETURNS BOOLEAN AS $$
BEGIN
    -- All products must have a brand
    IF attrs ->> 'brand' IS NULL THEN RETURN FALSE; END IF;

    -- Category-specific requirements
    IF cat = 'Electronics' THEN
        IF (attrs ->> 'ram_gb') IS NULL AND (attrs ->> 'storage_gb') IS NULL THEN
            RETURN FALSE;
        END IF;
    END IF;

    IF cat = 'Books' THEN
        IF attrs ->> 'author' IS NULL THEN RETURN FALSE; END IF;
    END IF;

    RETURN TRUE;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attrs_valid;
ALTER TABLE product_catalog ADD CONSTRAINT chk_attrs_valid
    CHECK (validate_product_attributes(category, attributes)) NOT VALID;

Example 2: Transforming JSONB to relational columns

A common migration pattern: a JSONB column was used during prototyping, and now the data needs to be promoted to proper columns. The key is that JSONB-to-relational migration can be done entirely in SQL:

-- Drop the validation constraint added in the previous block — the
-- migration we are about to perform restructures the data and would
-- otherwise trip the existing CHECK while in flight.
ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attrs_valid;
ALTER TABLE product_catalog DROP CONSTRAINT IF EXISTS chk_attributes_required;

-- Step 1: Audit what data exists in the JSONB column
SELECT
    key,
    COUNT(*) AS rows_with_key,
    (SELECT jsonb_typeof(attributes -> key)
     FROM product_catalog
     WHERE attributes ? key
     LIMIT 1) AS sample_type
FROM product_catalog,
     LATERAL jsonb_object_keys(attributes) AS key
GROUP BY key
ORDER BY rows_with_key DESC;

-- Step 2: Add the new columns
ALTER TABLE product_catalog
    ADD COLUMN brand TEXT,
    ADD COLUMN weight_kg NUMERIC(5,2),
    ADD COLUMN color TEXT;

-- Step 3: Populate from JSONB
UPDATE product_catalog SET
    brand     = attributes ->> 'brand',
    weight_kg = (attributes ->> 'weight_kg')::NUMERIC,
    color     = attributes ->> 'color';

-- Step 4: Verify migration (should be 0 discrepancies)
SELECT COUNT(*) AS discrepancies
FROM product_catalog
WHERE brand IS DISTINCT FROM (attributes ->> 'brand')
   OR weight_kg IS DISTINCT FROM (attributes ->> 'weight_kg')::NUMERIC;

-- Step 5: Remove migrated keys from JSONB (clean up)
UPDATE product_catalog
SET attributes = attributes - ARRAY['brand', 'weight_kg', 'color'];

-- Step 6: Add NOT NULL once every row has a brand value (omitted here:
-- the demo dataset has rows without a brand attribute, so we skip the
-- ALTER COLUMN brand SET NOT NULL step. Run it after backfilling missing rows.)

Example 3: JSONB normalisation — splitting a heterogeneous JSONB column

When a JSONB column stores fundamentally different structures depending on a type discriminator, consider splitting it into a proper polymorphic schema:

-- ❌
-- Original: one events table with heterogeneous payload
-- After analysis, split into type-specific tables

-- Extract purchase events to their own table
CREATE TABLE purchase_events AS
SELECT
    event_id,
    customer_id,
    session_id,
    occurred_at,
    (payload ->> 'order_id')::INT AS order_id,
    (payload ->> 'amount')::NUMERIC AS amount,
    payload ->> 'method' AS method,
    payload ->> 'promo_code' AS promo_code
FROM events_log
WHERE event_type = 'purchase';

-- Extract search events
CREATE TABLE search_events AS
SELECT
    event_id,
    customer_id,
    session_id,
    occurred_at,
    payload ->> 'query' AS search_query,
    (payload ->> 'results_count')::INT AS results_count,
    payload -> 'filters' AS filters_jsonb  -- still JSONB - filters vary
FROM events
WHERE event_type = 'search';

-- Create a view that reassembles the original interface
CREATE VIEW all_events AS
    SELECT event_id, customer_id, session_id, 'purchase' AS event_type, occurred_at
    FROM purchase_events
    UNION ALL
    SELECT event_id, customer_id, session_id, 'search', occurred_at
    FROM search_events;

Example 4: JSONB for configuration and feature flags

A production pattern: storing per-customer configuration and feature flags in JSONB, with a merge hierarchy (default → plan-level → customer-level overrides):

CREATE TABLE feature_config (
    config_id   SERIAL PRIMARY KEY,
    scope       TEXT NOT NULL,  -- 'default', 'plan:basic', 'plan:pro', 'customer:123'
    config      JSONB NOT NULL
);

INSERT INTO feature_config (scope, config) VALUES
  ('default',        '{"max_exports":10,"api_calls_per_day":1000,"analytics":false,"priority_support":false}'),
  ('plan:basic',     '{"max_exports":50,"api_calls_per_day":5000}'),
  ('plan:pro',       '{"max_exports":500,"api_calls_per_day":50000,"analytics":true,"priority_support":true}'),
  ('customer:1',     '{"max_exports":1000}');  -- customer 1 has a custom export limit

-- Resolve effective config for customer 1 on plan 'pro' (merge hierarchy)
WITH configs AS (
    SELECT config, CASE scope
        WHEN 'default' THEN 1
        WHEN 'plan:basic' THEN 2
        WHEN 'plan:pro' THEN 2
        ELSE 3  -- customer-specific overrides
    END AS priority
    FROM feature_config
    WHERE scope IN ('default', 'plan:pro', 'customer:1')
),
merged AS (
    SELECT config FROM configs ORDER BY priority
)
SELECT
    -- Fold all configs together: each level overrides the previous
    (SELECT jsonb_object_agg(key, value)
     FROM (
         SELECT key, value
         FROM configs c, jsonb_each(c.config)
         ORDER BY c.priority DESC  -- higher priority wins (last write wins in jsonb_object_agg)
     ) final
    ) AS effective_config;

Key Takeaway

JSONB quality requires active management: CHECK constraints enforce structure, jsonb_object_keys audits what's actually stored, and the JSONB-to-relational migration pattern (add columns → populate from JSONB → verify → remove JSONB keys) is the clean path for promoting prototype data to a real schema. The configuration merge hierarchy pattern (default → plan → customer using ||) is a very common real-world application of JSONB's flexibility.