Working with JSONB Arrays — UNNEST, Aggregation, and Lateral Joins

Working with JSONB Arrays — UNNEST, Aggregation, and Lateral Joins

Arrays appear everywhere in JSONB data: product sizes, tags, category hierarchies, lists of user permissions, event sequences, recommendation lists. PostgreSQL provides two ways to work with them: expand elements into rows (jsonb_array_elements), or treat the array as a set for membership testing (@> containment). Understanding when to use each is the key skill.

JSONB arrays vs PostgreSQL native arrays

PostgreSQL has two distinct array systems:

  • JSONB arrays (jsonb): flexible, can hold mixed types, nested structures; accessed via ->, jsonb_array_elements
  • PostgreSQL native arrays (integer[], text[], etc.): typed, more efficient for computation; accessed via [n], UNNEST, ANY

In practice, if you're building new tables, use native arrays for typed data (IDs, tags). Use JSONB arrays when the array is part of a larger variable-schema document.


Example 1: Expanding JSONB arrays with jsonb_array_elements

jsonb_array_elements is a set-returning function — it takes one row containing a JSONB array and returns multiple rows, one per element. Use LATERAL to cross-apply it per row of the parent table.

-- Create a table with JSONB arrays for demonstration
CREATE TABLE IF NOT EXISTS articles (
    article_id  SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    tags        JSONB,   -- ["postgresql","analytics","sql"]
    authors     JSONB,   -- ["Alice Chen","Bob Smith"]
    published_at DATE
);

INSERT INTO articles (title, tags, authors, published_at) VALUES
  ('Window Functions Deep Dive', '["postgresql","analytics","windows"]', '["Alice Chen"]', '2024-01-15'),
  ('CTEs vs Subqueries', '["postgresql","performance","sql"]', '["Bob Smith","Carol Liu"]', '2024-02-20'),
  ('JSON in PostgreSQL', '["postgresql","jsonb","nosql"]', '["Alice Chen","Dave Park"]', '2024-03-10'),
  ('Index Optimization', '["postgresql","performance","indexes"]', '["Carol Liu"]', '2024-04-05'),
  ('Analytics Aggregations', '["sql","analytics","grouping-sets"]', '["Bob Smith"]', '2024-05-12');

-- Expand tags: one row per article-tag combination
SELECT
    a.article_id,
    a.title,
    tag.value #>> '{}' AS tag  -- #>> '{}' extracts scalar text from jsonb
FROM articles a,
     LATERAL jsonb_array_elements(a.tags) AS tag(value)
ORDER BY a.article_id, tag;

What This Returns

article_id title tag
1 Window Functions Deep Dive analytics
1 Window Functions Deep Dive postgresql
1 Window Functions Deep Dive windows

The LATERAL keyword is required because jsonb_array_elements references the a table from the outer query. Without LATERAL, PostgreSQL would not know which row to expand.


Example 2: Analytics on JSONB array data

Once expanded into rows, JSONB array data can be aggregated with standard SQL:

-- Tag frequency: which tags appear most across all articles?
SELECT
    tag.value #>> '{}' AS tag,
    COUNT(*) AS article_count,
    STRING_AGG(a.title, ', ' ORDER BY a.published_at) AS articles
FROM articles a,
     LATERAL jsonb_array_elements(a.tags) AS tag(value)
GROUP BY tag
ORDER BY article_count DESC;

-- Co-occurrence: which tags appear together most often?
WITH tag_pairs AS (
    SELECT
        a.article_id,
        t1.value #>> '{}' AS tag1,
        t2.value #>> '{}' AS tag2
    FROM articles a,
         LATERAL jsonb_array_elements(a.tags) AS t1(value),
         LATERAL jsonb_array_elements(a.tags) AS t2(value)
    WHERE t1.value < t2.value  -- avoid (a,b) and (b,a) duplicates and (a,a) self-pairs
)
SELECT tag1, tag2, COUNT(*) AS co_occurrences
FROM tag_pairs
GROUP BY tag1, tag2
ORDER BY co_occurrences DESC, tag1;

The self-join pattern LATERAL ... t1 CROSS JOIN LATERAL ... t2 on the same JSONB array produces all pairs from the array. The WHERE t1.value < t2.value filter ensures canonical ordering (smaller value first) to avoid counting (postgresql, analytics) and (analytics, postgresql) as separate pairs.


Example 3: UNNEST with ORDINALITY for positional access

UNNEST WITH ORDINALITY gives you the position (1-based index) of each element — crucial for sequence analysis:

-- Example: event sequence stored as a JSONB array
CREATE TABLE IF NOT EXISTS user_sessions (
    session_id TEXT PRIMARY KEY,
    customer_id INT,
    event_sequence JSONB  -- ["page_view","product_view","add_to_cart","checkout","purchase"]
);

INSERT INTO user_sessions VALUES
  ('sess_001', 1, '["page_view","product_view","add_to_cart","purchase"]'),
  ('sess_002', 2, '["page_view","page_view","product_view","checkout","purchase"]'),
  ('sess_003', 3, '["page_view","product_view","product_view","add_to_cart"]'),
  ('sess_004', 4, '["page_view","purchase"]')
ON CONFLICT (session_id) DO NOTHING;

-- Expand sequence with position
SELECT
    us.session_id,
    us.customer_id,
    step_num,
    event_type #>> '{}' AS event_type
FROM user_sessions us,
     LATERAL jsonb_array_elements(us.event_sequence)
         WITH ORDINALITY AS t(event_type, step_num)
ORDER BY us.session_id, step_num;
-- Find sessions where 'add_to_cart' happens before 'purchase'
WITH expanded AS (
    SELECT
        session_id,
        event_type #>> '{}' AS event_type,
        step_num
    FROM user_sessions,
         LATERAL jsonb_array_elements(event_sequence) WITH ORDINALITY AS t(event_type, step_num)
),
cart_and_purchase AS (
    SELECT
        session_id,
        MIN(step_num) FILTER (WHERE event_type = 'add_to_cart') AS cart_step,
        MIN(step_num) FILTER (WHERE event_type = 'purchase') AS purchase_step
    FROM expanded
    GROUP BY session_id
)
SELECT
    session_id,
    cart_step,
    purchase_step,
    CASE WHEN cart_step < purchase_step THEN 'Normal Path'
         WHEN cart_step IS NULL THEN 'Direct Purchase (No Cart)'
         WHEN purchase_step IS NULL THEN 'Abandoned'
         ELSE 'Unusual Order'
    END AS path_type
FROM cart_and_purchase;

Example 4: Native PostgreSQL arrays — UNNEST and ANY

For typed arrays (text[], integer[]), PostgreSQL native array operations are faster than JSONB:

-- Using native text arrays for tags (more efficient than JSONB arrays for simple lists)
ALTER TABLE articles ADD COLUMN IF NOT EXISTS tags_native TEXT[];
UPDATE articles SET tags_native = ARRAY(
    SELECT value #>> '{}' FROM jsonb_array_elements(tags)
);

-- Array membership check: articles tagged with 'postgresql'
SELECT title FROM articles
WHERE 'postgresql' = ANY(tags_native);

-- @> contains: articles with BOTH 'postgresql' AND 'analytics'
SELECT title FROM articles
WHERE tags_native @> ARRAY['postgresql', 'analytics'];

-- Overlap &&: articles with ANY of these tags
SELECT title FROM articles
WHERE tags_native && ARRAY['nosql', 'indexes', 'windows'];

-- Unnest native array (simpler than JSONB version)
SELECT a.title, UNNEST(a.tags_native) AS tag
FROM articles a
ORDER BY a.title, tag;

-- Array aggregation back from rows
SELECT STRING_AGG(title, ', ' ORDER BY published_at) AS titles,
       ARRAY_AGG(DISTINCT tag ORDER BY tag) AS all_tags
FROM articles a, UNNEST(a.tags_native) AS tag
GROUP BY tag
HAVING COUNT(*) >= 2;

Key Takeaway

LATERAL jsonb_array_elements() is the primary tool for expanding JSONB arrays into rows — essential for frequency analysis, co-occurrence, and sequence analysis. WITH ORDINALITY adds position numbers. For typed arrays (IDs, simple tags), native PostgreSQL arrays with UNNEST and ANY are simpler and faster. Choose JSONB arrays when the array is part of a larger variable-schema document; choose native arrays when the array is the main column concern.