Working with JSONB Arrays — UNNEST, Aggregation, and Lateral Joins
Working with JSONB Arrays — UNNEST, Aggregation, and Lateral Joins
Arrays appear everywhere in JSONB data: product sizes, tags, category hierarchies, lists of user permissions, event sequences, recommendation lists. PostgreSQL provides two ways to work with them: expand elements into rows (jsonb_array_elements), or treat the array as a set for membership testing (@> containment). Understanding when to use each is the key skill.
JSONB arrays vs PostgreSQL native arrays
PostgreSQL has two distinct array systems:
- JSONB arrays (
jsonb): flexible, can hold mixed types, nested structures; accessed via->,jsonb_array_elements - PostgreSQL native arrays (
integer[],text[], etc.): typed, more efficient for computation; accessed via[n],UNNEST,ANY
In practice, if you're building new tables, use native arrays for typed data (IDs, tags). Use JSONB arrays when the array is part of a larger variable-schema document.
Example 1: Expanding JSONB arrays with jsonb_array_elements
jsonb_array_elements is a set-returning function — it takes one row containing a JSONB array and returns multiple rows, one per element. Use LATERAL to cross-apply it per row of the parent table.
-- Create a table with JSONB arrays for demonstration
CREATE TABLE IF NOT EXISTS articles (
article_id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
tags JSONB, -- ["postgresql","analytics","sql"]
authors JSONB, -- ["Alice Chen","Bob Smith"]
published_at DATE
);
INSERT INTO articles (title, tags, authors, published_at) VALUES
('Window Functions Deep Dive', '["postgresql","analytics","windows"]', '["Alice Chen"]', '2024-01-15'),
('CTEs vs Subqueries', '["postgresql","performance","sql"]', '["Bob Smith","Carol Liu"]', '2024-02-20'),
('JSON in PostgreSQL', '["postgresql","jsonb","nosql"]', '["Alice Chen","Dave Park"]', '2024-03-10'),
('Index Optimization', '["postgresql","performance","indexes"]', '["Carol Liu"]', '2024-04-05'),
('Analytics Aggregations', '["sql","analytics","grouping-sets"]', '["Bob Smith"]', '2024-05-12');
-- Expand tags: one row per article-tag combination
SELECT
a.article_id,
a.title,
tag.value #>> '{}' AS tag -- #>> '{}' extracts scalar text from jsonb
FROM articles a,
LATERAL jsonb_array_elements(a.tags) AS tag(value)
ORDER BY a.article_id, tag;
What This Returns
| article_id | title | tag |
|---|---|---|
| 1 | Window Functions Deep Dive | analytics |
| 1 | Window Functions Deep Dive | postgresql |
| 1 | Window Functions Deep Dive | windows |
The LATERAL keyword is required because jsonb_array_elements references the a table from the outer query. Without LATERAL, PostgreSQL would not know which row to expand.
Example 2: Analytics on JSONB array data
Once expanded into rows, JSONB array data can be aggregated with standard SQL:
-- Tag frequency: which tags appear most across all articles?
SELECT
tag.value #>> '{}' AS tag,
COUNT(*) AS article_count,
STRING_AGG(a.title, ', ' ORDER BY a.published_at) AS articles
FROM articles a,
LATERAL jsonb_array_elements(a.tags) AS tag(value)
GROUP BY tag
ORDER BY article_count DESC;
-- Co-occurrence: which tags appear together most often?
WITH tag_pairs AS (
SELECT
a.article_id,
t1.value #>> '{}' AS tag1,
t2.value #>> '{}' AS tag2
FROM articles a,
LATERAL jsonb_array_elements(a.tags) AS t1(value),
LATERAL jsonb_array_elements(a.tags) AS t2(value)
WHERE t1.value < t2.value -- avoid (a,b) and (b,a) duplicates and (a,a) self-pairs
)
SELECT tag1, tag2, COUNT(*) AS co_occurrences
FROM tag_pairs
GROUP BY tag1, tag2
ORDER BY co_occurrences DESC, tag1;
The self-join pattern LATERAL ... t1 CROSS JOIN LATERAL ... t2 on the same JSONB array produces all pairs from the array. The WHERE t1.value < t2.value filter ensures canonical ordering (smaller value first) to avoid counting (postgresql, analytics) and (analytics, postgresql) as separate pairs.
Example 3: UNNEST with ORDINALITY for positional access
UNNEST WITH ORDINALITY gives you the position (1-based index) of each element — crucial for sequence analysis:
-- Example: event sequence stored as a JSONB array
CREATE TABLE IF NOT EXISTS user_sessions (
session_id TEXT PRIMARY KEY,
customer_id INT,
event_sequence JSONB -- ["page_view","product_view","add_to_cart","checkout","purchase"]
);
INSERT INTO user_sessions VALUES
('sess_001', 1, '["page_view","product_view","add_to_cart","purchase"]'),
('sess_002', 2, '["page_view","page_view","product_view","checkout","purchase"]'),
('sess_003', 3, '["page_view","product_view","product_view","add_to_cart"]'),
('sess_004', 4, '["page_view","purchase"]')
ON CONFLICT (session_id) DO NOTHING;
-- Expand sequence with position
SELECT
us.session_id,
us.customer_id,
step_num,
event_type #>> '{}' AS event_type
FROM user_sessions us,
LATERAL jsonb_array_elements(us.event_sequence)
WITH ORDINALITY AS t(event_type, step_num)
ORDER BY us.session_id, step_num;
-- Find sessions where 'add_to_cart' happens before 'purchase'
WITH expanded AS (
SELECT
session_id,
event_type #>> '{}' AS event_type,
step_num
FROM user_sessions,
LATERAL jsonb_array_elements(event_sequence) WITH ORDINALITY AS t(event_type, step_num)
),
cart_and_purchase AS (
SELECT
session_id,
MIN(step_num) FILTER (WHERE event_type = 'add_to_cart') AS cart_step,
MIN(step_num) FILTER (WHERE event_type = 'purchase') AS purchase_step
FROM expanded
GROUP BY session_id
)
SELECT
session_id,
cart_step,
purchase_step,
CASE WHEN cart_step < purchase_step THEN 'Normal Path'
WHEN cart_step IS NULL THEN 'Direct Purchase (No Cart)'
WHEN purchase_step IS NULL THEN 'Abandoned'
ELSE 'Unusual Order'
END AS path_type
FROM cart_and_purchase;
Example 4: Native PostgreSQL arrays — UNNEST and ANY
For typed arrays (text[], integer[]), PostgreSQL native array operations are faster than JSONB:
-- Using native text arrays for tags (more efficient than JSONB arrays for simple lists)
ALTER TABLE articles ADD COLUMN IF NOT EXISTS tags_native TEXT[];
UPDATE articles SET tags_native = ARRAY(
SELECT value #>> '{}' FROM jsonb_array_elements(tags)
);
-- Array membership check: articles tagged with 'postgresql'
SELECT title FROM articles
WHERE 'postgresql' = ANY(tags_native);
-- @> contains: articles with BOTH 'postgresql' AND 'analytics'
SELECT title FROM articles
WHERE tags_native @> ARRAY['postgresql', 'analytics'];
-- Overlap &&: articles with ANY of these tags
SELECT title FROM articles
WHERE tags_native && ARRAY['nosql', 'indexes', 'windows'];
-- Unnest native array (simpler than JSONB version)
SELECT a.title, UNNEST(a.tags_native) AS tag
FROM articles a
ORDER BY a.title, tag;
-- Array aggregation back from rows
SELECT STRING_AGG(title, ', ' ORDER BY published_at) AS titles,
ARRAY_AGG(DISTINCT tag ORDER BY tag) AS all_tags
FROM articles a, UNNEST(a.tags_native) AS tag
GROUP BY tag
HAVING COUNT(*) >= 2;
Key Takeaway
LATERAL jsonb_array_elements() is the primary tool for expanding JSONB arrays into rows — essential for frequency analysis, co-occurrence, and sequence analysis. WITH ORDINALITY adds position numbers. For typed arrays (IDs, simple tags), native PostgreSQL arrays with UNNEST and ANY are simpler and faster. Choose JSONB arrays when the array is part of a larger variable-schema document; choose native arrays when the array is the main column concern.