Querying and Filtering JSONB — Operators, Paths, and Indexes

Querying and Filtering JSONB — Operators, Paths, and Indexes

The power of JSONB comes from being able to query inside documents efficiently. Without indexes, every JSONB query requires a full sequential scan — examining every document on every row. With a GIN index, JSONB queries can use the index to find matching documents in milliseconds even across millions of rows.

The GIN index for JSONB

A GIN (Generalized Inverted Index) on a JSONB column indexes every key and value in every document. It makes the containment operator @> and key-existence operators ?, ?&, ?| index-accelerated.

-- Create a GIN index on the attributes column
CREATE INDEX IF NOT EXISTS idx_product_attributes ON product_catalog USING GIN (attributes);

-- Now these queries use the index:
SELECT * FROM product_catalog WHERE attributes @> '{"brand": "Apple"}';
SELECT * FROM product_catalog WHERE attributes ? 'author';

-- For path-specific queries on frequently accessed keys, use a functional index:
CREATE INDEX IF NOT EXISTS idx_product_brand ON product_catalog
    USING BTREE ((attributes ->> 'brand'));

-- Now this query uses the btree index (faster for equality/range):
SELECT * FROM product_catalog WHERE attributes ->> 'brand' = 'Apple';

The GIN index covers @> and ? operators. For equality and range comparisons on a specific extracted field, a BTREE functional index on (attributes ->> 'key') is faster.


Example 1: Complex JSONB filter conditions

Real queries combine multiple JSONB conditions with regular column filters:

-- Find all electronics with at least 16GB RAM AND under 1.5kg AND under £2000
SELECT
    name,
    base_price,
    attributes ->> 'brand' AS brand,
    (attributes ->> 'ram_gb')::INT AS ram_gb,
    (attributes ->> 'weight_kg')::NUMERIC AS weight_kg
FROM product_catalog
WHERE
    category = 'Electronics'
    AND (attributes ->> 'ram_gb')::INT >= 16
    AND (attributes ->> 'weight_kg')::NUMERIC < 1.5
    AND base_price < 2000
ORDER BY (attributes ->> 'ram_gb')::INT DESC, base_price;
-- Find all products where the brand is any of a set of values
SELECT name, category, attributes ->> 'brand' AS brand
FROM product_catalog
WHERE attributes ->> 'brand' = ANY(ARRAY['Apple', 'Lenovo', 'Dell'])
ORDER BY category, name;
-- Containment check for complex sub-documents
-- Find products whose attributes contain both a specific brand AND a specific color
SELECT name
FROM product_catalog
WHERE attributes @> '{"brand": "Apple", "color": "silver"}';

Example 2: Querying arrays inside JSONB

When a JSONB value is an array, you can check if the array contains a specific element using containment:

-- Find clothing products available in size 'L'
-- The array ["S","M","L","XL"] contains the string "L"
SELECT name, attributes -> 'sizes' AS available_sizes
FROM product_catalog
WHERE attributes @> '{"sizes": ["L"]}';

-- This is the key insight: @> checks deep containment
-- {"sizes": ["S","M","L"]} @> {"sizes": ["L"]} is TRUE

-- Get products available in both 'S' AND 'XL'
SELECT name
FROM product_catalog
WHERE attributes @> '{"sizes": ["S", "XL"]}';
-- Expand JSONB array to rows using jsonb_array_elements
SELECT
    pc.name,
    size.value #>> '{}' AS available_size  -- #>> '{}' extracts the text value of a scalar
FROM product_catalog pc,
     LATERAL jsonb_array_elements(pc.attributes -> 'sizes') AS size(value)
WHERE pc.category = 'Clothing'
ORDER BY pc.name, available_size;

The LATERAL jsonb_array_elements(...) pattern is the standard way to "explode" a JSONB array into rows — one row per array element, cross-joined with the parent row.


Example 3: JSONPath queries — flexible pattern matching

PostgreSQL 12+ supports the SQL/JSON path language, which provides more expressive querying than the -> operators:

-- jsonb_path_query: returns JSONB matching a path expression
-- Find all products where any attribute value contains "Apple"
SELECT name, jsonb_path_query_first(attributes, '$.brand') AS brand
FROM product_catalog
WHERE jsonb_path_exists(attributes, '$.brand ? (@ == "Apple")');

-- Find electronics where ram_gb is between 16 and 32
SELECT name, attributes ->> 'ram_gb' AS ram
FROM product_catalog
WHERE jsonb_path_exists(attributes, '$.ram_gb ? (@ >= 16 && @ <= 32)');

-- Extract all numeric values from attributes
SELECT name,
    jsonb_path_query_array(attributes, '$.*  ? (@.type() == "number")') AS numeric_values
FROM product_catalog;

-- Find products with any size starting with 'X'
SELECT name
FROM product_catalog
WHERE jsonb_path_exists(
    attributes,
    '$.sizes[*] ? (@ starts with "X")'
);

JSONPath syntax: $ = root document, .key = field access, [*] = all array elements, ? (condition) = filter.


Example 4: Performance comparison — indexed vs non-indexed JSONB

Understanding when queries use the GIN index vs fall back to sequential scan:

-- EXPLAIN to see index usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT name FROM product_catalog
WHERE attributes @> '{"brand": "Apple"}';
-- With GIN index: "Bitmap Index Scan on idx_product_attributes"
-- Without index: "Seq Scan on product_catalog"

-- Queries that USE the GIN index:
-- attributes @> '{"key": "value"}'    ← containment
-- attributes ? 'key'                  ← key existence
-- attributes ?& ARRAY['k1','k2']      ← all keys exist
-- attributes ?| ARRAY['k1','k2']      ← any key exists

-- Queries that do NOT use the GIN index (need BTREE functional index):
-- attributes ->> 'key' = 'value'      ← equality on extracted text
-- attributes ->> 'key' > '100'        ← range on extracted text
-- (attributes ->> 'key')::INT > 100   ← cast + comparison

-- Best practice: use @> for GIN-accelerated queries when checking exact values
-- Use functional BTREE indexes for frequently-filtered extracted fields
-- Create targeted functional indexes for hot query patterns
CREATE INDEX IF NOT EXISTS idx_product_brand_btree
    ON product_catalog USING BTREE ((attributes ->> 'brand'));

CREATE INDEX IF NOT EXISTS idx_product_ram
    ON product_catalog USING BTREE (((attributes ->> 'ram_gb')::INT))
    WHERE category = 'Electronics';  -- partial index: only for electronics rows

The partial index WHERE category = 'Electronics' is both smaller and faster — it only indexes the rows where ram_gb is meaningful, reducing both index size and maintenance overhead.

Key Takeaway

GIN indexes make @> and ? operators fast. For equality and range filters on specific keys, add a functional BTREE index on (column ->> 'key'). Use LATERAL jsonb_array_elements() to explode arrays into rows. JSONPath (PostgreSQL 12+) provides regex-like matching inside documents. Always EXPLAIN your JSONB queries — a sequential scan on a large JSONB table is often catastrophic for performance.