STRING_AGG and ARRAY_AGG — Aggregating Into Collections

STRING_AGG and ARRAY_AGG — Aggregating Into Collections

Most aggregate functions collapse many rows into a single scalar number. STRING_AGG and ARRAY_AGG collapse many rows into a single collection — a delimited string or an array. This is fundamentally different and opens up a class of problems that would otherwise require multiple queries and application-side string building.

STRING_AGG

STRING_AGG(expression, delimiter ORDER BY ...) concatenates non-NULL values into a single string, separated by the delimiter. It's the SQL equivalent of Python's ', '.join(list). The ORDER BY clause controls the order of elements within the string — without it, the order is undefined.

ARRAY_AGG

ARRAY_AGG(expression ORDER BY ...) collects values into a PostgreSQL array. Arrays can be indexed, sliced, passed to unnest(), and checked with the @> (contains) operator. ARRAY_AGG is more powerful than STRING_AGG for programmatic processing; STRING_AGG is more useful for display.


Example 1: Building a product list per order

The most common STRING_AGG use case: each order has multiple items, but you want a single-row summary per order with items listed as a comma-separated string — perfect for email receipts, order confirmations, or reporting dashboards.

SELECT
    o.order_id,
    DATE_TRUNC('day', o.created_at)::DATE,
    o.total_amount,
    STRING_AGG(p.name, ', ' ORDER BY p.category, p.name) AS items,
    STRING_AGG(p.category || ': ' || p.name || ' (×' || oi.quantity || ')',
               ' | ' ORDER BY p.category) AS detailed_items,
    COUNT(DISTINCT oi.product_id) AS unique_products,
    SUM(oi.quantity) AS total_units
FROM ec_orders o
JOIN ec_order_items oi ON oi.order_id = o.order_id
JOIN ec_products p ON p.product_id = oi.product_id
WHERE o.status = 'completed'
GROUP BY o.order_id, DATE_TRUNC('day', o.created_at)::DATE, o.total_amount
ORDER BY DATE_TRUNC('day', o.created_at)::DATE DESC
LIMIT 10;

What This Returns

order_id order_date total_amount items detailed_items
4821 2025-03-15 342.00 Laptop Stand, USB Hub, Wireless Mouse Electronics: Laptop Stand (×1) | Electronics: USB Hub (×2) | Electronics: Wireless Mouse (×1)

The ORDER BY inside STRING_AGG ensures alphabetical ordering within each order's item list. Without it, the same query run twice could return items in different orders, making output non-deterministic.


Example 2: ARRAY_AGG for invoice status history

ARRAY_AGG is better than STRING_AGG when you need to work with the collected values programmatically — checking membership, slicing, or unnesting:

SELECT
    c.company_name,
    ARRAY_AGG(DISTINCT i.status ORDER BY i.status) AS statuses_seen,
    ARRAY_AGG(i.invoice_no ORDER BY i.issue_date) AS invoice_timeline,
    ARRAY_AGG(DISTINCT p.method ORDER BY p.method) AS methods_used,
    -- Has this client ever been overdue?
    'overdue' = ANY(ARRAY_AGG(DISTINCT i.status)) AS ever_overdue,
    -- Was their most recent invoice paid?
    (ARRAY_AGG(i.status ORDER BY i.issue_date DESC))[1] AS latest_invoice_status
FROM clients c
JOIN invoices i ON i.client_id = c.client_id
LEFT JOIN payments p ON p.invoice_id = i.invoice_id
GROUP BY c.client_id, c.company_name
ORDER BY c.company_name;

What This Returns

company_name statuses_seen invoice_timeline ever_overdue latest_invoice_status
Acme Corp {overdue,paid,sent} {INV-00001,INV-00008,...} true paid

Key techniques:

  • ARRAY_AGG(DISTINCT ...) deduplicates before aggregating
  • 'overdue' = ANY(array) checks array membership — equivalent to IN but for an array
  • (ARRAY_AGG(...ORDER BY ...))[1] extracts the first element — getting the most recent/earliest item

Example 3: Detecting patterns with ARRAY_AGG + unnest

A powerful advanced pattern: aggregate values into an array, then unnest to analyse sequences. This example finds clients whose payment sequence shows a declining pattern (later invoices taking longer to pay):

WITH payment_lags AS (
    SELECT
        c.client_id,
        c.company_name,
        ARRAY_AGG(
            p.payment_date - i.issue_date
            ORDER BY i.issue_date
        ) AS lag_sequence
    FROM clients c
    JOIN invoices i ON i.client_id = c.client_id
    JOIN payments p ON p.invoice_id = i.invoice_id
    GROUP BY c.client_id, c.company_name
    HAVING COUNT(*) >= 4
),
-- Unnest and re-number to analyse the sequence
unnested AS (
    SELECT
        client_id, company_name,
        ordinality AS invoice_num,
        lag_days
    FROM payment_lags,
        LATERAL UNNEST(lag_sequence) WITH ORDINALITY AS t(lag_days, ordinality)
),
-- Linear regression of lag_days vs invoice_num (is DSO trending up?)
trend AS (
    SELECT
        client_id, company_name,
        COUNT(*) AS data_points,
        ROUND(REGR_SLOPE(lag_days, invoice_num)::NUMERIC, 2) AS dso_trend_per_invoice,
        ROUND(CORR(lag_days, invoice_num)::NUMERIC, 3) AS trend_correlation
    FROM unnested
    GROUP BY client_id, company_name
)
SELECT *
FROM trend
WHERE dso_trend_per_invoice > 2  -- DSO increasing by more than 2 days per invoice on average
  AND trend_correlation > 0.6    -- and the trend is reasonably strong
ORDER BY dso_trend_per_invoice DESC;

This combines ARRAY_AGGUNNEST WITH ORDINALITYREGR_SLOPE into a single analysis pipeline entirely in SQL.


Example 4: Building JSON-like structures with STRING_AGG

STRING_AGG can construct structured output for APIs or reporting systems — building comma-separated KV pairs, SQL IN-lists, or even JSON-like text:

-- Build a per-client summary as a formatted text block
SELECT
    c.company_name,
    'Industry: ' || c.industry ||
    ' | Invoices: ' || COUNT(DISTINCT i.invoice_id) ||
    ' | Products: ' || STRING_AGG(DISTINCT p.category, ', ' ORDER BY p.category) ||
    ' | Methods: ' || STRING_AGG(DISTINCT pay.method, '/' ORDER BY pay.method)
    AS client_profile
FROM clients c
JOIN invoices i ON i.client_id = c.client_id
JOIN invoice_items ii ON ii.invoice_id = i.invoice_id
LEFT JOIN payments pay ON pay.invoice_id = i.invoice_id
LEFT JOIN ec_products p ON p.name ILIKE '%' || ii.description || '%'  -- loose match
GROUP BY c.client_id, c.company_name, c.industry
ORDER BY c.company_name;
-- Build an IN list for use in another query (useful for dynamic SQL generation)
SELECT
    'SELECT * FROM clients WHERE industry IN (' ||
    STRING_AGG(DISTINCT industry, ', ' ORDER BY industry) ||
    ');' AS dynamic_query
FROM clients
WHERE country = 'US';

Key Takeaway

STRING_AGG and ARRAY_AGG shift the aggregation paradigm from "collapse to a number" to "collapse to a collection". Use STRING_AGG for display output and ARRAY_AGG when you need to inspect, index, or process the collected values. The combination of ARRAY_AGGUNNEST WITH ORDINALITY is one of the most powerful patterns in PostgreSQL for analysing ordered sequences without PL/pgSQL.