STRING_AGG and ARRAY_AGG — Aggregating Into Collections
STRING_AGG and ARRAY_AGG — Aggregating Into Collections
Most aggregate functions collapse many rows into a single scalar number. STRING_AGG and ARRAY_AGG collapse many rows into a single collection — a delimited string or an array. This is fundamentally different and opens up a class of problems that would otherwise require multiple queries and application-side string building.
STRING_AGG
STRING_AGG(expression, delimiter ORDER BY ...) concatenates non-NULL values into a single string, separated by the delimiter. It's the SQL equivalent of Python's ', '.join(list). The ORDER BY clause controls the order of elements within the string — without it, the order is undefined.
ARRAY_AGG
ARRAY_AGG(expression ORDER BY ...) collects values into a PostgreSQL array. Arrays can be indexed, sliced, passed to unnest(), and checked with the @> (contains) operator. ARRAY_AGG is more powerful than STRING_AGG for programmatic processing; STRING_AGG is more useful for display.
Example 1: Building a product list per order
The most common STRING_AGG use case: each order has multiple items, but you want a single-row summary per order with items listed as a comma-separated string — perfect for email receipts, order confirmations, or reporting dashboards.
SELECT
o.order_id,
DATE_TRUNC('day', o.created_at)::DATE,
o.total_amount,
STRING_AGG(p.name, ', ' ORDER BY p.category, p.name) AS items,
STRING_AGG(p.category || ': ' || p.name || ' (×' || oi.quantity || ')',
' | ' ORDER BY p.category) AS detailed_items,
COUNT(DISTINCT oi.product_id) AS unique_products,
SUM(oi.quantity) AS total_units
FROM ec_orders o
JOIN ec_order_items oi ON oi.order_id = o.order_id
JOIN ec_products p ON p.product_id = oi.product_id
WHERE o.status = 'completed'
GROUP BY o.order_id, DATE_TRUNC('day', o.created_at)::DATE, o.total_amount
ORDER BY DATE_TRUNC('day', o.created_at)::DATE DESC
LIMIT 10;
What This Returns
| order_id | order_date | total_amount | items | detailed_items |
|---|---|---|---|---|
| 4821 | 2025-03-15 | 342.00 | Laptop Stand, USB Hub, Wireless Mouse | Electronics: Laptop Stand (×1) | Electronics: USB Hub (×2) | Electronics: Wireless Mouse (×1) |
The ORDER BY inside STRING_AGG ensures alphabetical ordering within each order's item list. Without it, the same query run twice could return items in different orders, making output non-deterministic.
Example 2: ARRAY_AGG for invoice status history
ARRAY_AGG is better than STRING_AGG when you need to work with the collected values programmatically — checking membership, slicing, or unnesting:
SELECT
c.company_name,
ARRAY_AGG(DISTINCT i.status ORDER BY i.status) AS statuses_seen,
ARRAY_AGG(i.invoice_no ORDER BY i.issue_date) AS invoice_timeline,
ARRAY_AGG(DISTINCT p.method ORDER BY p.method) AS methods_used,
-- Has this client ever been overdue?
'overdue' = ANY(ARRAY_AGG(DISTINCT i.status)) AS ever_overdue,
-- Was their most recent invoice paid?
(ARRAY_AGG(i.status ORDER BY i.issue_date DESC))[1] AS latest_invoice_status
FROM clients c
JOIN invoices i ON i.client_id = c.client_id
LEFT JOIN payments p ON p.invoice_id = i.invoice_id
GROUP BY c.client_id, c.company_name
ORDER BY c.company_name;
What This Returns
| company_name | statuses_seen | invoice_timeline | ever_overdue | latest_invoice_status |
|---|---|---|---|---|
| Acme Corp | {overdue,paid,sent} | {INV-00001,INV-00008,...} | true | paid |
Key techniques:
ARRAY_AGG(DISTINCT ...)deduplicates before aggregating'overdue' = ANY(array)checks array membership — equivalent toINbut for an array(ARRAY_AGG(...ORDER BY ...))[1]extracts the first element — getting the most recent/earliest item
Example 3: Detecting patterns with ARRAY_AGG + unnest
A powerful advanced pattern: aggregate values into an array, then unnest to analyse sequences. This example finds clients whose payment sequence shows a declining pattern (later invoices taking longer to pay):
WITH payment_lags AS (
SELECT
c.client_id,
c.company_name,
ARRAY_AGG(
p.payment_date - i.issue_date
ORDER BY i.issue_date
) AS lag_sequence
FROM clients c
JOIN invoices i ON i.client_id = c.client_id
JOIN payments p ON p.invoice_id = i.invoice_id
GROUP BY c.client_id, c.company_name
HAVING COUNT(*) >= 4
),
-- Unnest and re-number to analyse the sequence
unnested AS (
SELECT
client_id, company_name,
ordinality AS invoice_num,
lag_days
FROM payment_lags,
LATERAL UNNEST(lag_sequence) WITH ORDINALITY AS t(lag_days, ordinality)
),
-- Linear regression of lag_days vs invoice_num (is DSO trending up?)
trend AS (
SELECT
client_id, company_name,
COUNT(*) AS data_points,
ROUND(REGR_SLOPE(lag_days, invoice_num)::NUMERIC, 2) AS dso_trend_per_invoice,
ROUND(CORR(lag_days, invoice_num)::NUMERIC, 3) AS trend_correlation
FROM unnested
GROUP BY client_id, company_name
)
SELECT *
FROM trend
WHERE dso_trend_per_invoice > 2 -- DSO increasing by more than 2 days per invoice on average
AND trend_correlation > 0.6 -- and the trend is reasonably strong
ORDER BY dso_trend_per_invoice DESC;
This combines ARRAY_AGG → UNNEST WITH ORDINALITY → REGR_SLOPE into a single analysis pipeline entirely in SQL.
Example 4: Building JSON-like structures with STRING_AGG
STRING_AGG can construct structured output for APIs or reporting systems — building comma-separated KV pairs, SQL IN-lists, or even JSON-like text:
-- Build a per-client summary as a formatted text block
SELECT
c.company_name,
'Industry: ' || c.industry ||
' | Invoices: ' || COUNT(DISTINCT i.invoice_id) ||
' | Products: ' || STRING_AGG(DISTINCT p.category, ', ' ORDER BY p.category) ||
' | Methods: ' || STRING_AGG(DISTINCT pay.method, '/' ORDER BY pay.method)
AS client_profile
FROM clients c
JOIN invoices i ON i.client_id = c.client_id
JOIN invoice_items ii ON ii.invoice_id = i.invoice_id
LEFT JOIN payments pay ON pay.invoice_id = i.invoice_id
LEFT JOIN ec_products p ON p.name ILIKE '%' || ii.description || '%' -- loose match
GROUP BY c.client_id, c.company_name, c.industry
ORDER BY c.company_name;
-- Build an IN list for use in another query (useful for dynamic SQL generation)
SELECT
'SELECT * FROM clients WHERE industry IN (' ||
STRING_AGG(DISTINCT industry, ', ' ORDER BY industry) ||
');' AS dynamic_query
FROM clients
WHERE country = 'US';
Key Takeaway
STRING_AGG and ARRAY_AGG shift the aggregation paradigm from "collapse to a number" to "collapse to a collection". Use STRING_AGG for display output and ARRAY_AGG when you need to inspect, index, or process the collected values. The combination of ARRAY_AGG → UNNEST WITH ORDINALITY is one of the most powerful patterns in PostgreSQL for analysing ordered sequences without PL/pgSQL.