Sessionization — Grouping Events into Sessions
Sessionization is the process of grouping a stream of timestamped user events into discrete "sessions" — periods of activity separated by gaps of inactivity. It's fundamental to web analytics, product analytics, and any system where user behaviour is tracked as events.
The standard definition: a session ends when a user is inactive for more than N minutes (commonly 30). Every event that falls within N minutes of the previous event belongs to the same session.
The data
The ShopMetrics events table tracks customer behaviour:
SELECT * FROM events ORDER BY customer_id, occurred_at LIMIT 10;
event_id | customer_id | event_type | page | occurred_at
----------+-------------+--------------+------------+----------------------------
1 | 1 | view | /products | 2025-10-01 09:00:00+00
2 | 1 | view | /product/3 | 2025-10-01 09:01:30+00
3 | 1 | add_to_cart | /product/3 | 2025-10-01 09:02:15+00
4 | 1 | view | /products | 2025-10-01 09:45:00+00 ← gap > 30min
5 | 1 | purchase | /checkout | 2025-10-01 09:47:22+00
...
Events 1–3 belong to session 1. A gap of 43 minutes before event 4 starts session 2.
Step 1: Identify session boundaries with LAG
For each event, check if the time since the previous event (by the same customer) exceeds the session timeout:
WITH event_gaps AS (
SELECT
event_id,
customer_id,
event_type,
occurred_at,
LAG(occurred_at) OVER (
PARTITION BY customer_id
ORDER BY occurred_at
) AS prev_event_time,
occurred_at - LAG(occurred_at) OVER (
PARTITION BY customer_id
ORDER BY occurred_at
) AS gap_to_prev
FROM events
)
SELECT
event_id,
customer_id,
event_type,
occurred_at,
gap_to_prev,
CASE
WHEN gap_to_prev IS NULL THEN 1 -- first event ever = new session
WHEN gap_to_prev > INTERVAL '30 minutes' THEN 1 -- gap too long = new session
ELSE 0
END AS is_session_start
FROM event_gaps
ORDER BY customer_id, occurred_at;
Step 2: Assign a session ID using a running sum of boundaries
Each session start gets a 1. A running sum of those 1s gives a monotonically increasing session counter — each new session gets the next number:
WITH event_gaps AS (
SELECT
event_id,
customer_id,
event_type,
page,
occurred_at,
CASE
WHEN LAG(occurred_at) OVER (PARTITION BY customer_id ORDER BY occurred_at) IS NULL
OR occurred_at - LAG(occurred_at) OVER (PARTITION BY customer_id ORDER BY occurred_at)
> INTERVAL '30 minutes'
THEN 1 ELSE 0
END AS is_session_start
FROM events
),
sessions AS (
SELECT
event_id,
customer_id,
event_type,
page,
occurred_at,
SUM(is_session_start) OVER (
PARTITION BY customer_id
ORDER BY occurred_at
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS session_num
FROM event_gaps
)
SELECT * FROM sessions ORDER BY customer_id, occurred_at;
SUM(is_session_start) OVER (...) accumulates the 1s — at each session boundary, the sum increments, giving all subsequent events (until the next boundary) the same session number.