Sessionization — Grouping Events into Sessions

Sessionization is the process of grouping a stream of timestamped user events into discrete "sessions" — periods of activity separated by gaps of inactivity. It's fundamental to web analytics, product analytics, and any system where user behaviour is tracked as events.

The standard definition: a session ends when a user is inactive for more than N minutes (commonly 30). Every event that falls within N minutes of the previous event belongs to the same session.


The data

The ShopMetrics events table tracks customer behaviour:

SELECT * FROM events ORDER BY customer_id, occurred_at LIMIT 10;
 event_id | customer_id | event_type   | page       | occurred_at
----------+-------------+--------------+------------+----------------------------
        1 |           1 | view         | /products  | 2025-10-01 09:00:00+00
        2 |           1 | view         | /product/3 | 2025-10-01 09:01:30+00
        3 |           1 | add_to_cart  | /product/3 | 2025-10-01 09:02:15+00
        4 |           1 | view         | /products  | 2025-10-01 09:45:00+00  ← gap > 30min
        5 |           1 | purchase     | /checkout  | 2025-10-01 09:47:22+00
       ...

Events 1–3 belong to session 1. A gap of 43 minutes before event 4 starts session 2.


Step 1: Identify session boundaries with LAG

For each event, check if the time since the previous event (by the same customer) exceeds the session timeout:

WITH event_gaps AS (
    SELECT
        event_id,
        customer_id,
        event_type,
        occurred_at,
        LAG(occurred_at) OVER (
            PARTITION BY customer_id
            ORDER BY occurred_at
        )                                   AS prev_event_time,
        occurred_at - LAG(occurred_at) OVER (
            PARTITION BY customer_id
            ORDER BY occurred_at
        )                                   AS gap_to_prev
    FROM events
)
SELECT
    event_id,
    customer_id,
    event_type,
    occurred_at,
    gap_to_prev,
    CASE
        WHEN gap_to_prev IS NULL               THEN 1  -- first event ever = new session
        WHEN gap_to_prev > INTERVAL '30 minutes' THEN 1  -- gap too long = new session
        ELSE 0
    END AS is_session_start
FROM event_gaps
ORDER BY customer_id, occurred_at;

Step 2: Assign a session ID using a running sum of boundaries

Each session start gets a 1. A running sum of those 1s gives a monotonically increasing session counter — each new session gets the next number:

WITH event_gaps AS (
    SELECT
        event_id,
        customer_id,
        event_type,
        page,
        occurred_at,
        CASE
            WHEN LAG(occurred_at) OVER (PARTITION BY customer_id ORDER BY occurred_at) IS NULL
              OR occurred_at - LAG(occurred_at) OVER (PARTITION BY customer_id ORDER BY occurred_at)
                 > INTERVAL '30 minutes'
            THEN 1 ELSE 0
        END AS is_session_start
    FROM events
),
sessions AS (
    SELECT
        event_id,
        customer_id,
        event_type,
        page,
        occurred_at,
        SUM(is_session_start) OVER (
            PARTITION BY customer_id
            ORDER BY occurred_at
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS session_num
    FROM event_gaps
)
SELECT * FROM sessions ORDER BY customer_id, occurred_at;

SUM(is_session_start) OVER (...) accumulates the 1s — at each session boundary, the sum increments, giving all subsequent events (until the next boundary) the same session number.

Purchase this course to unlock the full lesson.

Sign up