Match Prediction Features with SQL
Match Prediction Features with SQL
Before training a machine learning model to predict match outcomes, you need a feature table — a structured dataset where each row represents an upcoming fixture and the columns contain predictive signals. This is called feature engineering, and SQL is the right tool for it when working with relational sports data.
This lesson builds a complete pre-match feature pipeline entirely in SQL: rolling form, home/away strength, head-to-head history, and assembling everything into one flat feature table.
What Makes a Good Prediction Feature?
Good ML features for match prediction are:
- Recent: Last 5 matches matter more than the full season for predicting next week
- Opponent-adjusted: Beating relegation candidates is worth less than beating title contenders
- Multi-dimensional: Goals scored, goals conceded, and win rate each capture different aspects of team strength
- Stable: Rate stats (goals per game) are more stable across different fixture schedules than totals
The queries in this lesson produce exactly these kinds of features.
Step 1 — Team Rolling Form (Last 5 Matches)
A team's recent form is often more predictive than their season average, especially for teams that have recently changed manager, formation, or had injuries. We compute a rolling average of the last 5 matches for each team using a window frame.
The ROWS BETWEEN 4 PRECEDING AND CURRENT ROW frame means "include the current match and 4 matches before it, ordered by date." Since different teams play at different rates, we partition by team_id to ensure each team's window is independent.
WITH team_results AS (
SELECT home_team_id AS team_id, match_date,
CASE WHEN home_goals > away_goals THEN 1 ELSE 0 END AS win,
home_goals AS scored, away_goals AS conceded
FROM matches
UNION ALL
SELECT away_team_id, match_date,
CASE WHEN away_goals > home_goals THEN 1 ELSE 0 END AS win,
away_goals AS scored, home_goals AS conceded
FROM matches
),
rolling AS (
SELECT
team_id,
match_date,
AVG(win) OVER (
PARTITION BY team_id
ORDER BY match_date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) AS form_win_rate,
AVG(scored) OVER (
PARTITION BY team_id
ORDER BY match_date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) AS form_scored_avg,
AVG(conceded) OVER (
PARTITION BY team_id
ORDER BY match_date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) AS form_conceded_avg
FROM team_results
)
SELECT * FROM rolling ORDER BY team_id, match_date DESC;
What This Returns
For each team on each match date, three rolling averages:
form_win_rate: Win rate over last 5 games (0.0 to 1.0). 1.0 = won all 5, 0.0 = lost all 5.form_scored_avg: Average goals scored per game over last 5. Represents attacking form.form_conceded_avg: Average goals conceded per game over last 5. Represents defensive form.
These rolling metrics update after every match — so a team that just had a bad run will show it immediately in their form stats.