Introduction to Sports Analytics with SQL

Introduction to Sports Analytics with SQL

Sports analytics is one of the most compelling applications of SQL — combining time-series data, ranking problems, head-to-head comparisons, and performance modeling into a field that has transformed how professional sports teams operate. From the Moneyball revolution in baseball to the modern football analytics movement driven by clubs like Liverpool and Brentford, data-driven decision making is now central to every major sport.

In this section, you'll work with a realistic sports dataset modelled on professional football (soccer) leagues. Every query you write here has a direct analogue to the work done by data teams at top clubs, sports media companies like Opta and StatsBomb, and fantasy sports platforms.

What You Will Learn

By the end of this section, you will be able to:

  • Build complete league standings tables from raw match data — the same calculation powering every football league website
  • Compute per-90-minute player stats and percentile rankings, the standard in modern football analytics
  • Analyse head-to-head records and identify psychological advantages between rival teams
  • Detect winning and losing streaks using the gaps-and-islands technique
  • Build fantasy sports rating systems with price-to-value analysis
  • Engineer machine learning features from match data entirely in SQL

The Dataset

The sports analytics section uses four tables. Understanding the schema before writing any query saves significant debugging time.

-- TEAMS: Each team in the league
-- id (UUID), name (TEXT), league (TEXT)
SELECT * FROM teams LIMIT 5;

The teams table is small — typically 10-20 rows for a single league. Every other table joins back to it via foreign keys.

-- PLAYERS: Registered players for each team
-- id (UUID), name (TEXT), team_id (UUID → teams.id),
-- position (TEXT: Forward/Midfielder/Defender/Goalkeeper),
-- nationality (TEXT), age (INTEGER)
SELECT * FROM players LIMIT 5;

Player position is important — it determines how you compare players (strikers vs midfielders have different expected stats). Always filter or partition by position when ranking players.

-- MATCHES: One row per match
-- id (UUID), home_team_id (UUID), away_team_id (UUID),
-- match_date (DATE), home_goals (INTEGER), away_goals (INTEGER),
-- season (TEXT, e.g. '2023-24'), competition (TEXT: 'league'/'cup'/'friendly')
SELECT * FROM matches LIMIT 5;

Most queries in this section filter WHERE competition = 'league' to exclude cup and friendly results, which don't count toward league standings.

-- PLAYER_STATS: Per-match player performance
-- player_id (UUID → players.id), match_id (UUID → matches.id),
-- goals (INTEGER), assists (INTEGER), minutes_played (INTEGER),
-- yellow_cards (INTEGER), red_cards (INTEGER),
-- shots (INTEGER), passes (INTEGER)
SELECT * FROM player_stats LIMIT 5;

player_stats is the largest table — one row per player per match. Most player analysis queries aggregate this table, joining to matches for date/season filtering and to players for name/position context.

Purchase this course to unlock the full lesson.

Sign up