Advanced Grouping with SQL's GROUP BY
When looking at data, a common question we have is "For all of X, what is the Y?".
SQL has a great feature to easily query this information with the GROUP BY clause.
In fact there are many ways we can inspect the data with this.
What is a Group?
First let's define what a group in the GROUP BY clause is.
These are relevant only when using aggregate functions. Basically these are
functions that give a scalar value for a set of rows. Without the GROUP BY
you get the value for the entire table. You might have done this before:
SELECT COUNT(*)
FROM my_table
This gets the count across the entire table. When you use GROUP BY like so:
SELECT department, COUNT(*)
FROM my_table
GROUP BY department
The query calculates the COUNT for each department.
Some other common aggregate functions are:
- SUM(): the total sum of a numeric column.
- AVG(): the average value of a numeric column.
- MIN(): the minimum value in a column.
- MAX(): the maximum value in a column.
With that in mind let's look at the different ways we can use GROUP BY.
For the rest of this post, we'll work with a couple of tables for a basketball team.
players
| player_id |
player_name |
position |
| 1 |
Player A |
G |
| 2 |
Player B |
F |
| 3 |
Player C |
C |
game_stats
| game_id |
player_id |
opponent |
points |
assists |
game_date |
| 101 |
1 |
Team X |
30 |
9 |
2024-10-25 |
| 101 |
2 |
Team X |
22 |
12 |
2024-10-25 |
| 102 |
1 |
Team Y |
25 |
7 |
2024-10-27 |
Using Multiple Aggregates: AVG() and SUM()
What if we want more than one summary statistic for a group? We can calculate
several at once. Let's find the average and total points scored for each position.
SELECT p.position,
AVG(gs.points) AS avg_points,
SUM(gs.points) AS total_points
FROM game_stats AS gs
JOIN players AS p
ON gs.player_id = p.player_id
GROUP BY p.position
Here, we group all game stats by the player's position. With this we can
understand if the guards are scoring more points than the centers, for example.
Finding a Range with MIN() and MAX()
We can also use GROUP BY to find the range of values within our groups. Let's find each player's best and worst scoring game of the season.
SELECT p.player_id,
p.player_name,
MIN(gs.points) AS season_low,
MAX(gs.points) AS season_high
FROM game_stats AS gs
JOIN players AS p
ON gs.player_id = p.player_id
GROUP BY p.player_id
By grouping by the player's name, we can now see their individual scoring range over the season.
Filtering Groups with the HAVING Clause
The WHERE clause is for filtering rows before grouping.
But what if we want to filter the groups themselves? For that, we use HAVING.
Let's find players who average over 20 points per game.
SELECT p.player_name,
AVG(gs.points) AS avg_points
FROM game_stats AS gs
JOIN players AS p
ON gs.player_id = p.player_id
GROUP BY p.player_name
HAVING AVG(gs.points) > 20
With HAVING, we can apply a condition after the aggregation, so only the groups that meet our criteria (average points > 20) are returned.
Counting Unique Values with COUNT(DISTINCT)
Sometimes we need to count unique items, not just rows. How many different teams has each player scored against? Using COUNT(*) would count every game, but with COUNT(DISTINCT ...) we can count only the unique opponents.
SELECT p.player_name,
COUNT(DISTINCT gs.opponent) AS unique_opponents
FROM game_stats AS gs
JOIN players AS p
ON gs.player_id = p.player_id
GROUP BY p.player_name
This is a common pattern we'll need for analytics where counting unique entities is more important than counting total events.
Grouping by Multiple Columns
We aren't limited to a single column in our GROUP BY clause. We can group by multiple columns to create more detailed summaries. Let's break down the average points scored first by position, and then by player.
SELECT p.position,
p.player_name,
AVG(gs.points) AS avg_points
FROM game_stats AS gs
JOIN players AS p
ON gs.player_id = p.player_id
GROUP BY p.position,
p.player_name
ORDER BY p.position,
avg_points DESC
This creates subgroups, which lets us see how individual players stack up within their positional group.
References