SQL Window Functions
SQL window functions allow you to perform calculations across a set of rows that
are related to the current row. Unlike aggregate functions (like SUM(),
AVG(), COUNT()) which group rows and return a single value for each group,
window functions return a value for each row, based on a "window" of rows.
This allows for more complex analytical queries without the need for self-joins
or subqueries, often leading to more efficient and readable SQL.
What is a Window?
A "window" is a set of rows defined by the OVER() clause. This clause can include:
PARTITION BY: Divides the rows into groups (partitions) to which the window function is applied independently. Similar to GROUP BY, but rows retain their individual identities.
ORDER BY: Orders the rows within each partition. This is crucial for functions that depend on the order of rows, like ROW_NUMBER(), RANK(), or cumulative sums.
Let's use a simple example with sales data to illustrate.
sales table
| sale_id |
region |
product |
amount |
sale_date |
| 1 |
East |
A |
100.00 |
2025-01-01 |
| 2 |
East |
B |
150.00 |
2025-01-05 |
| 3 |
West |
A |
200.00 |
2025-01-02 |
| 4 |
East |
A |
120.00 |
2025-01-10 |
| 5 |
West |
B |
180.00 |
2025-01-07 |
Ranking Functions: ROW_NUMBER(), RANK(), DENSE_RANK()
These functions assign a rank to each row within its partition based on the ORDER BY clause.
ROW_NUMBER(): a unique sequential integer to each row within its partition.
RANK(): a rank within its partition with gaps for ties.
DENSE_RANK(): a rank within its partition without gaps for ties.
PERCENT_RANK(): a relative rank ranging from 0 to 1 inclusive.
The following query ranks the sales within each region by amount:
SELECT sale_id,
region,
product,
amount,
RANK() OVER (PARTITION BY region ORDER BY amount DESC) as rnk,
FROM sales
Aggregate Window Functions: SUM(), AVG(), COUNT(), MIN(), MAX()
These aggregate functions can be used as window functions as well.
When used with OVER(), they perform the aggregation over the defined window
for each row, rather than collapsing rows into groups.
Let's calculate a running total of sales amount within each region, ordered by sale date:
SELECT sale_id,
region,
product,
amount,
sale_date,
SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) as running_total_region
FROM sales
3. Value Functions: LAG(), LEAD(), FIRST_VALUE(), LAST_VALUE()
These functions allow you to access values from preceding or succeeding rows within the window.
LAG(expression, offset, default): Accesses data from a previous row.
LEAD(expression, offset, default): Accesses data from a subsequent row.
FIRST_VALUE(expression): Returns the value of the expression from the first row in the window frame.
LAST_VALUE(expression): Returns the value of the expression from the last row in the window frame.
NTH_VALUE(expression, n): Returns the value of the expression from the n-th row in the window frame.
Let's find the previous sale amount for each sale within its region:
SELECT sale_id,
region,
product,
amount,
sale_date,
LAG(amount, 1, 0.00) OVER (PARTITION BY region ORDER BY sale_date) as previous_sale_amount
FROM sales
References