目录
2、first_value() 和 last_value()
前言
窗口函数(Window Functions)是MySQL 8.0引入的最重要特性之一,它允许在不减少行数的情况下对数据进行计算和分析,非常适合报表生成和数据分析场景。
如下图所示:
1、窗口函数的概念
1.1、介绍
窗口函数与聚合函数类似,但有一个关键区别:
聚合函数:将多行合并为一行结果
窗口函数:为每一行返回一个值,同时保留原始行
1.2、基本语法结构
function_name([expression]) OVER (
[PARTITION BY partition_expression, ...]
[ORDER BY sort_expression [ASC|DESC], ...]
[frame_clause]
)
2、窗口函数分类
窗口函数可分为排名函数、分析函数、聚合函数。
2.1、排名函数
排名函数分为rank()、dense_rank()、row_number()三种。
以下是三种的区别:
1、rank()
-
示例:1,1,2 (RANK()会得到1,1,3)
-- 计算排名,相同值有相同排名,后续排名会跳过
SELECT
student_name,
score,
RANK() OVER (ORDER BY score DESC) AS rank_score
FROM exam_results;
2、dense_rank()
示例:1,1,2,3 (即使前两个值相同)
-- 计算排名,相同值有相同排名,但后续排名不跳过
SELECT
student_name,
score,
DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank_score
FROM exam_results;
3、row_number()
示例:1,2,3 (即使前两个值相同)
-- 单纯的行号,即使值相同也会分配不同序号
SELECT
student_name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num
FROM exam_results;
2.2、分析函数
1、lead() 和 lag()
1.lead()函数:
访问当前行之后的行中的数据
语法:
LEAD(expr, offset, default) OVER (...)
expr
:要获取的列或表达式offset
:向后偏移的行数(默认为1)default
:当没有后续行时返回的值(默认为NULL)
2.lag()函数:
访问当前行之前的行中的数据
语法:
LAG(expr, offset, default) OVER (...)
参数含义与LEAD()相同,只是方向相反
CREATE TABLE sales (
sale_date DATE,
product_id INT,
revenue DECIMAL(10,2)
);
INSERT INTO sales VALUES
('2023-01-01', 1, 1000),
('2023-01-02', 1, 1200),
('2023-01-03', 1, 1500),
('2023-01-04', 1, 900),
('2023-01-05', 1, 1800),
('2023-01-01', 2, 800),
('2023-01-02', 2, 950),
('2023-01-03', 2, 1100),
('2023-01-04', 2, 1300),
('2023-01-05', 2, 700);
-- 查看每日销售额及次日销售额
SELECT
product_id,
sale_date,
revenue,
LEAD(revenue, 1) OVER (PARTITION BY product_id ORDER BY sale_date) AS next_day_revenue,
LAG(revenue, 1) OVER (PARTITION BY product_id ORDER BY sale_date) AS prev_day_revenue
FROM sales;
2、first_value() 和 last_value()
-- 获取窗口框架内第一个和最后一个值
SELECT
employee_id,
department,
salary,
FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_salary,
LAST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lowest_salary
FROM employees;
3、nth_value()
-- 获取窗口框架内第N个值
SELECT
product_id,
month,
sales,
NTH_VALUE(sales, 2) OVER (PARTITION BY product_id ORDER BY month) AS second_month_sales
FROM product_sales;
2.3、聚合函数作为窗口函数
-- 常用聚合函数也可作为窗口函数使用
SELECT
employee_id,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary,
SUM(salary) OVER (PARTITION BY department) AS dept_total_salary,
COUNT(*) OVER (PARTITION BY department) AS dept_employee_count
FROM employees;
3、窗口定义详解
3.1、partition by 子句
-- 创建员工表
CREATE TABLE employees (
emp_id INT,
emp_name VARCHAR(50),
department VARCHAR(50),
salary DECIMAL(10,2)
);
-- 插入示例数据
INSERT INTO employees VALUES
(1, '张三', '销售部', 8000),
(2, '李四', '销售部', 7500),
(3, '王五', '技术部', 9000),
(4, '赵六', '技术部', 8500),
(5, '钱七', '技术部', 9000),
(6, '孙八', '人事部', 7000);
-- 按部门分区计算
SELECT
employee_name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
3.2、order by子句
-- 按销售日期排序计算累计销售额
SELECT
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;
3.3、窗口框架(frame_clause)
-- 计算3天移动平均
SELECT
date,
temperature,
AVG(temperature) OVER (ORDER BY date ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS moving_avg
FROM weather_data;
-- 计算累计到当前行的总和(默认框架)
SELECT
month,
revenue,
SUM(revenue) OVER (ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_revenue
FROM monthly_sales;
4、高级用法示例
4.1、多窗口定义
SELECT
employee_id,
department,
salary,
RANK() OVER w_dept AS dept_rank,
RANK() OVER w_all AS overall_rank
FROM employees
WINDOW
w_dept AS (PARTITION BY department ORDER BY salary DESC),
w_all AS (ORDER BY salary DESC);
4.2、百分比计算
SELECT
student_id,
test_score,
test_score / MAX(test_score) OVER () * 100 AS percent_of_max,
PERCENT_RANK() OVER (ORDER BY test_score) AS percentile
FROM test_scores;
4.3、分组内Top N查询
-- 获取每个部门薪资最高的3名员工
WITH ranked_employees AS (
SELECT
employee_id,
employee_name,
department,
salary,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept
FROM employees
)
SELECT * FROM ranked_employees WHERE rank_in_dept <= 3;
4.4、多列排序
-- 如果分数相同,再按学生ID升序排名
SELECT
student_name,
score,
RANK() OVER (ORDER BY score DESC, student_id ASC) AS refined_rank
FROM student_scores;
5、性能注意事项
窗口函数会在内存中创建临时数据结构,大数据集可能消耗较多内存
合理使用PARTITION BY可以减少处理的数据量
对于复杂查询,考虑使用物化视图或预计算结果
窗口函数极大增强了MySQL的分析能力,使得许多原本需要应用程序处理的复杂计算可以直接在数据库层高效完成。