Oracle 开窗函数(Window Functions)是一类强大的函数,它可以对查询结果集进行分组、排序,并在指定的窗口范围内执行计算,而不会像传统聚合函数那样将结果集合并为单行。这种特性使得开窗函数在数据分析、报表生成和复杂查询中非常实用。
开窗函数的基本语法
function_name(expression) OVER (
[PARTITION BY partition_expression, ...]
[ORDER BY sort_expression [ASC|DESC], ...]
[window_clause]
)
- PARTITION BY:将结果集按指定列分组,开窗函数在每个分组内独立计算。
- ORDER BY:定义分组内的排序规则,影响函数的计算顺序。
- window_clause:可选参数,进一步定义窗口的大小和范围(如当前行之前 / 之后的行)。
常见开窗函数分类
1. 排名函数
用于生成排名值,常见的有:
ROW_NUMBER()
:为每行分配唯一的连续整数排名。RANK()
:允许并列排名,下一个排名会跳过重复值。DENSE_RANK()
:允许并列排名,但排名始终连续。NTILE(n)
:将结果集分为n
个桶,返回每行所属的桶号。
示例表:EMPLOYEES
(EMP_ID
, DEPT_ID
, SALARY
)
-- 按部门分组,对员工按工资降序排名
SELECT
EMP_ID,
DEPT_ID,
SALARY,
ROW_NUMBER() OVER (PARTITION BY DEPT_ID ORDER BY SALARY DESC) AS RN,
RANK() OVER (PARTITION BY DEPT_ID ORDER BY SALARY DESC) AS RK,
DENSE_RANK() OVER (PARTITION BY DEPT_ID ORDER BY SALARY DESC) AS DRK,
NTILE(2) OVER (PARTITION BY DEPT_ID ORDER BY SALARY DESC) AS BUCKET
FROM EMPLOYEES;
结果示例:
EMP_ID | DEPT_ID | SALARY | RN | RK | DRK | BUCKET |
---|---|---|---|---|---|---|
101 | 10 | 8000 | 1 | 1 | 1 | 1 |
102 | 10 | 7500 | 2 | 2 | 2 | 1 |
103 | 10 | 7500 | 3 | 2 | 2 | 2 |
104 | 10 | 6000 | 4 | 4 | 3 | 2 |
2. 聚合函数(作为开窗函数)
常见的聚合函数(如SUM
, AVG
, MIN
, MAX
, COUNT
)可以在窗口内计算。
-- 计算每个部门的累计工资和移动平均工资
SELECT
EMP_ID,
DEPT_ID,
SALARY,
SUM(SALARY) OVER (PARTITION BY DEPT_ID ORDER BY EMP_ID) AS CUM_SUM,
AVG(SALARY) OVER (PARTITION BY DEPT_ID ORDER BY EMP_ID ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS MOVING_AVG
FROM EMPLOYEES;
结果示例:
EMP_ID | DEPT_ID | SALARY | CUM_SUM | MOVING_AVG |
---|---|---|---|---|
101 | 10 | 8000 | 8000 | 8000 |
102 | 10 | 7500 | 15500 | 7750 |
103 | 10 | 7500 | 23000 | 7500 |
3. 分析函数
用于计算行与行之间的关系,常见的有:
LAG/LEAD
:访问当前行之前 / 之后的行数据。FIRST_VALUE/LAST_VALUE
:获取窗口内的第一个 / 最后一个值。CUME_DIST
:计算累积分布(小于等于当前值的比例)。PERCENT_RANK
:计算百分比排名(0 到 1 之间的值)。
-- 使用LAG和LEAD比较当前工资与前/后员工的工资差异
SELECT
EMP_ID,
DEPT_ID,
SALARY,
LAG(SALARY, 1, 0) OVER (PARTITION BY DEPT_ID ORDER BY EMP_ID) AS PREV_SALARY,
LEAD(SALARY, 1, 0) OVER (PARTITION BY DEPT_ID ORDER BY EMP_ID) AS NEXT_SALARY,
SALARY - LAG(SALARY, 1, SALARY) OVER (PARTITION BY DEPT_ID ORDER BY EMP_ID) AS DIFF
FROM EMPLOYEES;
结果示例:
EMP_ID | DEPT_ID | SALARY | PREV_SALARY | NEXT_SALARY | DIFF |
---|---|---|---|---|---|
101 | 10 | 8000 | 0 | 7500 | 0 |
102 | 10 | 7500 | 8000 | 7500 | -500 |
103 | 10 | 7500 | 7500 | 6000 | 0 |
窗口子句(window_clause)详解
窗口子句用于精确控制窗口的范围,语法如下:
[ROWS | RANGE] BETWEEN start_expression AND end_expression
- ROWS:基于物理行号定义窗口。
- RANGE:基于逻辑值定义窗口(适用于数值或日期类型)。
- 边界选项:
UNBOUNDED PRECEDING
:窗口起始于分区的第一行。UNBOUNDED FOLLOWING
:窗口结束于分区的最后一行。CURRENT ROW
:当前行。n PRECEDING
:当前行之前的n
行。n FOLLOWING
:当前行之后的n
行。
示例:计算当前行及前后各一行的平均工资。
SELECT
EMP_ID,
SALARY,
AVG(SALARY) OVER (ORDER BY EMP_ID ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS MOVING_AVG
FROM EMPLOYEES;
高级应用场景
1. 分组取 Top N
获取每个部门工资最高的前两名员工。
SELECT *
FROM (
SELECT
EMP_ID,
DEPT_ID,
SALARY,
ROW_NUMBER() OVER (PARTITION BY DEPT_ID ORDER BY SALARY DESC) AS RN
FROM EMPLOYEES
)
WHERE RN <= 2;
2. 计算同比 / 环比
对比当前月份与上月的销售额。
SELECT
MONTH,
SALES,
LAG(SALES, 1) OVER (ORDER BY MONTH) AS PREV_MONTH_SALES,
((SALES - LAG(SALES, 1) OVER (ORDER BY MONTH)) / LAG(SALES, 1) OVER (ORDER BY MONTH)) * 100 AS GROWTH_RATE
FROM SALES_DATA;
3. 累积分布分析
计算每个员工的工资在部门内的累积分布。
SELECT
EMP_ID,
DEPT_ID,
SALARY,
CUME_DIST() OVER (PARTITION BY DEPT_ID ORDER BY SALARY) AS CUM_DIST
FROM EMPLOYEES;
开窗函数与聚合函数的区别
开窗函数 | 传统聚合函数 |
---|---|
不减少结果集行数 | 会将结果集合并为单行 |
通过OVER 子句定义窗口 |
使用GROUP BY 分组 |
可为每行返回聚合值 | 每组只返回一个聚合值 |
注意事项
- 性能考虑:开窗函数在大数据集上可能影响性能,需合理使用索引。
- 窗口子句限制:
NTILE
,RANK
,DENSE_RANK
,ROW_NUMBER
等函数不支持窗口子句。 - ORDER BY 必要性:部分函数(如
LAG
,LEAD
)必须使用ORDER BY
。