赞
踩
窗口函数也叫作OLAP(Online Analytical Processing,联机分析处理)函数,可以对数据库中的数据进行复杂分析。
窗口函数的通用语法如下:
<窗口函数>over(partition by<用于分组的列名> order by <用于排序的列名>) |
我们看一下这个语法里每部分表示什么。
(1)<窗口函数>的位置可以放两种函数:一种是专用窗口函数,比如用于排名的函数,比如rank()、dense_rank()、row_number();另一种是汇总函数,比如sum()、avg()、count()、max()、min()。
(2)<窗口函数>后面的over关键字括号里的内容有两部分:一个是partition by,表示按某列分组;另一个是order by,表示对分组后的结果按某列排序。
(3)因为窗口函数通常是对where或者group by子句处理后的结果进行操作的,所以窗口函数原则上只能写在select子句中。
窗口函数可以解决这几类经典问题:排名问题、Top N问题、前百分之N问题、累计问题、每组内比较问题、连续问题。这些问题在工作中你会经常遇到,比如,排名问题,对用户搜索关键字按搜索次数排名、对商品按销售量排名。
测试数据:
- -- 创建表格
- CREATE TABLE t_score (
- student_name VARCHAR2(50),
- course_name VARCHAR2(50),
- score NUMBER
- );
-
- -- 插入数据
- INSERT INTO t_score (student_name, course_name, score) VALUES ('小明', '数学', 85);
- INSERT INTO t_score (student_name, course_name, score) VALUES ('小明', '英语', 78);
- INSERT INTO t_score (student_name, course_name, score) VALUES ('小明', '物理', 92);
- INSERT INTO t_score (student_name, course_name, score) VALUES ('小红', '数学', 90);
- INSERT INTO t_score (student_name, course_name, score) VALUES ('小红', '英语', 80);
- INSERT INTO t_score (student_name, course_name, score) VALUES ('小李', '数学', 90);
- INSERT INTO t_score (student_name, course_name, score) VALUES ('小李', '数学', 60);
- INSERT INTO t_score (student_name, course_name, score) VALUES ('小李', '英语', 85);
- INSERT INTO t_score (student_name, course_name, score) VALUES ('小李', '物理', 85);
- commit;
- -- 取得每名学生不同课程的成绩排名.
sql语句实现:
测试数据:
- -- 创建表格
- CREATE TABLE t_salary_table (
- employee_id NUMBER,
- department_id NUMBER,
- salary NUMBER
- );
-
- -- 插入数据
- INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (1, 1, 50000);
- INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (2, 1, 52000);
- INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (3, 1, 48000);
- INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (4, 1, 51000);
- INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (5, 1, 49000);
- INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (6, 2, 60000);
- INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (7, 2, 58000);
- INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (8, 2, 62000);
- INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (9, 2, 59000);
- INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (10, 2, 61000);
-
-
- -- 查询数据
- SELECT * FROM t_salary_table;
-
- -- 查询每个部门去除最高、最低薪水后的平均薪水
-
sql实现:
工作中会经常遇到这样的业务问题:
如何找到每个类别下用户最喜欢的商品?
如何找到每个类别下用户点击最多的5个商品?
这类问题其实就是非常经典的Top N问题,也就是在对数据分组后,取每组里的最大值、最小值,或者每组里最大的N行(Top N)数据
数据:
- -- 删除表格
- DROP TABLE t_score;
-
- -- 创建表格
- CREATE TABLE t_score (
- course_id NUMBER,
- student_id NUMBER,
- score NUMBER
- );
-
- -- 插入数据
- INSERT INTO t_score VALUES (1, 1, 85);
- INSERT INTO t_score VALUES (1, 2, 78);
- INSERT INTO t_score VALUES (1, 3, 92);
- INSERT INTO t_score VALUES (1, 4, 90);
- INSERT INTO t_score VALUES (1, 5, 80);
- INSERT INTO t_score VALUES (1, 6, 92);
- INSERT INTO t_score VALUES (1, 7, 78);
- INSERT INTO t_score VALUES (1, 8, 92);
- INSERT INTO t_score VALUES (1, 9, 85);
- INSERT INTO t_score VALUES (2, 1, 88);
- INSERT INTO t_score VALUES (2, 2, 82);
- INSERT INTO t_score VALUES (2, 3, 90);
- INSERT INTO t_score VALUES (2, 4, 85);
- INSERT INTO t_score VALUES (2, 5, 78);
- INSERT INTO t_score VALUES (2, 6, 88);
- INSERT INTO t_score VALUES (2, 7, 82);
- INSERT INTO t_score VALUES (2, 8, 90);
- INSERT INTO t_score VALUES (2, 9, 82);
-
- COMMIT;
-
- -- 查询数据
- SELECT * FROM t_score;
-
- # todo 题目要求: “成绩表”中记录了学生选修的课程号、学生的学号,以及对应课程的成绩。
- 为了对学生成绩进行考核,现需要查询每门课程前三名学生的成绩。
- # todo 注意:如果出现同样的成绩,则视为同一个名次。
数据:
- -- 删除表格
- DROP TABLE t_employee;
-
- -- 创建表格
- CREATE TABLE t_employee (
- emp_id NUMBER,
- emp_name VARCHAR2(50),
- salary NUMBER,
- department_id NUMBER
- );
-
- -- 插入数据
- INSERT INTO t_employee (emp_id, emp_name, salary, department_id)
- SELECT 1, '小明', 50000, 1 FROM dual UNION ALL
- SELECT 2, '小红', 52000, 1 FROM dual UNION ALL
- SELECT 3, '小李', 48000, 1 FROM dual UNION ALL
- SELECT 4, '小张', 60000, 1 FROM dual UNION ALL
- SELECT 5, '小王', 58000, 1 FROM dual UNION ALL
- SELECT 6, '小刚', 62000, 1 FROM dual UNION ALL
- SELECT 7, '小丽', 45000, 2 FROM dual UNION ALL
- SELECT 8, '小芳', 47000, 2 FROM dual UNION ALL
- SELECT 9, '小晓', 49000, 2 FROM dual UNION ALL
- SELECT 10, '小华', 52000, 2 FROM dual UNION ALL
- SELECT 11, '小雷', 52000, 2 FROM dual;
-
- COMMIT;
-
- -- 查询数据
- SELECT * FROM t_employee;
- -- 现在要查找每个部门工资排在前两名的雇员信息,若雇员工资一样,则并列获取。
sql实现:
- with t1 as (
- select t_employee.*, dense_rank() over (partition by department_id order by salary desc) dr
- from t_employee
- )
- select * from t1 where dr<=2
运行:
汇总函数sum()用在窗口函数中,表示对数据进行累计求和
“rows between <范围起始行> and <范围终止行>”用于指定移动窗口的范围,范围包含起始行和终止行。
range between 连续月份
其中,“范围起始行”和“范围终止行”使用特定关键字表示,常用的特定关键字如下。
• n preceding:当前行的前n行。
• n following:当前行的后n行。
• current row:当前行。
• unbounded preceding:第1行。
• unbounded following:最后1行。
测试数据:
- -- 579. 查询员工的累计薪水
- drop table t_employee;
- create table t_employee (id int, month int, salary int);
- truncate table t_employee;
-
- insert into t_employee values ('1', '1', '20');
- insert into t_employee values ('2', '1', '20');
- insert into t_employee values ('1', '2', '30');
- insert into t_employee values ('2', '2', '30');
- insert into t_employee values ('3', '2', '40');
- insert into t_employee values ('1', '3', '40');
- insert into t_employee values ('3', '3', '60');
- insert into t_employee values ('1', '4', '60');
- insert into t_employee values ('3', '4', '70');
- insert into t_employee values ('1', '7', '90');
- insert into t_employee values ('1', '8', '90');
-
- commit;
- select * from t_employee order by id, month desc;
- -- 编写一个解决方案,在一个统一的表中计算出每个员工的 累计工资汇总 。
- -- 员工的 累计工资汇总 可以计算如下:
- -- 对于该员工工作的每个月,将 该月 和 前两个月 的工资 加 起来。
- --这是他们当月的 3 个月总工资和
- -- 返回按 id 升序排序 的结果表。如果 id 相等,请按 month 降序排序。
- select
- t_employee.*,
- sum(salary)
- over (partition by id order by month desc range between current row and 2 following ) sum_s
- from t_employee
- order by id,month desc;
测试数据:
-
- -- 创建表格
- CREATE TABLE t_score (
- student_name VARCHAR2(20),
- course_name VARCHAR2(20),
- score NUMBER(3)
- );
-
- -- 插入数据
- INSERT INTO t_score VALUES ('张三', '语文', 90);
- INSERT INTO t_score VALUES ('李四', '语文', 81);
- INSERT INTO t_score VALUES ('王朝', '语文', 79);
- INSERT INTO t_score VALUES ('马汉', '语文', 88);
- INSERT INTO t_score VALUES ('张三', '数学', 85);
- INSERT INTO t_score VALUES ('李四', '数学', 86);
- INSERT INTO t_score VALUES ('王朝', '数学', 92);
- INSERT INTO t_score VALUES ('马汉', '数学', 83);
- INSERT INTO t_score VALUES ('张三', '英语', 87);
- INSERT INTO t_score VALUES ('李四', '英语', 98);
- INSERT INTO t_score VALUES ('王朝', '英语', 93);
- INSERT INTO t_score VALUES ('马汉', '英语', 95);
-
- COMMIT;
-
- -- 查询数据
- SELECT * FROM t_score;
-
- # todo “成绩表”,记录了每个学生各科的成绩。现在要查找单科成绩高于该科目平均成绩的学生名单。
-
实现:
- -- t1求每个科目的平均值
- with t1 as ( select t_score.*, avg(score) over ( partition by course_name) avg
- from t_score)
- -- 过滤单科成绩高于该科目平均成绩的学生名单
- select course_name,student_name from t1 where score>avg;
- -- 创建员工表
- CREATE TABLE t_employee (
- employeeID INT,
- departmentID INT,
- salary INT
- );
-
- -- 插入数据
- INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10001, 1, 60117);
- INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10002, 2, 92102);
- INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10003, 2, 86074);
- INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10004, 1, 66596);
- INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10005, 1, 66961);
- INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10006, 2, 81046);
- INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10007, 2, 94333);
- INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10008, 1, 75286);
- INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10009, 2, 85994);
- INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10010, 1, 76884);
- -- 现在公司要找出每个部门低于平均薪水的雇员,然后进行培训来提高雇员工作效率,从而提高雇员薪水。
sql实现:
- -- 现在公司要找出每个部门低于平均薪水的雇员,然后进行培训来提高雇员工作效率,从而提高雇员薪水。
- with t1 as (
- select t_employee.* ,avg(salary) over ( partition by departmentID ) avg from t_employee
- )
- select * from t1 where salary < avg order by departmentID;
- -- 创建分数表,并为列名增加注释
- drop table t_score;
-
- CREATE TABLE t_score (
- team_name VARCHAR2(50),
- player_id INT,
- player_name VARCHAR2(50),
- score INT,
- score_time TIMESTAMP
- );
-
- COMMENT ON COLUMN t_score.team_name IS '球队名称';
- COMMENT ON COLUMN t_score.player_id IS '球员ID';
- COMMENT ON COLUMN t_score.player_name IS '球员姓名';
- COMMENT ON COLUMN t_score.score IS '得分';
- COMMENT ON COLUMN t_score.score_time IS '得分时间';
-
-
- INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 3, TO_TIMESTAMP('2023-12-25 10:00:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 3, TO_TIMESTAMP('2023-12-25 10:15:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 1, TO_TIMESTAMP('2023-12-25 10:30:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('洛杉矶湖人队', 3, '安东尼·戴维斯', 2, TO_TIMESTAMP('2023-12-25 10:32:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 3, TO_TIMESTAMP('2023-12-25 10:45:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 3, TO_TIMESTAMP('2023-12-25 11:00:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 2, TO_TIMESTAMP('2023-12-25 11:15:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 2, TO_TIMESTAMP('2023-12-25 11:30:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 1, TO_TIMESTAMP('2023-12-25 10:10:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 1, TO_TIMESTAMP('2023-12-25 10:25:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 1, TO_TIMESTAMP('2023-12-25 10:40:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('金州勇士队', 11, '克莱·汤普森', 2, TO_TIMESTAMP('2023-12-25 10:45:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 2, TO_TIMESTAMP('2023-12-25 10:55:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 2, TO_TIMESTAMP('2023-12-25 11:10:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 3, TO_TIMESTAMP('2023-12-25 11:25:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 3, TO_TIMESTAMP('2023-12-25 11:40:00', 'YYYY-MM-DD HH24:MI:SS'));
- INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 3, TO_TIMESTAMP('2023-12-25 11:55:00', 'YYYY-MM-DD HH24:MI:SS'));
-
- -- 查询数据
- SELECT * FROM t_score;
- -- 请你写一个SQL语句,统计出连续3次为球队得分的球员名单。
sql实现:
第一种方法: lead() 适用小范围连续
- with t1 as ( SELECT t_score.*,
- -- 向下拿一行
- lead(player_id,1) over ( partition by team_name order by score_time) rn1,
- -- 向下拿两行
- lead(player_id,1) over ( partition by team_name order by score_time) rn2
- FROM t_score)
- -- 过滤连续三次的id
- select distinct player_name,player_id,team_name from t1
- where player_id = rn1 and rn1 = rn2;
第二种方法:等差数列 通用
分析:
- with t1 as(
- select
- Cinema.*,
- row_number() over (partition by free order by seat_id) as rn1,
- seat_id - (row_number() over (partition by free order by seat_id)) as 差值
- from Cinema),
- t2 as (
- select seat_id,count(差值) over (partition by 差值) as 计数 from t1 where free=1)
- select seat_id from t2 where 计数>1 order by seat_id;
方法1:
- Create table Cinema
- (
- seat_id number primary key,
- free varchar(2)
- );
-
- insert into Cinema (seat_id, free)
- values ('1', '1');
- insert into Cinema (seat_id, free)
- values ('2', '0');
- insert into Cinema (seat_id, free)
- values ('3', '1');
- insert into Cinema (seat_id, free)
- values ('4', '1');
- insert into Cinema (seat_id, free)
- values ('5', '1');
- -- 查找连续2个及以上座位可用的座位号
- select *
- from cinema;
- with t1 as ( select Cinema.*,
- lead(free,1) over ( order by seat_id) rn1,
- lag(free,1) over ( order by seat_id) rn2
- from Cinema)
- select seat_id from t1
- where free = 1 and rn1 = 1
- -- 当前为1,rn1=null,上一行也为1 --当前座位也是空的
- or(free=1 and rn1 is null and rn2=1)
方法2:
- with t1 as(
- select
- Cinema.*,
- row_number() over (partition by free order by seat_id) as rn1,
- seat_id - (row_number() over (partition by free order by seat_id)) as 差值
- from Cinema),
- t2 as (
- select seat_id,count(差值) over (partition by 差值) as 计数 from t1 where free=1)
- select seat_id from t2 where 计数>1 order by seat_id;
-
- Create table Logs (id int, num int);
- Truncate table Logs;
- insert into Logs (id, num) values ('1', '1');
- insert into Logs (id, num) values ('2', '1');
- insert into Logs (id, num) values ('3', '1');
- insert into Logs (id, num) values ('4', '2');
- insert into Logs (id, num) values ('5', '1');
- insert into Logs (id, num) values ('6', '2');
- insert into Logs (id, num) values ('7', '2');
-
- select * from logs;
-
- # todo 需求: 找出所有至少连续出现三次的数字。
- # todo 返回的结果表中的数据可以按 任意顺序 排列。
-
sql实现:
- -- 方法1:
- with t1 as (
- select
- Logs.*,
- id - (row_number() over (partition by num order by id)) 差值
- from Logs),
- t2 as (
- select num,count(1) over (partition by 差值,num) as 计数 from t1)
- select distinct num from t2 where 计数>=3;
-
-
- 方法2 lag lead
- with t1 as (
- select
- Logs.*,
- lag(num) over (order by id) lag1,
- lead(num) over (order by id) lead1
- from Logs)
- select distinct num from t1 where t1.num=lag1 and t1.num=lead1;
运行结果:
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。