当前位置:   article > 正文

oracle 窗口函数需求总结_oracle窗口函数

oracle窗口函数

什么是窗口函数

        窗口函数也叫作OLAP(Online Analytical Processing,联机分析处理)函数,可以对数据库中的数据进行复杂分析。

窗口函数的通用语法如下:

<窗口函数>over(partition by<用于分组的列名>

order by <用于排序的列名>)

我们看一下这个语法里每部分表示什么。

(1)<窗口函数>的位置可以放两种函数:一种是专用窗口函数,比如用于排名的函数,比如rank()、dense_rank()、row_number();另一种是汇总函数,比如sum()、avg()、count()、max()、min()。

(2)<窗口函数>后面的over关键字括号里的内容有两部分:一个是partition by,表示按某列分组;另一个是order by,表示对分组后的结果按某列排序。

(3)因为窗口函数通常是对where或者group by子句处理后的结果进行操作的,所以窗口函数原则上只能写在select子句中。

窗口函数可以解决这几类经典问题:排名问题、Top N问题、前百分之N问题、累计问题、每组内比较问题、连续问题。这些问题在工作中你会经常遇到,比如,排名问题,对用户搜索关键字按搜索次数排名、对商品按销售量排名。

第一类:排名问题

1.学生成绩排名.

测试数据:

  1. -- 创建表格
  2. CREATE TABLE t_score (
  3. student_name VARCHAR2(50),
  4. course_name VARCHAR2(50),
  5. score NUMBER
  6. );
  7. -- 插入数据
  8. INSERT INTO t_score (student_name, course_name, score) VALUES ('小明', '数学', 85);
  9. INSERT INTO t_score (student_name, course_name, score) VALUES ('小明', '英语', 78);
  10. INSERT INTO t_score (student_name, course_name, score) VALUES ('小明', '物理', 92);
  11. INSERT INTO t_score (student_name, course_name, score) VALUES ('小红', '数学', 90);
  12. INSERT INTO t_score (student_name, course_name, score) VALUES ('小红', '英语', 80);
  13. INSERT INTO t_score (student_name, course_name, score) VALUES ('小李', '数学', 90);
  14. INSERT INTO t_score (student_name, course_name, score) VALUES ('小李', '数学', 60);
  15. INSERT INTO t_score (student_name, course_name, score) VALUES ('小李', '英语', 85);
  16. INSERT INTO t_score (student_name, course_name, score) VALUES ('小李', '物理', 85);
  17. commit;
  18. -- 取得每名学生不同课程的成绩排名.

sql语句实现:

2.去除最大值、最小值后求平均值

测试数据:

  1. -- 创建表格
  2. CREATE TABLE t_salary_table (
  3. employee_id NUMBER,
  4. department_id NUMBER,
  5. salary NUMBER
  6. );
  7. -- 插入数据
  8. INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (1, 1, 50000);
  9. INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (2, 1, 52000);
  10. INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (3, 1, 48000);
  11. INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (4, 1, 51000);
  12. INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (5, 1, 49000);
  13. INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (6, 2, 60000);
  14. INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (7, 2, 58000);
  15. INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (8, 2, 62000);
  16. INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (9, 2, 59000);
  17. INSERT INTO t_salary_table (employee_id, department_id, salary) VALUES (10, 2, 61000);
  18. -- 查询数据
  19. SELECT * FROM t_salary_table;
  20. -- 查询每个部门去除最高、最低薪水后的平均薪水

sql实现:

第二类:TOP N问题

工作中会经常遇到这样的业务问题:

         如何找到每个类别下用户最喜欢的商品?

         如何找到每个类别下用户点击最多的5个商品?

 这类问题其实就是非常经典的Top N问题,也就是在对数据分组后,取每组里的最大值、最小值,或者每组里最大的N行(Top N)数据

1.查询前三名的成绩

数据:

  1. -- 删除表格
  2. DROP TABLE t_score;
  3. -- 创建表格
  4. CREATE TABLE t_score (
  5. course_id NUMBER,
  6. student_id NUMBER,
  7. score NUMBER
  8. );
  9. -- 插入数据
  10. INSERT INTO t_score VALUES (1, 1, 85);
  11. INSERT INTO t_score VALUES (1, 2, 78);
  12. INSERT INTO t_score VALUES (1, 3, 92);
  13. INSERT INTO t_score VALUES (1, 4, 90);
  14. INSERT INTO t_score VALUES (1, 5, 80);
  15. INSERT INTO t_score VALUES (1, 6, 92);
  16. INSERT INTO t_score VALUES (1, 7, 78);
  17. INSERT INTO t_score VALUES (1, 8, 92);
  18. INSERT INTO t_score VALUES (1, 9, 85);
  19. INSERT INTO t_score VALUES (2, 1, 88);
  20. INSERT INTO t_score VALUES (2, 2, 82);
  21. INSERT INTO t_score VALUES (2, 3, 90);
  22. INSERT INTO t_score VALUES (2, 4, 85);
  23. INSERT INTO t_score VALUES (2, 5, 78);
  24. INSERT INTO t_score VALUES (2, 6, 88);
  25. INSERT INTO t_score VALUES (2, 7, 82);
  26. INSERT INTO t_score VALUES (2, 8, 90);
  27. INSERT INTO t_score VALUES (2, 9, 82);
  28. COMMIT;
  29. -- 查询数据
  30. SELECT * FROM t_score;
  31. # todo 题目要求: “成绩表”中记录了学生选修的课程号、学生的学号,以及对应课程的成绩。
  32. 为了对学生成绩进行考核,现需要查询每门课程前三名学生的成绩。
  33. # todo 注意:如果出现同样的成绩,则视为同一个名次。

2: 查询排在前两名的工资

数据:

  1. -- 删除表格
  2. DROP TABLE t_employee;
  3. -- 创建表格
  4. CREATE TABLE t_employee (
  5. emp_id NUMBER,
  6. emp_name VARCHAR2(50),
  7. salary NUMBER,
  8. department_id NUMBER
  9. );
  10. -- 插入数据
  11. INSERT INTO t_employee (emp_id, emp_name, salary, department_id)
  12. SELECT 1, '小明', 50000, 1 FROM dual UNION ALL
  13. SELECT 2, '小红', 52000, 1 FROM dual UNION ALL
  14. SELECT 3, '小李', 48000, 1 FROM dual UNION ALL
  15. SELECT 4, '小张', 60000, 1 FROM dual UNION ALL
  16. SELECT 5, '小王', 58000, 1 FROM dual UNION ALL
  17. SELECT 6, '小刚', 62000, 1 FROM dual UNION ALL
  18. SELECT 7, '小丽', 45000, 2 FROM dual UNION ALL
  19. SELECT 8, '小芳', 47000, 2 FROM dual UNION ALL
  20. SELECT 9, '小晓', 49000, 2 FROM dual UNION ALL
  21. SELECT 10, '小华', 52000, 2 FROM dual UNION ALL
  22. SELECT 11, '小雷', 52000, 2 FROM dual;
  23. COMMIT;
  24. -- 查询数据
  25. SELECT * FROM t_employee;
  26. -- 现在要查找每个部门工资排在前两名的雇员信息,若雇员工资一样,则并列获取。

sql实现:

  1. with t1 as (
  2. select t_employee.*, dense_rank() over (partition by department_id order by salary desc) dr
  3. from t_employee
  4. )
  5. select * from t1 where dr<=2

运行:

第三类:累计问题

查询员工的累计薪水

        汇总函数sum()用在窗口函数中,表示对数据进行累计求和

        “rows between <范围起始行> and <范围终止行>”用于指定移动窗口的范围,范围包含起始行和终止行。

        range between  连续月份

        其中,“范围起始行”和“范围终止行”使用特定关键字表示,常用的特定关键字如下。

       • n preceding:当前行的前n行。

       • n following:当前行的后n行。

       • current row:当前行。

       • unbounded preceding:第1行。

       • unbounded following:最后1行。

测试数据:

  1. -- 579. 查询员工的累计薪水
  2. drop table t_employee;
  3. create table t_employee (id int, month int, salary int);
  4. truncate table t_employee;
  5. insert into t_employee values ('1', '1', '20');
  6. insert into t_employee values ('2', '1', '20');
  7. insert into t_employee values ('1', '2', '30');
  8. insert into t_employee values ('2', '2', '30');
  9. insert into t_employee values ('3', '2', '40');
  10. insert into t_employee values ('1', '3', '40');
  11. insert into t_employee values ('3', '3', '60');
  12. insert into t_employee values ('1', '4', '60');
  13. insert into t_employee values ('3', '4', '70');
  14. insert into t_employee values ('1', '7', '90');
  15. insert into t_employee values ('1', '8', '90');
  16. commit;
  17. select * from t_employee order by id, month desc;
  18. -- 编写一个解决方案,在一个统一的表中计算出每个员工的 累计工资汇总 。
  19. -- 员工的 累计工资汇总 可以计算如下:
  20. -- 对于该员工工作的每个月,将 该月 和 前两个月 的工资 加 起来。
  21. --这是他们当月的 3 个月总工资和
  22. -- 返回按 id 升序排序 的结果表。如果 id 相等,请按 month 降序排序。
  1. select
  2. t_employee.*,
  3. sum(salary)
  4. over (partition by id order by month desc range between current row and 2 following ) sum_s
  5. from t_employee
  6. order by id,month desc;

第四类:每组内比较问题

1.每组大于平均值

测试数据:

  1. -- 创建表格
  2. CREATE TABLE t_score (
  3. student_name VARCHAR2(20),
  4. course_name VARCHAR2(20),
  5. score NUMBER(3)
  6. );
  7. -- 插入数据
  8. INSERT INTO t_score VALUES ('张三', '语文', 90);
  9. INSERT INTO t_score VALUES ('李四', '语文', 81);
  10. INSERT INTO t_score VALUES ('王朝', '语文', 79);
  11. INSERT INTO t_score VALUES ('马汉', '语文', 88);
  12. INSERT INTO t_score VALUES ('张三', '数学', 85);
  13. INSERT INTO t_score VALUES ('李四', '数学', 86);
  14. INSERT INTO t_score VALUES ('王朝', '数学', 92);
  15. INSERT INTO t_score VALUES ('马汉', '数学', 83);
  16. INSERT INTO t_score VALUES ('张三', '英语', 87);
  17. INSERT INTO t_score VALUES ('李四', '英语', 98);
  18. INSERT INTO t_score VALUES ('王朝', '英语', 93);
  19. INSERT INTO t_score VALUES ('马汉', '英语', 95);
  20. COMMIT;
  21. -- 查询数据
  22. SELECT * FROM t_score;
  23. # todo “成绩表”,记录了每个学生各科的成绩。现在要查找单科成绩高于该科目平均成绩的学生名单。

实现:

  1. -- t1求每个科目的平均值
  2. with t1 as ( select t_score.*, avg(score) over ( partition by course_name) avg
  3. from t_score)
  4. -- 过滤单科成绩高于该科目平均成绩的学生名单
  5. select course_name,student_name from t1 where score>avg;

2.低于平均薪水的雇员

  1. -- 创建员工表
  2. CREATE TABLE t_employee (
  3. employeeID INT,
  4. departmentID INT,
  5. salary INT
  6. );
  7. -- 插入数据
  8. INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10001, 1, 60117);
  9. INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10002, 2, 92102);
  10. INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10003, 2, 86074);
  11. INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10004, 1, 66596);
  12. INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10005, 1, 66961);
  13. INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10006, 2, 81046);
  14. INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10007, 2, 94333);
  15. INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10008, 1, 75286);
  16. INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10009, 2, 85994);
  17. INSERT INTO t_employee (employeeID, departmentID, salary) VALUES (10010, 1, 76884);
  18. -- 现在公司要找出每个部门低于平均薪水的雇员,然后进行培训来提高雇员工作效率,从而提高雇员薪水。

sql实现:

  1. -- 现在公司要找出每个部门低于平均薪水的雇员,然后进行培训来提高雇员工作效率,从而提高雇员薪水。
  2. with t1 as (
  3. select t_employee.* ,avg(salary) over ( partition by departmentID ) avg from t_employee
  4. )
  5. select * from t1 where salary < avg order by departmentID;

第五类:连续性问题

1.连续3次为球队得分的球员名单

  1. -- 创建分数表,并为列名增加注释
  2. drop table t_score;
  3. CREATE TABLE t_score (
  4. team_name VARCHAR2(50),
  5. player_id INT,
  6. player_name VARCHAR2(50),
  7. score INT,
  8. score_time TIMESTAMP
  9. );
  10. COMMENT ON COLUMN t_score.team_name IS '球队名称';
  11. COMMENT ON COLUMN t_score.player_id IS '球员ID';
  12. COMMENT ON COLUMN t_score.player_name IS '球员姓名';
  13. COMMENT ON COLUMN t_score.score IS '得分';
  14. COMMENT ON COLUMN t_score.score_time IS '得分时间';
  15. INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 3, TO_TIMESTAMP('2023-12-25 10:00:00', 'YYYY-MM-DD HH24:MI:SS'));
  16. INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 3, TO_TIMESTAMP('2023-12-25 10:15:00', 'YYYY-MM-DD HH24:MI:SS'));
  17. INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 1, TO_TIMESTAMP('2023-12-25 10:30:00', 'YYYY-MM-DD HH24:MI:SS'));
  18. INSERT INTO t_score VALUES ('洛杉矶湖人队', 3, '安东尼·戴维斯', 2, TO_TIMESTAMP('2023-12-25 10:32:00', 'YYYY-MM-DD HH24:MI:SS'));
  19. INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 3, TO_TIMESTAMP('2023-12-25 10:45:00', 'YYYY-MM-DD HH24:MI:SS'));
  20. INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 3, TO_TIMESTAMP('2023-12-25 11:00:00', 'YYYY-MM-DD HH24:MI:SS'));
  21. INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 2, TO_TIMESTAMP('2023-12-25 11:15:00', 'YYYY-MM-DD HH24:MI:SS'));
  22. INSERT INTO t_score VALUES ('洛杉矶湖人队', 23, '勒布朗·詹姆斯', 2, TO_TIMESTAMP('2023-12-25 11:30:00', 'YYYY-MM-DD HH24:MI:SS'));
  23. INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 1, TO_TIMESTAMP('2023-12-25 10:10:00', 'YYYY-MM-DD HH24:MI:SS'));
  24. INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 1, TO_TIMESTAMP('2023-12-25 10:25:00', 'YYYY-MM-DD HH24:MI:SS'));
  25. INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 1, TO_TIMESTAMP('2023-12-25 10:40:00', 'YYYY-MM-DD HH24:MI:SS'));
  26. INSERT INTO t_score VALUES ('金州勇士队', 11, '克莱·汤普森', 2, TO_TIMESTAMP('2023-12-25 10:45:00', 'YYYY-MM-DD HH24:MI:SS'));
  27. INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 2, TO_TIMESTAMP('2023-12-25 10:55:00', 'YYYY-MM-DD HH24:MI:SS'));
  28. INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 2, TO_TIMESTAMP('2023-12-25 11:10:00', 'YYYY-MM-DD HH24:MI:SS'));
  29. INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 3, TO_TIMESTAMP('2023-12-25 11:25:00', 'YYYY-MM-DD HH24:MI:SS'));
  30. INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 3, TO_TIMESTAMP('2023-12-25 11:40:00', 'YYYY-MM-DD HH24:MI:SS'));
  31. INSERT INTO t_score VALUES ('金州勇士队', 30, '斯蒂芬·库里', 3, TO_TIMESTAMP('2023-12-25 11:55:00', 'YYYY-MM-DD HH24:MI:SS'));
  32. -- 查询数据
  33. SELECT * FROM t_score;
  34. -- 请你写一个SQL语句,统计出连续3次为球队得分的球员名单。

sql实现:

第一种方法:  lead() 适用小范围连续

  1. with t1 as ( SELECT t_score.*,
  2. -- 向下拿一行
  3. lead(player_id,1) over ( partition by team_name order by score_time) rn1,
  4. -- 向下拿两行
  5. lead(player_id,1) over ( partition by team_name order by score_time) rn2
  6. FROM t_score)
  7. -- 过滤连续三次的id
  8. select distinct player_name,player_id,team_name from t1
  9. where player_id = rn1 and rn1 = rn2;

第二种方法:等差数列 通用

分析:

  1. with t1 as(
  2. select
  3. Cinema.*,
  4. row_number() over (partition by free order by seat_id) as rn1,
  5. seat_id - (row_number() over (partition by free order by seat_id)) as 差值
  6. from Cinema),
  7. t2 as (
  8. select seat_id,count(差值) over (partition by 差值) as 计数 from t1 where free=1)
  9. select seat_id from t2 where 计数>1 order by seat_id;

2.连续空余座位

方法1:

  1. Create table Cinema
  2. (
  3. seat_id number primary key,
  4. free varchar(2)
  5. );
  6. insert into Cinema (seat_id, free)
  7. values ('1', '1');
  8. insert into Cinema (seat_id, free)
  9. values ('2', '0');
  10. insert into Cinema (seat_id, free)
  11. values ('3', '1');
  12. insert into Cinema (seat_id, free)
  13. values ('4', '1');
  14. insert into Cinema (seat_id, free)
  15. values ('5', '1');
  16. -- 查找连续2个及以上座位可用的座位号
  17. select *
  18. from cinema;
  19. with t1 as ( select Cinema.*,
  20. lead(free,1) over ( order by seat_id) rn1,
  21. lag(free,1) over ( order by seat_id) rn2
  22. from Cinema)
  23. select seat_id from t1
  24. where free = 1 and rn1 = 1
  25. -- 当前为1,rn1=null,上一行也为1 --当前座位也是空的
  26. or(free=1 and rn1 is null and rn2=1)

方法2:

  1. with t1 as(
  2. select
  3. Cinema.*,
  4. row_number() over (partition by free order by seat_id) as rn1,
  5. seat_id - (row_number() over (partition by free order by seat_id)) as 差值
  6. from Cinema),
  7. t2 as (
  8. select seat_id,count(差值) over (partition by 差值) as 计数 from t1 where free=1)
  9. select seat_id from t2 where 计数>1 order by seat_id;

3.连续出现的数字

  1. Create table Logs (id int, num int);
  2. Truncate table Logs;
  3. insert into Logs (id, num) values ('1', '1');
  4. insert into Logs (id, num) values ('2', '1');
  5. insert into Logs (id, num) values ('3', '1');
  6. insert into Logs (id, num) values ('4', '2');
  7. insert into Logs (id, num) values ('5', '1');
  8. insert into Logs (id, num) values ('6', '2');
  9. insert into Logs (id, num) values ('7', '2');
  10. select * from logs;
  11. # todo 需求: 找出所有至少连续出现三次的数字。
  12. # todo 返回的结果表中的数据可以按 任意顺序 排列。

sql实现:

  1. -- 方法1:
  2. with t1 as (
  3. select
  4. Logs.*,
  5. id - (row_number() over (partition by num order by id)) 差值
  6. from Logs),
  7. t2 as (
  8. select num,count(1) over (partition by 差值,num) as 计数 from t1)
  9. select distinct num from t2 where 计数>=3;
  10. 方法2 lag lead
  11. with t1 as (
  12. select
  13. Logs.*,
  14. lag(num) over (order by id) lag1,
  15. lead(num) over (order by id) lead1
  16. from Logs)
  17. select distinct num from t1 where t1.num=lag1 and t1.num=lead1;

运行结果:

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/小舞很执着/article/detail/919364
推荐阅读
相关标签
  

闽ICP备14008679号