赞
踩
hive中一般取top n时,都会用到row_number(),rank,dense_ran()这三个函数,row_number()、rank()、dense_rank()三个函数相当于group by之后再来个order by。
有三列数据,商品id、商品名称product、销售金额amount,数据如下
id product amount
1 a 10
2 a 12
3 b 13
4 b 12
5 a 14
6 a 15
7 a 13
8 b 11
9 a 16
10 b 17
11 a 14
使用窗口函数
select
id,
product,
amount,
rank()over(partition by product order by amount desc ) as rk,
dense_rank() over(partition by product order by amount desc ) as drk,
row_number()over(partition by product order by amount desc) as rn
from temp;
输出结果
ID NAME SAL rk drk rn
10 b 17 1 1 1
3 b 13 2 2 2
4 b 12 3 3 3
8 b 11 4 4 4
9 a 16 1 1 1
6 a 15 2 2 2
11 a 14 3 3 3
5 a 14 3 3 4
7 a 13 5 4 5
2 a 12 6 5 6
1 a 10 7 6 7
结论
rank():对amount排序时,如果amount有重复值,rank值也会重复,但总数不会变
dense_rank():对amount排序时,如果amount有重复值时,rank值也会重复,但总数会变少
row_number():对amount排序时,如果amount有重复值时,rank值不会重复,但总数不变
对例1的数据取销售金额amount前50%的商品
select
t.id,t.product,
t.amount,
t.rk,
t.drk,
t.rn
from (
select
id,
product,
amount,
rank() over(partition by product order by amount desc ) rk,
dense_rank() over(partition by product order by amount desc ) drk,
row_number() over(partition by product order by amount desc) rn,
count(*) over(partition by product)*0.5 as count
from temp
) t where t.rk < t.count;
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。