当前位置:   article > 正文

Hive窗口函数案例_jack,2017-01-01、

jack,2017-01-01、

数据准备:
name,orderdate,cost

jack,2017-01-01,10 
tony,2017-01-02,15 
jack,2017-02-03,23 
tony,2017-01-04,29 
jack,2017-01-05,46 
jack,2017-04-06,42 
tony,2017-01-07,50 
jack,2017-01-08,55 
mart,2017-04-08,62 
mart,2017-04-09,68 
neil,2017-05-10,12 
mart,2017-04-11,75 
neil,2017-06-12,80 
mart,2017-04-13,94
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14

需求:
(1)查询在 2017 年 4 月份购买过的顾客及总人数
(2)查询顾客的购买明细及月购买总额
(3)上述的场景,要将 cost 按照日期进行累加
(4)查询每个顾客上次的购买时间
(5)查询前 20%时间的订单信息

建表并导入数据:

create table business( 
name string,
orderdate string,
cost int ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 

load data local inpath "/opt/module/datas/business.txt" into table business
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7

(1)查询在 2017 年 4 月份购买过的顾客及总人数:

select name,count(*) over()
from business
where month(orderdate)='04'
group by name;
  • 1
  • 2
  • 3
  • 4

结果:

mart	2
jack	2
  • 1
  • 2

(2)查询顾客的购买明细及月购买总额

select name,orderdate,cost,sum(cost) over(partition by name,month(orderdate))
from business
group by name,orderdate,cost;
  • 1
  • 2
  • 3

结果:

jack	2017-01-05	46	111
jack	2017-01-08	55	111
jack	2017-01-01	10	111
jack	2017-02-03	23	23
jack	2017-04-06	42	42
mart	2017-04-09	68	299
mart	2017-04-13	94	299
mart	2017-04-11	75	299
mart	2017-04-08	62	299
neil	2017-05-10	12	12
neil	2017-06-12	80	80
tony	2017-01-02	15	94
tony	2017-01-04	29	94
tony	2017-01-07	50	94
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14

(3)上述的场景,要将 cost 按照日期进行累加

select name,orderdate,cost,sum(cost) over(distribute by name sort by cost)
from business;
  • 1
  • 2

结果:

jack	2017-01-05	46	111
jack	2017-01-08	55	111
jack	2017-01-01	10	111
jack	2017-02-03	23	23
jack	2017-04-06	42	42
mart	2017-04-09	68	299
mart	2017-04-13	94	299
mart	2017-04-11	75	299
mart	2017-04-08	62	299
neil	2017-05-10	12	12
neil	2017-06-12	80	80
tony	2017-01-02	15	94
tony	2017-01-04	29	94
tony	2017-01-07	50	94
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14

(4)查询每个顾客上次的购买时间

select name,orderdate,lag(orderdate,1,'2000-01-01') over(distribute by name sort by orderdate)
from business;
  • 1
  • 2

结果:

jack	2017-01-01	2000-01-01
jack	2017-01-05	2017-01-01
jack	2017-01-08	2017-01-05
jack	2017-02-03	2017-01-08
jack	2017-04-06	2017-02-03
mart	2017-04-08	2000-01-01
mart	2017-04-09	2017-04-08
mart	2017-04-11	2017-04-09
mart	2017-04-13	2017-04-11
neil	2017-05-10	2000-01-01
neil	2017-06-12	2017-05-10
tony	2017-01-02	2000-01-01
tony	2017-01-04	2017-01-02
tony	2017-01-07	2017-01-04
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14

(5)查询前 20%时间的订单信息

select * from 
(select name,orderdate,cost,ntile(5) over(order by orderdate) as t 
from business) t1
where t=1
  • 1
  • 2
  • 3
  • 4

结果:

jack	2017-01-01	10	1
tony	2017-01-02	15	1
tony	2017-01-04	29	1
  • 1
  • 2
  • 3
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/盐析白兔/article/detail/656551
推荐阅读
相关标签
  

闽ICP备14008679号