当前位置:   article > 正文

Hive经典面试问题(四)——交叉重复问题_hive怎么解决数据重复的问题

hive怎么解决数据重复的问题

交叉重复问题

题目

在这里插入图片描述

数据

oppo	2021-06-05	2021-06-09
oppo	2021-06-11	2021-06-21
VIVO	2021-06-05	2021-06-15
VIVO	2021-06-09	2021-06-21
redmi	2021-06-05	2021-06-21
redmi	2021-06-09	2021-06-15
redmi	2021-06-17	2021-06-26
huawei	2021-06-05	2021-06-26
huawei	2021-06-09	2021-06-15
huawei	2021-06-17	2021-06-21
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10

建表

create table if not exists intersection(
name string,
sdt date,
edt date
)row format delimited fields terminated by ‘\t’;

加载数据

load data local inpath ‘/opt/module/data/hive-interviews/intersection’ into table intersection;

解题步骤

解题步骤1 先将当前行以前的数据中最大的edt放置到当前行

sql语句

select name,sdt,edt,max(edt) over(partition by name order by sdt rows between UNBOUNDED preceding and 1 preceding) maxEdt
from intersection;
  • 1
  • 2

结果

name	sdt	edt	maxedt
VIVO	2021-06-05	2021-06-15	NULL
VIVO	2021-06-09	2021-06-21	2021-06-15
huawei	2021-06-05	2021-06-26	NULL
huawei	2021-06-09	2021-06-15	2021-06-26
huawei	2021-06-17	2021-06-21	2021-06-26
oppo	2021-06-05	2021-06-09	NULL
oppo	2021-06-11	2021-06-21	2021-06-09
redmi	2021-06-05	2021-06-21	NULL
redmi	2021-06-09	2021-06-15	2021-06-21
redmi	2021-06-17	2021-06-26	2021-06-21
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11

解题步骤2:比较sdt和maxEdt

如果maxEdt大,则替换,null不操作,maxEdt小不操作

sql语句

select 
t1.name,if(t1.maxEdt is null,t1.sdt,if(t1.sdt>t1.maxEdt,t1.sdt,t1.maxEdt)) sdt,t1.edt
from(
select name,sdt,edt,max(edt) over(partition by name order by sdt rows between UNBOUNDED preceding and 1 preceding) maxEdt
from intersection
)t1
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6

结果

t1.name	sdt	t1.edt
VIVO	2021-06-05	2021-06-15
VIVO	2021-06-15	2021-06-21
huawei	2021-06-05	2021-06-26
huawei	2021-06-26	2021-06-15
huawei	2021-06-26	2021-06-21
oppo	2021-06-05	2021-06-09
oppo	2021-06-11	2021-06-21
redmi	2021-06-05	2021-06-21
redmi	2021-06-21	2021-06-15
redmi	2021-06-21	2021-06-26
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11

解题步骤3 求每个的打折时间

sql语句

select
t2.name,datediff(t2.edt,t2.sdt) days
from(
select 
t1.name,if(t1.maxEdt is null,t1.sdt,if(t1.sdt>t1.maxEdt,t1.sdt,t1.maxEdt)) sdt,t1.edt
from(
select name,sdt,edt,max(edt) over(partition by name order by sdt rows between UNBOUNDED preceding and 1 preceding) maxEdt
from intersection
)t1
)t2
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10

结果

t2.name	days
VIVO	10
VIVO	6
huawei	21
huawei	-11
huawei	-5
oppo	4
oppo	10
redmi	16
redmi	-6
redmi	5
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11

解题步骤4:求出最大打折时间

sql语句

select t3.name,sum(if(t3.days>0,t3.days,0))
from(
select
t2.name,datediff(t2.edt,t2.sdt) days
from(
select 
t1.name,if(t1.maxEdt is null,t1.sdt,if(t1.sdt>t1.maxEdt,t1.sdt,t1.maxEdt)) sdt,t1.edt
from(
select name,sdt,edt,max(edt) over(partition by name order by sdt rows between UNBOUNDED preceding and 1 preceding) maxEdt
from intersection
)t1
)t2
)t3
group by t3.name
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14

结果

t3.name	_c1
VIVO	16
huawei	21
oppo	14
redmi	21
  • 1
  • 2
  • 3
  • 4
  • 5
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/从前慢现在也慢/article/detail/656582
推荐阅读
相关标签
  

闽ICP备14008679号