当前位置:   article > 正文

大数据经典sql

大数据经典sql

一、连续登录问题

问题:1)、每个用户连续登录最大天数

            2)、连续登录大于三天的用户数

分析:本质都是计算用户连续登录天数

方案一:利用排序窗口

  1. select a.user_id
  2. ,a.date_rslt
  3. ,count(1) as cnt
  4. from (
  5. select
  6. t.user_id
  7. ,t.login_time
  8. ,date_sub(login_time, num) as date_rslt
  9. from (
  10. select
  11. user_id
  12. ,login_time
  13. ,row_number() over(partition by user_id order by login_time) as num
  14. from login_log
  15. ) t
  16. ) a
  17. group by a.user_id,a.date_rslt

方案二、增量加全量

连续访问天数v_days(最新flag值为1,则v_days累加,否则为0)

历史最大访问天数max_days (从max_days、v_days中取最大值)

  1. select
  2. coaleasce(h.user_id,i.user_id) as user_id,
  3. if(i.user_id is not null,v_days+1,0) as v_days,
  4. greatest(max_days,if(i.user_id is not null,v_days+1,0)) as max_days
  5. from
  6. history_ds h
  7. full join
  8. log_time i

扩展1:连续登录,中间间隔1天也算

  1. select user_id
  2. ,group_id
  3. ,count(login_date) as continuous_login_days
  4. from (
  5. select
  6. user_id
  7. ,login_date
  8. ,sum(if(date_diff>1,1,0)) over(partition by user_id order by login_date rows between unboundedpreceding and current row) as group_id
  9. from (
  10. select
  11. user_id
  12. ,login_date
  13. ,datediff(login_date,last_login_date) as date_diff
  14. from (
  15. select
  16. user_id
  17. ,login_date
  18. ,lag(login_date,1,'1970-01-01') over(partition by user_id order by login_date) as last_login_date
  19. from test_login
  20. )t1
  21. )t2
  22. )t3
  23. group by user_id
  24. ,group_id;

扩展2:断点排序

连续日期的数据对应的值发生变化,重新排序

  1. select
  2. a,
  3. b,
  4. row_number() over( partition by b,a_diff order by a) as c
  5. from
  6. (
  7. select
  8. a,
  9. b,
  10. a-num as a_diff
  11. from
  12. (
  13. select
  14. a,
  15. b,
  16. row_number() over( partition by b order by a ) as num
  17. from t1
  18. )tmp1
  19. )tmp2
  20. order by a;

大数据面试题之SQL题_大数据sql面试题-CSDN博客

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/一键难忘520/article/detail/745328
推荐阅读
相关标签
  

闽ICP备14008679号