当前位置:   article > 正文

SQL 经典面试题:统计最近七天连续三天活跃的用户_sql查询连续三天活跃用户

sql查询连续三天活跃用户

1 需求

给定 mid,dt 的用户登录记录表,查找最近 7 天内连续 3 天活跃的用户 id

2 数据表

tmp_table.tmp_login_test

CREATE TABLE tmp_table.tmp_login_test (
    mid string,
    dt string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
  • 1
  • 2
  • 3
  • 4
insert into tmp_table.tmp_login_test values
("zhangsan","2021-08-01"),
("zhangsan","2021-08-02"),
("zhangsan","2021-08-04"),
("zhangsan","2021-08-05"),
("zhangsan","2021-08-06"),
("zhangsan","2021-08-08"),
("zhangsan","2021-08-09"),
("zhangsan","2021-08-10"),
("lisi","2021-08-01"),
("lisi","2021-08-02"),
("lisi","2021-08-04"),
("lisi","2021-08-05"),
("lisi","2021-08-08"),
("lisi","2021-08-10"),
("wangwu","2021-08-01"),
("wangwu","2021-08-04"),
("wangwu","2021-08-07"),
("zhaoliu","2021-08-01"),
("zhaoliu","2021-08-02"),
("zhaoliu","2021-08-03"),
("zhaoliu","2021-08-04"),
("zhaoliu","2021-08-05"),
("zhaoliu","2021-08-06"),
("zhaoliu","2021-08-07"),
("zhaoliu","2021-08-08"),
("zhaoliu","2021-08-09"),
("zhaoliu","2021-08-10");
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28

3 Sql 实现

实现思路:获取最近 7 天的用户登录记录数据(在 where 中限定),对数据进行 rank 排序,计算登录日期与 rank 值之间的差值(使用 date_sub 函数)得到一个差值日期,如果登录日期是连续的那么计算得到的差值日期是同一个,在此基础上基于用户,差值日期分组,统计 dt 的去重数量,即可得到每个用户每次连续登录的天数。在本例中,需要统计 7 天内连续 3 天登录的用户,所以只需要取出连续登录天数大于等于 3 的 uid 即完成需求。

  • 对用户的登录行为按 mid 分组,组内按登录日期进行排序
select 
    mid, dt, 
    rank() over(partition by mid order by dt) rank_mid_dt 
from tmp_table.tmp_login_test 
where dt >= date_sub('2021-08-10', 6) and dt <= '2021-08-10';
  • 1
  • 2
  • 3
  • 4
  • 5
+-----------+-------------+--------------+
|    mid    |     dt      | rank_mid_dt  |
+-----------+-------------+--------------+
| lisi      | 2021-08-04  | 1            |
| lisi      | 2021-08-05  | 2            |
| lisi      | 2021-08-08  | 3            |
| lisi      | 2021-08-10  | 4            |
| zhangsan  | 2021-08-04  | 1            |
| zhangsan  | 2021-08-05  | 2            |
| zhangsan  | 2021-08-06  | 3            |
| zhangsan  | 2021-08-08  | 4            |
| zhangsan  | 2021-08-09  | 5            |
| zhangsan  | 2021-08-10  | 6            |
| zhaoliu   | 2021-08-04  | 1            |
| zhaoliu   | 2021-08-05  | 2            |
| zhaoliu   | 2021-08-06  | 3            |
| zhaoliu   | 2021-08-07  | 4            |
| zhaoliu   | 2021-08-08  | 5            |
| zhaoliu   | 2021-08-09  | 6            |
| zhaoliu   | 2021-08-10  | 7            |
| wangwu    | 2021-08-04  | 1            |
| wangwu    | 2021-08-07  | 2            |
+-----------+-------------+--------------+
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 基于上表计算 dt 和 rank_mid_dt 日期差
select 
    mid, dt, rank_mid_dt, 
    date_sub(dt, rank_mid_dt) date_diff 
from (select mid, dt, rank() over(partition by mid order by dt) rank_mid_dt from tmp_table.tmp_login_test where dt >= date_sub('2021-08-10', 6) and dt <= '2021-08-10') t1;
  • 1
  • 2
  • 3
  • 4
+-----------+-------------+--------------+-------------+
|    mid    |     dt      | rank_mid_dt  |  date_diff  |
+-----------+-------------+--------------+-------------+
| lisi      | 2021-08-04  | 1            | 2021-08-03  |
| lisi      | 2021-08-05  | 2            | 2021-08-03  |
| lisi      | 2021-08-08  | 3            | 2021-08-05  |
| lisi      | 2021-08-10  | 4            | 2021-08-06  |
| zhangsan  | 2021-08-04  | 1            | 2021-08-03  |
| zhangsan  | 2021-08-05  | 2            | 2021-08-03  |
| zhangsan  | 2021-08-06  | 3            | 2021-08-03  |
| zhangsan  | 2021-08-08  | 4            | 2021-08-04  |
| zhangsan  | 2021-08-09  | 5            | 2021-08-04  |
| zhangsan  | 2021-08-10  | 6            | 2021-08-04  |
| zhaoliu   | 2021-08-04  | 1            | 2021-08-03  |
| zhaoliu   | 2021-08-05  | 2            | 2021-08-03  |
| zhaoliu   | 2021-08-06  | 3            | 2021-08-03  |
| zhaoliu   | 2021-08-07  | 4            | 2021-08-03  |
| zhaoliu   | 2021-08-08  | 5            | 2021-08-03  |
| zhaoliu   | 2021-08-09  | 6            | 2021-08-03  |
| zhaoliu   | 2021-08-10  | 7            | 2021-08-03  |
| wangwu    | 2021-08-04  | 1            | 2021-08-03  |
| wangwu    | 2021-08-07  | 2            | 2021-08-05  |
+-----------+-------------+--------------+-------------+
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 基于 mid,date_diff 分组,统计 dt 的去重数量,并取出数量大于 3 的
select 
    mid, date_diff, 
    count(distinct dt) cnt 
from (select mid, dt, rank_mid_dt, date_sub(dt, rank_mid_dt) date_diff from (select mid, dt, rank() over(partition by mid order by dt) rank_mid_dt from tmp_table.tmp_login_test where dt >= date_sub('2021-08-10', 6) and dt <= '2021-08-10') t1) t2 
group by mid, date_diff having count(distinct dt) >= 3;
  • 1
  • 2
  • 3
  • 4
  • 5
+-----------+-------------+------+
|    mid    |  date_diff  | cnt  |
+-----------+-------------+------+
| zhaoliu   | 2021-08-03  | 7    |
| zhangsan  | 2021-08-03  | 3    |
| zhangsan  | 2021-08-04  | 3    |
+-----------+-------------+------+
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 取出 mid
select 
    distinct mid
from (select mid, date_diff, count(distinct dt) cnt from (select mid, dt, rank_mid_dt, date_sub(dt, rank_mid_dt) date_diff from (select mid, dt, rank() over(partition by mid order by dt) rank_mid_dt from tmp_table.tmp_login_test where dt >= date_sub('2021-08-10', 6) and dt <= '2021-08-10') t1) t2 
group by mid, date_diff having count(distinct dt) >= 3) t3;
  • 1
  • 2
  • 3
  • 4
+-----------+
|    mid    |
+-----------+
| zhangsan  |
| zhaoliu   |
+-----------+
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/煮酒与君饮/article/detail/806908
推荐阅读
相关标签
  

闽ICP备14008679号