赞
踩
题目来源:滴滴
已知有用户登录记录表包含登录日期和登录用户ID,请查询出截止到当前日期累积登录用户数及登陆用户列表。
样例数据
+-------------+----------+
| log_date | user_id |
+-------------+----------+
| 2024-01-01 | a |
| 2024-01-02 | a |
| 2024-01-02 | b |
| 2024-01-03 | b |
| 2024-01-04 | c |
| 2024-01-05 | b |
| 2024-01-05 | c |
| 2024-01-05 | d |
| 2024-01-05 | e |
+-------------+----------+
期望结果
+-------------+-----------+------------------------+
| log_date | user_cnt | user_list |
+-------------+-----------+------------------------+
| 2024-01-01 | 1 | ["a"] |
| 2024-01-02 | 2 | ["a","b"] |
| 2024-01-03 | 2 | ["a","b"] |
| 2024-01-04 | 3 | ["a","b","c"] |
| 2024-01-05 | 5 | ["a","b","c","d","e"] |
+-------------+-----------+------------------------+
-- 建表语句 CREATE TABLE t_user_login ( log_date STRING, user_id STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE; -- 数据插入语句 INSERT INTO t_user_login (log_date, user_id) VALUES ('2024-01-01','a'), ('2024-01-02','a'), ('2024-01-02','b'), ('2024-01-03','b'), ('2024-01-04','c'), ('2024-01-05','b'), ('2024-01-05','c'), ('2024-01-05','d'), ('2024-01-05','e');
(1)数据去重聚合
select log_date,
user_id,
count(distinct user_id) over (order by log_date asc) as user_cnt,
collect_set(user_id) over (order by log_date asc) as user_list
from t_user_login
执行结果
+-------------+----------+-----------+------------------------+
| log_date | user_id | user_cnt | user_list |
+-------------+----------+-----------+------------------------+
| 2024-01-01 | a | 1 | ["a"] |
| 2024-01-02 | b | 2 | ["a","b"] |
| 2024-01-02 | a | 2 | ["a","b"] |
| 2024-01-03 | b | 2 | ["a","b"] |
| 2024-01-04 | c | 3 | ["a","b","c"] |
| 2024-01-05 | e | 5 | ["a","b","c","e","d"] |
| 2024-01-05 | d | 5 | ["a","b","c","e","d"] |
| 2024-01-05 | c | 5 | ["a","b","c","e","d"] |
| 2024-01-05 | b | 5 | ["a","b","c","e","d"] |
+-------------+----------+-----------+------------------------+
(2)分组得到结果
select log_date,
user_cnt,
user_list
from (select log_date,
user_id,
count(distinct user_id) over (order by log_date asc) as user_cnt,
sort_array(collect_set(user_id) over (order by log_date asc)) as user_list
from t_user_login) t
group by log_date, user_cnt, user_list
执行结果
+-------------+-----------+------------------------+
| log_date | user_cnt | user_list |
+-------------+-----------+------------------------+
| 2024-01-01 | 1 | ["a"] |
| 2024-01-02 | 2 | ["a","b"] |
| 2024-01-03 | 2 | ["a","b"] |
| 2024-01-04 | 3 | ["a","b","c"] |
| 2024-01-05 | 5 | ["a","b","c","d","e"] |
+-------------+-----------+------------------------+
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。