赞
踩
假设你是一位数据分析师,负责分析某电商平台的用户行为和订单数据,平台上有多个用户,用户可以在不同的日期下单,每个订单包含多个商品。请你完成相关业务分析,帮助平台优化运营策略和用户体验。
我们有三张表,表的字段信息如下:
users
表
记录用户的信息
字段名 | 字段类型 | 备注 |
---|---|---|
user_id | int | 用户ID |
name | string | 用户姓名 |
age | int | 用户年龄 |
gender | string | 用户性别 |
register_date | string | 注册日期 |
示例数据:
user_id | name | age | gender | register_date |
---|---|---|---|---|
1 | ‘Alice’ | 23 | ‘F’ | ‘2023-01-01’ |
2 | ‘Bob’ | 22 | ‘M’ | ‘2023-02-01’ |
3 | ‘Cathy’ | 24 | ‘F’ | ‘2023-03-01’ |
4 | ‘David’ | 23 | ‘M’ | ‘2023-04-01’ |
5 | ‘Eve’ | 25 | ‘F’ | ‘2023-05-01’ |
orders
表
记录订单的信息
字段名 | 字段类型 | 备注 |
---|---|---|
order_id | int | 订单ID |
user_id | int | 用户ID |
order_date | string | 订单日期 |
amount | double | 订单金额 |
示例数据:
order_id | user_id | order_date | amount |
---|---|---|---|
101 | 1 | ‘2024-01-01’ | 100.0 |
102 | 1 | ‘2024-01-02’ | 150.0 |
103 | 2 | ‘2024-01-03’ | 200.0 |
104 | 3 | ‘2024-01-04’ | 50.0 |
105 | 4 | ‘2024-01-05’ | 300.0 |
106 | 5 | ‘2024-01-06’ | 250.0 |
order_items
表
记录订单中商品的信息
字段名 | 字段类型 | 备注 |
---|---|---|
order_item_id | int | 订单项ID |
order_id | int | 订单ID |
product_id | int | 商品ID |
quantity | int | 商品数量 |
price | double | 商品单价 |
示例数据:
order_item_id | order_id | product_id | quantity | price |
---|---|---|---|---|
1001 | 101 | 1 | 1 | 50.0 |
1002 | 101 | 2 | 1 | 50.0 |
1003 | 102 | 3 | 3 | 50.0 |
1004 | 103 | 4 | 2 | 100.0 |
1005 | 104 | 5 | 1 | 50.0 |
1006 | 105 | 6 | 2 | 150.0 |
1007 | 106 | 7 | 5 | 50.0 |
数据集
-- 创建用户表
CREATE TABLE users (
user_id INT,
name STRING,
age INT,
gender STRING,
register_date STRING
);
-- 插入用户数据
INSERT INTO users VALUES
(1, 'Alice', 23, 'F', '2023-01-01'),
(2, 'Bob', 22, 'M', '2023-02-01'),
(3, 'Cathy', 24, 'F', '2023-03-01'),
(4, 'David', 23, 'M', '2023-04-01'),
(5, 'Eve', 25, 'F', '2023-05-01'),
(6, 'Frank', 28, 'M', '2023-06-01'),
(7, 'Grace', 27, 'F', '2023-07-01'),
(8, 'Hank', 26, 'M', '2023-08-01'),
(9, 'Ivy', 29, 'F', '2023-09-01'),
(10, 'Jack', 30, 'M', '2023-10-01');
-- 创建订单表
CREATE TABLE orders (
order_id INT,
user_id INT,
order_date STRING,
amount DOUBLE
);
-- 插入订单数据
INSERT INTO orders VALUES
(101, 1, '2024-01-01', 100.0),
(102, 1, '2024-01-02', 150.0),
(103, 2, '2024-01-03', 200.0),
(104, 3, '2024-01-04', 50.0),
(105, 4, '2024-01-05', 300.0),
(106, 5, '2024-01-06', 250.0),
(107, 6, '2024-01-07', 100.0),
(108, 7, '2024-01-08', 150.0),
(109, 8, '2024-01-09', 200.0),
(110, 9, '2024-01-10', 50.0),
(111, 10, '2024-01-11', 300.0),
(112, 1, '2024-01-12', 100.0),
(113, 2, '2024-01-13', 150.0),
(114, 3, '2024-01-14', 200.0),
(115, 4, '2024-01-15', 50.0),
(116, 5, '2024-01-16', 300.0),
(117, 6, '2024-01-17', 250.0),
(118, 7, '2024-01-18', 100.0),
(119, 8, '2024-01-19', 150.0),
(120, 9, '2024-01-20', 200.0);
-- 创建订单项表
CREATE TABLE order_items (
order_item_id INT,
order_id INT,
product_id INT,
quantity INT,
price DOUBLE
);
-- 插入订单项数据
INSERT INTO order_items VALUES
(1001, 101, 1, 1, 50.0),
(1002, 101, 2, 1, 50.0),
(1003, 102, 3, 3, 50.0),
(1004, 103, 4, 2, 100.0),
(1005, 104, 5, 1, 50.0),
(1006, 105, 6, 2, 150.0),
(1007, 106, 7, 5, 50.0),
(1008, 107, 1, 2, 50.0),
(1009, 108, 2, 3, 50.0),
(1010, 109, 3, 1, 50.0),
(1011, 110, 4, 2, 100.0),
(1012, 111, 5, 1, 50.0),
(1013, 112, 6, 2, 150.0),
(1014, 113, 7, 5, 50.0),
(1015, 114, 1, 2, 50.0),
(1016, 115, 2, 3, 50.0),
(1017, 116, 3, 1, 50.0),
(1018, 117, 4, 2, 100.0),
(1019, 118, 5, 1, 50.0),
(1020, 119, 6, 2, 150.0),
(1021, 120, 7, 5, 50.0);
select
user_id,
total_amount,
total_cnt,
rank() over(order by total_amount desc) rk
from
(select
user_id,
sum(amount) total_amount,
count(order_id) total_cnt
from
orders
group by
user_id)t1;
解题思路
RANK()
窗口函数对用户进行排名。select
user_id,
date_format(order_date,"yyyy-MM") order_month,
sum(amount) total_amount,
count(order_id) total_cnt
from
orders
group by
user_id,date_format(order_date,"yyyy-MM");
解题思路
DATE_FORMAT()
函数按月提取订单日期;select
user_id,
product_id,
total_amount,
total_cnt
from
(select
user_id,
product_id,
total_amount,
total_cnt,
row_number() over(partition by user_id order by total_cnt desc) rn
from
(select
user_id,
product_id,
sum(amount) total_amount,
count(product_id) total_cnt
from
orders o
join
order_items oi
on
o.order_id = oi.order_id
group by
user_id,product_id)t1 )t2
where
rn = 1;
解题思路
ROW_NUMBER()
窗口函数对每个用户购买的商品进行排序;select
user_id,
order_month,
avg_amount,
order_cnt
from
(select
user_id,
order_month,
avg_amount,
order_cnt,
rank() over(order by avg_amount desc) rk_amount,
rank() over(order by order_cnt desc) rk_cnt
from
(select
user_id,
date_format(order_date,"yyyy-MM") order_month,
cast(avg(amount) as decimal(5,2)) avg_amount,
count(order_id) order_cnt
from
orders
group by
user_id,
date_format(order_date,"yyyy-MM"))t1 )t2
where
rk_amount = 1 or rk_cnt = 1;
解题思路
统计每个用户每月的平均订单金额和订单数量;
排序并取出平均每月订单金额和订单数量最高的两个用户。
select
u.user_id,
total_amount,
u.age,
u.gender
from
(select
user_id,
sum(amount) total_amount
from
orders
group by
user_id) o
join
users u
on
o.user_id = u.user_id
order by
total_amount desc
limit
10;
解题思路
users
表,获取用户的年龄和性别信息;select
u.user_id,
sum(amount) total_amount
from
(select
user_id
from
users
where
register_date >= date_sub(current_date(),365) )u
join
orders o
on
o.user_id = u.user_id
group by
u.user_id
order by
total_amount desc
limit
5;
解题思路
orders
表,统计这些用户的总订单金额;5
名用户,如果想要精准获取允许重复,则可以使用 rank
或者 row_number
进行窗口排序后过滤获取前 5
。Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。