赞
踩
已知有表 t_product_sales 如下,记录了每个产品id、产品名称、产品销售开始日期、产品销售结束日期以及产品日均销售金额,请计算出每个产品每年的销售金额。
样例数据
+-------------+---------------+----------------------+----------------------+----------------------+
| product_id | product_name | period_start | period_end | average_daily_sales |
+-------------+---------------+----------------------+----------------------+----------------------+
| 1 | LC Phone | 2019-01-25 00:00:00 | 2019-02-28 00:00:00 | 100 |
| 2 | LC T-Shirt | 2018-12-01 00:00:00 | 2020-01-01 00:00:00 | 10 |
| 3 | LC Keychain | 2019-12-01 00:00:00 | 2020-01-31 00:00:00 | 1 |
+-------------+---------------+----------------------+----------------------+----------------------+
期望结果
+------------+--------------+-------------+--------------+
| product_id | product_name | report_year | total_amount |
+------------+--------------+-------------+--------------+
| 1 | LC Phone | 2019 | 3500 |
| 2 | LC T-Shirt | 2018 | 310 |
| 2 | LC T-Shirt | 2019 | 3650 |
| 2 | LC T-Shirt | 2020 | 10 |
| 3 | LC Keychain | 2019 | 31 |
| 3 | LC Keychain | 2020 | 31 |
+------------+--------------+-------------+--------------+
--建表语句 create table if not exists t_product_sales ( product_id bigint, product_name string, period_start string, period_end string, average_daily_sales bigint ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS orc; --插入数据 insert into t_product_sales(product_id, product_name, period_start, period_end, average_daily_sales) values (1, 'LC Phone', '2019-01-25 00:00:00', '2019-02-28 00:00:00', 100), (2, 'LC T-Shirt', '2018-12-01 00:00:00', '2020-01-01 00:00:00', 10), (3, 'LC Keychain', '2019-12-01 00:00:00', '2020-01-31 00:00:00', 1);
(1)构建年份维表
with dim_year as (select '2018' as year, '2018-01-01' as year_first_day, '2018-12-31' as year_end_day
union all
select '2019' as year, '2019-01-01' as year_first_day, '2019-12-31' as year_end_day
union all
select '2020' as year, '2020-01-01' as year_first_day, '2020-12-31' as year_end_day)
select year,year_first_day,year_end_day from dim_year
执行结果
+-------+-----------------+---------------+
| year | year_first_day | year_end_day |
+-------+-----------------+---------------+
| 2018 | 2018-01-01 | 2018-12-31 |
| 2019 | 2019-01-01 | 2019-12-31 |
| 2020 | 2020-01-01 | 2020-12-31 |
+-------+-----------------+---------------+
(2)年份维表与原始数据进行关联
把原始数据与年份维表进行笛卡尔积,得到每年与原始数据的一个交叉值。
# 这一设置的含义是关闭Hive中的笛卡尔积严格检查。 # 具体来说,Hive默认情况下不允许生成笛卡尔积(即两个没有连接条件的表的笛卡尔积), # 因为笛卡尔积通常会生成非常大的结果集,可能会导致性能问题或资源耗尽。 # 通过将这个设置设为false,你可以禁用这种严格检查,从而允许在查询中生成笛卡尔积。 set hive.strict.checks.cartesian.product = false; with dim_year as (select '2018' as year, '2018-01-01' as year_first_day, '2018-12-31' as year_end_day union all select '2019' as year, '2019-01-01' as year_first_day, '2019-12-31' as year_end_day union all select '2020' as year, '2020-01-01' as year_first_day, '2020-12-31' as year_end_day) select product_id, product_name, period_start, period_end, average_daily_sales, year, year_first_day, year_end_day from t_product_sales left join dim_year
执行结果
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+
| product_id | product_name | period_start | period_end | average_daily_sales | year | year_first_day | year_end_day |
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+
| 1 | LC Phone | 2019-01-25 00:00:00 | 2019-02-28 00:00:00 | 100 | 2019 | 2019-01-01 | 2019-12-31 |
| 1 | LC Phone | 2019-01-25 00:00:00 | 2019-02-28 00:00:00 | 100 | 2020 | 2020-01-01 | 2020-12-31 |
| 1 | LC Phone | 2019-01-25 00:00:00 | 2019-02-28 00:00:00 | 100 | 2018 | 2018-01-01 | 2018-12-31 |
| 2 | LC T-Shirt | 2018-12-01 00:00:00 | 2020-01-01 00:00:00 | 10 | 2019 | 2019-01-01 | 2019-12-31 |
| 2 | LC T-Shirt | 2018-12-01 00:00:00 | 2020-01-01 00:00:00 | 10 | 2020 | 2020-01-01 | 2020-12-31 |
| 2 | LC T-Shirt | 2018-12-01 00:00:00 | 2020-01-01 00:00:00 | 10 | 2018 | 2018-01-01 | 2018-12-31 |
| 3 | LC Keychain | 2019-12-01 00:00:00 | 2020-01-31 00:00:00 | 1 | 2019 | 2019-01-01 | 2019-12-31 |
| 3 | LC Keychain | 2019-12-01 00:00:00 | 2020-01-31 00:00:00 | 1 | 2020 | 2020-01-01 | 2020-12-31 |
| 3 | LC Keychain | 2019-12-01 00:00:00 | 2020-01-31 00:00:00 | 1 | 2018 | 2018-01-01 | 2018-12-31 |
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+
(3)计算每年每个产品在售天数,计算年销售额
我们先观察2中的结果,可以看到原始记录中每行数据都与所有年都有一行记录。我们从销售日期和每年开始日期中取较大日期 得到一个开始时间,然后从销售截止日期和每年的结束日期取较小日期 得到一个结束日期,然后用结束日期减去开始日期。
with dim_year as (select '2018' as year, '2018-01-01' as year_first_day, '2018-12-31' as year_end_day union all select '2019' as year, '2019-01-01' as year_first_day, '2019-12-31' as year_end_day union all select '2020' as year, '2020-01-01' as year_first_day, '2020-12-31' as year_end_day), tmp as (select product_id , product_name , period_start , period_end , average_daily_sales , year , year_first_day , year_end_day , datediff(if(to_date(period_end) > to_date(year_end_day) , to_date(year_end_day) , to_date(period_end)) , if(to_date(period_start) > to_date(year_first_day) , to_date(period_start) , to_date(year_first_day))) as date_diff from t_product_sales left join dim_year) select product_id, product_name, year, (date_diff + 1) * average_daily_sales as total_amount from tmp where date_diff >= 0
执行结果
+-------------+---------------+-------+---------------+
| product_id | product_name | year | total_amount |
+-------------+---------------+-------+---------------+
| 1 | LC Phone | 2019 | 3500 |
| 2 | LC T-Shirt | 2018 | 310 |
| 2 | LC T-Shirt | 2019 | 3650 |
| 2 | LC T-Shirt | 2020 | 10 |
| 3 | LC Keychain | 2019 | 31 |
| 3 | LC Keychain | 2020 | 31 |
+-------------+---------------+-------+---------------+
只有在当年有销售时间的数据date_diff >=0,如果在当年没有销售时间,则date_diff为负。但是这个date_diff 的值比预期小1,这是因为我们算了日期差,所以我们在结果上+1即可。
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。