赞
踩
目录
一、使用explode函数将hive表中的Map和Array字段数据进行拆分
6、使用cube 和ROLLUP 根据GROUP BY的维度的所有组合进行聚合。
lateral view用于和split、explode等UDTF一起使用的,能将一行数据拆分成多行数据,在此基础上可以对拆分的数据进行聚合,lateral view首先为原始表的每行调用UDTF,UDTF会把一行拆分成一行或者多行,lateral view再把结果组合,产生一个支持别名表的虚拟表。
其中explode还可以用于将hive一列中复杂的array或者map结构拆分成多行
需求:现在有数据格式如下
- zhangsan child1,child2,child3,child4 k1:v1,k2:v2
- lisi child5,child6,child7,child8 k3:v3,k4:v4
字段之间使用\t分割,需求将所有的child进行拆开成为一列
- +----------+--+
- | mychild |
- +----------+--+
- | child1 |
- | child2 |
- | child3 |
- | child4 |
- | child5 |
- | child6 |
- | child7 |
- | child8 |
- +----------+--+
将map的key和value也进行拆开,成为如下结果
- +-------------------+---------------+
- | mymapkey | mymapvalue |
- +----------------+------------------+
- | k1 | v1 |
- | k2 | v2 |
- | k3 | v3 |
- | k4 | v4 |
- +-------------------+---------------+
第一步:创建hive数据库
创建hive数据库
- hive (default)> create database hive_explode;
- hive (default)> use hive_explode;
第二步:创建hive表,然后使用explode拆分map和array
- hive (hive_explode)> create table t3(name string,children array<string>,address Map<string,string>)
- row format delimited fields terminated by '\t'
- collection items terminated by ','
- map keys terminated by ':' stored as textFile;
第三步:加载数据
node03执行以下命令创建表数据文件
- mkdir -p /export/servers/hivedatas/
- cd /export/servers/hivedatas/
- vim maparray
- zhangsan child1,child2,child3,child4 k1:v1,k2:v2
- lisi child5,child6,child7,child8 k3:v3,k4:v4
hive表当中加载数据
hive (hive_explode)> load data local inpath '/export/servers/hivedatas/maparray' into table t3;
第四步:使用explode将hive当中数据拆开
将array当中的数据拆分开
hive (hive_explode)> SELECT explode(children) AS myChild FROM t3;
将map当中的数据拆分开
hive (hive_explode)> SELECT explode(address) AS (myMapKey, myMapValue) FROM t3;
- 0: jdbc:hive2://node03:10000> select explode(children) as mychild from t3;
-
- +----------+--+
- | mychild |
- +----------+--+
- | child1 |
- | child2 |
- | child3 |
- | child4 |
- | child5 |
- | child6 |
- | child7 |
- | child8 |
- +----------+--+
- 8 rows selected (0.128 seconds)
- 0: jdbc:hive2://node03:10000> select name,explode(children) as mychild from t3;
- Error: Error while compiling statement: FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions (state=42000,code=10081)
- 0: jdbc:hive2://node03:10000> SELECT explode(address) AS (myMapKey, myMapValue) FROM t3;
-
- +-----------+-------------+--+
- | mymapkey | mymapvalue |
- +-----------+-------------+--+
- | k1 | v1 |
- | k2 | v2 |
- | k3 | v3 |
- | k4 | v4 |
- +-----------+-------------+--+
需求:现在有一些数据格式如下:
a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]
其中字段与字段之间的分隔符是 “ | ”
我们要解析得到所有的monthSales对应的值为以下这一列(行转列)
4900
2090
6987
第一步:创建hive表
- hive (hive_explode)> create table explode_lateral_view
- (`area` string,
- `goods_id` string,
- `sale_info` string)
- ROW FORMAT DELIMITED
- FIELDS TERMINATED BY '|'
- STORED AS textfile;
第二步:准备数据并加载数据
准备数据如下
- cd /export/servers/hivedatas
- vim explode_json
a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]
加载数据到hive表当中去
hive (hive_explode)> load data local inpath '/export/servers/hivedatas/explode_json' overwrite into table explode_lateral_view;
第三步:使用explode拆分Array
hive (hive_explode)> select explode(split(goods_id,',')) as goods_id from explode_lateral_view;
第四步:使用explode拆解Map
hive (hive_explode)> select explode(split(area,',')) as area from explode_lateral_view;
第五步:拆解json字段
hive (hive_explode)> select explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')) as sale_info from explode_lateral_view;
然后我们想用get_json_object来获取key为monthSales的数据:
hive (hive_explode)> select get_json_object(explode(split(regexp_replace(regexp_replace(sale_info,'\\[\\{',''),'}]',''),'},\\{')),'$.monthSales') as sale_info from explode_lateral_view;
然后出现异常FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
UDTF explode不能写在别的函数内
如果你这么写,想查两个字段,select explode(split(area,',')) as area,good_id from explode_lateral_view;
会报错FAILED: SemanticException 1:40 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'good_id'
使用UDTF的时候,只支持一个字段,这时候就需要LATERAL VIEW出场了
配合lateral view查询多个字段
hive (hive_explode)> select goods_id2,sale_info from explo
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。