当前位置:   article > 正文

Hive SQL——explode拆分函数&多行(列)合并为一行(列)&reflect函数_hive多列合并成一列

hive多列合并成一列

一、拆分 map 和 array

1.执行Linux命令

cd /data/import/
sudo vi test_explode_map_array.txt

  • 添加以下文件内容

小明    产品1,产品2,产品3    性别:男,年龄:24
小花    产品4,产品5,产品6    性别:女,年龄:22 

2.创建表并加载数据

  1. -- 开启智能本地模式
  2. -- set hive.exec.mode.local.auto=true;
  3. -- 创建表
  4. create table test.test_explode_map_array(
  5. name string,
  6. prod_arr array<string>,
  7. info_map map<string, string>)
  8. row format delimited
  9. -- 字段分隔符为'\t'
  10. fields terminated by '\t'
  11. collection items terminated by ','
  12. map keys terminated by ':'
  13. stored as textfile;
  14. -- 加载数据(方法一)
  15. load data local inpath '/data/import/test_explode_map_array.txt'
  16. into table test.test_explode_map_array;
  17. -- 加载数据(方法二)
  18. insert into test.test_explode_map_array
  19. select '小明', array('产品1', '产品2', '产品3'), str_to_map('性别:男,年龄:24');
  20. insert into test.test_explode_map_array
  21. select '小花', array('产品4', '产品5', '产品6'), str_to_map('性别:女,年龄:22');

3.查询结果1

map_keymap_value
年龄24
性别
年龄22
性别
  1. select
  2. -- explode拆分后必须加括号
  3. explode(info_map) as (map_key, map_value)
  4. from test.test_explode_map_array;
  5. -- 查询其他字段, lateral view(侧视图, 虚拟表)
  6. select
  7. name, map_key, map_value
  8. from test.test_explode_map_array
  9. -- 必须去掉括号map_key, map_value
  10. lateral view explode(info_map) tmp_table as map_key, map_value;

4.查询结果2

prod_arr_new
产品1
产品2
产品3
产品4
产品5
产品6
  1. select
  2. -- explode拆分数组
  3. explode(prod_arr) as prod_arr_new
  4. from test.test_explode_map_array;

5.查询结果3

nameprod_arr_new
小明产品1
小明产品2
小明产品3
小花产品4
小花产品5
小花产品6
  1. -- 查询其他字段,lateral view(侧视图, 虚拟表)
  2. select
  3. name, prod_arr_new
  4. from test.test_explode_map_array
  5. lateral view explode(prod_arr) tmp_table as prod_arr_new;

6.Hive三类UDF

  • UDF:用户自定义函数(user-defined function),输入一个值,返回一个值,一进一出

  • UDAF:用户自定义聚合函数(user-defined aggregate function),输入多个值,返回一个值,多进一出

  • UDTF:用户自定义表生成函数(user-defined table-generating function),输入一个值,返回多个值,一进多出

-- UDF:length\year\month\day ...
-- UDAF:sum\count\max ...
-- UDTF:explode

二、拆分 json

1.执行Linux命令

cd /data/import/

sudo vi test_explode_json.txt

  • 添加以下文件内容

a:shandong,b:beijing,c:hebei|1,2,3,4,5,6,7,8,9|[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]

2.创建表并加载数据

  1. -- 开启智能本地模式
  2. -- set hive.exec.mode.local.auto=true;
  3. -- 创建表
  4. create table test.test_explode_json(
  5. area string,
  6. goods_id string,
  7. sale_info string)
  8. row format delimited
  9. -- 字段分隔符为'|'
  10. fields terminated by '|'
  11. stored as textfile;
  12. -- 加载数据(方法一)
  13. load data local inpath '/data/import/test_explode_json.txt'
  14. overwrite into table test.test_explode_json;
  15. -- 加载数据(方法二)
  16. insert into test.test_explode_json
  17. values('a:shandong,b:beijing,c:hebei',
  18. '1,2,3,4,5,6,7,8,9',
  19. '[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]')

3.查询结果1

goods_id
1
2
3
4
5
6
7
8
9
  1. select
  2. explode(split(goods_id,',')) as goods_id
  3. from test.test_explode_json;

 4.查询结果2

codearea
ashandong
bbeijing
chebei
  1. select
  2. regexp_extract(area_new, '(.*):(.*)',1) as code
  3. , regexp_extract(area_new, '(.*):(.*)',2) as arec
  4. from test.test_explode_json
  5. lateral view explode(split(area,',')) tmp_table as area_new

 5.侧视图详解

lateral view + explode + split

结果集的行数是如何生成的?

  • 查询1个字段

  1. -- 9个元素= 9行
  2. select
  3. goods_id_new
  4. from test.test_explode_json
  5. lateral view explode(split(goods_id,',')) tmp_table as goods_id_new;
  • 查询2个字段

  1. -- 9个元素*1个元素 = 9行
  2. select
  3. goods_id_new,area
  4. from test.test_explode_json
  5. lateral view explode(split(goods_id,',')) tmp_table as goods_id_new;
  • 查询3个字段

  1. -- 9个元素*3个元素*1个元素=27行
  2. select
  3. goods_id_new,area_new,sale_info
  4. from test.test_explode_json
  5. lateral view explode(split(goods_id,',')) tmp_table as goods_id_new
  6. lateral view explode(split(area,',')) tmp_table as area_new;

6.查询结果3

sourcemonthsalesusercountscore
7fresh490019009.9
jd2090789819.8
jdmart698716009.0
  • 第一步:将字段sale_info的"[{"和"}]"替换为空字符串

  1. "[{"source":"7fresh","monthSales":4900,"userCount":1900,"score":"9.9"},{"source":"jd","monthSales":2090,"userCount":78981,"score":"9.8"},{"source":"jdmart","monthSales":6987,"userCount":1600,"score":"9.0"}]"
  2. select
  3. regexp_replace(sale_info, '\\[\\{|\\}\\]','')
  4. from test.test_explode_json
  • 第二步:以"},{"拆分为数组

  1. select
  2. explode(split(regexp_replace(sale_info, '\\[\\{|\\}\\]',''), '\\},\\{'))
  3. from test.test_explode_json
  • 第三步:将数组拆分为多行

  1. select
  2. sale_info_new
  3. from test.test_explode_json
  4. lateral view explode(split(regexp_replace(sale_info, '\\[\\{|\\}\\]',''), '\\},\\{')) tmp_table as sale_info_new
  • 第四步:转换为json字符串

  1. select
  2. concat('{', sale_info_new, '}')
  3. from test.test_explode_json
  4. lateral view explode(split(regexp_replace(sale_info, '\\[\\{|\\}\\]',''), '\\},\\{')) tmp_table as sale_info_new
  • 第五步:将json字符串转换为二维表

  1. select
  2. get_json_object(concat('{', sale_info_new, '}'), '$.source') as source
  3. , get_json_object(concat('{', sale_info_new, '}'), '$.monthSales') as monthSales
  4. , get_json_object(concat('{', sale_info_new, '}'), '$.userCount') as userCount
  5. , get_json_object(concat('{', sale_info_new, '}'), '$.score') as score
  6. from test.test_explode_json
  7. lateral view explode(split(regexp_replace(sale_info, '\\[\\{|\\}\\]',''), '\\},\\{')) tmp_table as sale_info_new

三、多行(列)合并为一行(列)

region_categorysubclass
东北-办公系固件,纸张,收纳具,信封,器具,美术,用品,装订机,标签
东北-家具用具,椅子,桌子,书架
东北-技术电话,配件,复印机,设备
  1. -- concat(str1|col1, str2|col2, …)
  2. -- 字符串合并,支持任意个字符串;
  3. -- concat_ws(sep, str1, str2, ...)
  4. -- 以sep为分隔符合并str1, str2, ...;如分隔符为null,则返回null;跳过合并值为null或空字符串的参数;
  5. -- collect_set(col)去重汇总
  6. -- 只支持基本数据类型(不支持复合数据类型),将某字段的项去重汇总,返回array数据类型;
  7. -- collect_list(col)不去重汇总
  8. select
  9. CONCAT(region, '-',category) as region_category
  10. , concat_ws(',', collect_set(subclass)) as `产品子类`
  11. from sm.sm_order_total
  12. group by CONCAT(region, '-',category)

 四、一行(列)拆分为多行(列)

  1. -- 通过以上查询语句来创建表
  2. create table test.test_explode_split as
  3. select
  4. CONCAT(region, '-',category) as region_category
  5. , concat_ws(',', collect_set(subclass)) as `产品子类`
  6. from sm.sm_order_total
  7. group by CONCAT(region, '-',category)
regioncategorysubclass_new
东北办公系固件
东北办公纸张
东北办公收纳具
东北办公信封
东北办公器具
东北办公美术
东北办公用品
东北办公装订机
东北办公标签
  1. select
  2. regexp_extract(t.region_category, '(.*)-(.*)', 1) as region
  3. , regexp_extract(t.region_category, '(.*)-(.*)', 2) as category
  4. , subclass_new
  5. from test.test_explode_split t
  6. lateral view explode(split(t.subclass, ',')) tmp_table as subclass_new;

五、reflect函数

  • 支持调用java函数

1.执行Linux命令

cd /data/import/
sudo vi test_reflect.txt

  • 添加以下文件内容

11    配饰    7    7
12    配饰    9    5
13    配饰    5    7
14    服饰    9    5
15    服饰    9    4
16    配饰    7    5
17    服饰    8    3
18    配饰    6    5
19    服饰    5    4
20    配饰    9    4

2.调用java的max函数求两列最大值

  1. use test;
  2. -- 创建表
  3. create table test.test_reflect(
  4. order_id int comment '订单编号',
  5. product string comment '产品',
  6. quality int comment '质量',
  7. service int comment '服务')
  8. row format delimited
  9. fields terminated by '\t';
  10. -- 加载数据
  11. load data local inpath '/data/import/test_reflect.txt'
  12. into table test.test_reflect;
  13. select
  14. *
  15. , reflect('java.lang.Math','max',quality, service) as max_score
  16. from test.test_reflect

3.不同的行执行不同的java函数

  • 配饰:求最大值

  • 服饰:求最小值

  1. select
  2. t.order_id
  3. , t.product
  4. , t.quality
  5. , t.service
  6. , reflect('java.lang.Math', method_name, t.quality, t.service) as score
  7. from
  8. (
  9. select
  10. order_id
  11. , product
  12. , quality
  13. , service
  14. , case product when '配饰' then 'max'
  15. when '服饰' then 'min' end as method_name
  16. from test.test_reflect
  17. ) as t
声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/你好赵伟/article/detail/429949
推荐阅读
相关标签
  

闽ICP备14008679号