当前位置:   article > 正文

Hive中的explode使用全解(附图片详解)_hive explode

hive explode

 

数据表:

表名:default.class_info,分别是班级,姓名,成绩 3列

单列Explode:

需求:将student这一列中数据由一行变为多行(使用split和explode,并结合lateral view函数实现)

  1. select
  2. class,student_name
  3. from
  4. default.class_info
  5. lateral view explode(split(student,',')) t as student_name

结果如下:

 单列Posexplode

需求:想要给每班的每个同学按照顺序来一个编号(使用posexplode函数)

  1. select
  2. class,student_index + 1 as student_index,student_name
  3. from
  4. default.class_info
  5. lateral view posexplode(split(student,',')) t as student_index,student_name;

注意:student_index+1 是因为index是从0开始的

结果如下:

多列Explode

需求:基于学生姓名和分数使其两两匹配,期望得到如下效果。

尝试: 先对两列进行explode

  1. select
  2. class,student_name,student_score
  3. from
  4. default.class_info
  5. lateral view explode(split(student,',')) sn as student_name
  6. lateral view explode(split(score,',')) sc as student_score

结果如下:

不太符合预期,如果对两列都进行explode的话,假设每列都有3个值,最终会变为3*3=9行,但我们实际只想要3行

解决办法:

我们进行两次posexplode,姓名和成绩都保留对应的序号,即便是变成了9行,通过where筛选只保留行号相同的index即可。

  1. select
  2. class,student_name,student_score
  3. from
  4. default.class_info
  5. lateral view posexplode(split(student,',')) sn as student_index_sn,student_name
  6. lateral view posexplode(split(score,',')) sc as student_index_sc,student_score
  7. where
  8. student_index_sn = student_index_sc;

结果如下:

需求:假设我们又想对同学的成绩进行一下排名(借助rank( ) 函数 )

  1. select
  2. class,
  3. student_name,
  4. student_score,
  5. rank() over(partition by class order by student_score desc) as student_rank
  6. from
  7. default.class_info
  8. lateral view posexplode(split(student,',')) sn as student_index_sn,student_name
  9. lateral view posexplode(split(score,',')) sc as student_index_sc,student_score
  10. where
  11. student_index_sn = student_index_sc
  12. order by class,student_rank;

结果如下:

 

补充:

若没有spilt函数,可能会用到array()

  1. lateral view
  2. posexplode (array()) as as f_keys,f_values
user_id bus_bike_taxi_train_
1001503897082054
100224171008500
10038050500400
user_id pay_typeAmount
10011503
1001289
10013708
100142054
  1. select
  2. user_id,
  3. index_+1 as pay_type,
  4. value as amount
  5. from pay_record_log
  6. lateral view
  7. posexplode(ARRAY(bus_,bike_,taxi_,train_))t as index_,value

 

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/AllinToyou/article/detail/426593
推荐阅读
相关标签
  

闽ICP备14008679号