
LATERAL VIEW Usage Summary

lateral view

The LATERAL VIEW clause is used in conjunction with generator functions such as EXPLODE, which will generate a virtual table containing one or more rows. LATERAL VIEW will apply the rows to each original output row.
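
A minimal sketch of that behavior, using a hypothetical inline table `t` that is not part of this article's data: each input row is combined with the rows that `explode` generates from its own array, so a row holding two elements produces two output rows.

```sql
-- Hypothetical inline data: id 1 has two items, id 2 has one.
SELECT t.id, v.item
FROM VALUES (1, array('a', 'b')), (2, array('c')) AS t(id, items)
LATERAL VIEW explode(t.items) v AS item;
-- Rows returned (order not guaranteed): (1, a), (1, b), (2, c)
```

According to the Spark documentation, writing `LATERAL VIEW OUTER` keeps input rows whose array is empty or NULL and returns NULL for the generated column; without `OUTER` such rows are absent from the result.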

LATERAL VIEW Clause - Spark 3.2.0 Documentation (apache.org)

Use case 1 (single LATERAL VIEW): split + explode + LATERAL VIEW

For each skill, find the age of the oldest user who has it.

Table and data

user_id  user_name  age  skills
1356     kyle       23   Hadoop-Hive-Spark
1357     Jack       22   Hadoop-Hive
1358     Sam        26   Mysql-Oracle
1359     Lucy       28   Redis-Mysql
1360     Rose       32   Hadoop-Hive-Spark-Flink-Hbase
1361     Harry      25   Flink-Hbase-ClickHouse-Kafka
1362     Kelly      27   Spark-Flink-Hbase
-- Build the sample data as a cached table named user_info
cache table user_info
select '1356' user_id, 'kyle' user_name, 23 age, 'Hadoop-Hive-Spark'  skills
union
select '1357' user_id, 'Jack' user_name, 22 age, 'Hadoop-Hive' skills
union
select '1358' user_id, 'Sam' user_name, 26 age, 'Mysql-Oracle'  skills
union
select '1359' user_id, 'Lucy' user_name, 28 age, 'Redis-Mysql' skills
union
select '1360' user_id, 'Rose' user_name, 32 age, 'Hadoop-Hive-Spark-Flink-Hbase' skills
union
select '1361' user_id, 'Harry' user_name, 25 age, 'Flink-Hbase-ClickHouse-Kafka'  skills
union
select '1362' user_id, 'Kelly' user_name, 27 age, 'Spark-Flink-Hbase' skills;


Requirement analysis

First split the skills field into individual skills, then group by skill and find the maximum age within each group.
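
The splitting step can be tried on its own first. The sketch below applies `split` and `explode` to a single string literal taken from the sample data; no table is involved, so it only illustrates what the two functions do to one value.

```sql
-- split() turns the string into an array; explode() emits one row per element.
SELECT explode(split('Hadoop-Hive-Spark', '-')) AS skill;
-- Rows returned: Hadoop, Hive, Spark
```

Note that the second argument of `split` is a regular expression; a plain `-` needs no escaping, but a separator such as `|` or `.` would. The full query that follows applies the same two functions to every row of user_info through LATERAL VIEW.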

with t1 as (
    -- Split the skills field and turn each element into its own row
    select user_id,
           user_name,
           age,
           skill
    from user_info
    lateral view explode(split(skills,'-')) skill_table as skill
),
     t2 as (
     -- Partition by skill and order by age descending, so that rn = 1 marks the oldest user for each skill
     select *,
            row_number() over(partition by skill order by age desc) rn
     from t1
)

select
       user_id,
       user_name,
       age,
       skill
from t2
where rn = 1;
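
If only the maximum age per skill is needed, and not the rest of the oldest user's row, a plain aggregate over the exploded rows is enough. This is an alternative sketch, not the query used above; `max_age` is an illustrative column name.

```sql
-- One output row per skill with the highest age among users who list it.
SELECT skill,
       max(age) AS max_age
FROM user_info
LATERAL VIEW explode(split(skills, '-')) skill_table AS skill
GROUP BY skill;
```

With the sample data this gives 32 for Hadoop, Hive, Spark, Flink and Hbase, 28 for Mysql and Redis, 26 for Oracle, and 25 for ClickHouse and Kafka. The `row_number()` version returns exactly one user per skill, so when two users share the maximum age only one of them is kept, and which one is not defined by the query.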

Use case 2 (multiple LATERAL VIEWs): explode + LATERAL VIEW

Expand both the skills and mark fields into rows.

Table and data

user_id  user_name  age  skills                      mark
1356     kyle       23   ["Hadoop","Hive","Spark"]   ["A","B","C"]
1357     Jack       22   ["Hadoop","Hive"]           ["A","D","E"]
1358     Sam        26   ["Mysql","Oracle"]          ["B","C"]
1359     Lucy       28   ["Redis","Mysql"]           ["D","E"]
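
The article does not show how this table is created. One way to reproduce it for experimentation, assuming the name `baseTable` that the query in this section reads from, is an inline `VALUES` table with `array(...)` columns:

```sql
-- Assumed setup: builds the four sample rows with Array<String> columns.
CACHE TABLE baseTable AS
SELECT * FROM VALUES
    ('1356', 'kyle', 23, array('Hadoop', 'Hive', 'Spark'), array('A', 'B', 'C')),
    ('1357', 'Jack', 22, array('Hadoop', 'Hive'),          array('A', 'D', 'E')),
    ('1358', 'Sam',  26, array('Mysql', 'Oracle'),         array('B', 'C')),
    ('1359', 'Lucy', 28, array('Redis', 'Mysql'),          array('D', 'E'))
AS t(user_id, user_name, age, skills, mark);
```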

Requirement analysis

Because the skills and mark fields are both of type Array<String>, explode can be applied to them directly, with no split step.

SELECT
    user_id,
    user_name,
    age,
    view1.skill,
    view2.mark   -- qualified, because the generated column shares its name with baseTable.mark
FROM baseTable
LATERAL VIEW explode(skills) view1 AS skill
LATERAL VIEW explode(mark) view2 AS mark;
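
Chaining two LATERAL VIEW clauses pairs every skill with every mark of the same user, so the four sample rows expand to 9 + 6 + 4 + 4 = 23 rows. If the intent were instead to match elements by position (first skill with first mark, and so on), one possible sketch uses `posexplode`, which also returns each element's index. The aliases `s`, `m`, `skill_pos` and `mark_pos` are illustrative names, not taken from the article.

```sql
-- Keep only the combinations whose array positions line up.
SELECT user_id,
       user_name,
       age,
       s.skill,
       m.mark
FROM baseTable
LATERAL VIEW posexplode(skills) s AS skill_pos, skill
LATERAL VIEW posexplode(mark) m AS mark_pos, mark
WHERE s.skill_pos = m.mark_pos;
```

Under this variant, elements without a counterpart are dropped: user 1357 has two skills and three marks, so the third mark ("E") does not appear, and the sample data yields 9 rows in total.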