赞
踩
本部分主要讲述了列转行函数,是最最最最常用的函数,爆炸函数
explode 函数是UFTF函数,将hive一列中复杂的array或者map结构拆分成多行。Explode函数是不允许在select再有其他字段,
explode(ARRAY) 列表中的每个元素生成一行
explode(MAP) map中每个key-value对,生成一行,key为一列,value为一列
限制:
1、No other expressions are allowed in SELECT
SELECT pageid, explode(adid_list) AS myCol... is not supported
2、UDTF's can't be nested
SELECT explode(explode(adid_list)) AS myCol... is not supported
3、GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported
SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported
举例1:拆分array
Then running the query: SELECT explode(myCol) AS myNewCol FROM myTable;
will produce:
举例2:SELECT explode(myMap) AS (myMapKey, myMapValue) FROM myMapTable;
可使用lateral view解除以上限制,只要是想查询除exlpode之外的字段就要使用侧写函数lateral view;
用法:LATERAL VIEW udtf(expression) tableAlias(临时表名) AS columnAlias(临时表的字段)
解释:用于和split, explode等UDTF一起使用,它能够将一列数据拆成多行数据,在此基础上可以对拆分后的数据进行聚合。
Consider the following base table named pageAds. It has two columns: pageid (name of the page) and adid_list (an array of ads appearing on the page)
An example table with two rows:
and the user would like to count the total number of times an ad appears across all pages.
A lateral view with explode() can be used to convert adid_list into separate rows using the query;
SELECT pageid, adid
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;
The resulting output will be
Then in order to count the number of times a particular ad appears, count/group by can be used:
SELECT adid, count(1)
FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid
GROUP BY adid;
The resulting output will be
由此可见,lateral view与explode等udtf就是天生好搭档,explode将复杂结构一行拆成多行,然后再用lateral view做各种聚合。
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。