
How to handle commas in Hive fields: comma-separated values in a Hive column

Hive: handling comma-separated fields

This has been asked and answered for SQL (Convert multiple rows into one with comma as separator). Would any of the approaches mentioned there work in Hive, e.g. to go from this:

+------+------+
| Col1 | Col2 |
+------+------+
| a    | 1    |
| a    | 5    |
| a    | 6    |
| b    | 2    |
| b    | 6    |
+------+------+

to this:

+------+-------+
| Col1 | Col2  |
+------+-------+
| a    | 1,5,6 |
| b    | 2,6   |
+------+-------+

Solution

The aggregate function collect_set can achieve what you are after; it is described in the Hive documentation. You can write a query like:

SELECT Col1, collect_set(Col2)
FROM your_table
GROUP BY Col1;

However, there is one striking difference between MySQL's GROUP_CONCAT and Hive's collect_set: GROUP_CONCAT retains duplicates in its result, whereas collect_set removes duplicates from the array it returns. In your example there are no repeated Col2 values within a group, so you can go ahead and use it.
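Note that collect_set returns an array rather than the comma-separated string shown in the question. A minimal sketch of one way to get the string form follows. It assumes your_table has a string column Col1 and an integer column Col2 (the column types are an assumption, not stated in the question). concat_ws accepts an array of strings, hence the cast. If duplicates must be kept, collect_list (available from Hive 0.13.0) can stand in for collect_set.

```sql
-- Assumption: your_table(Col1 STRING, Col2 INT), mirroring the question's example.

-- Distinct values per group, joined with commas.
-- concat_ws expects array<string>, so Col2 is cast before collecting.
SELECT Col1,
       concat_ws(',', collect_set(CAST(Col2 AS STRING))) AS Col2
FROM your_table
GROUP BY Col1;

-- Same idea, but collect_list keeps duplicate values (Hive 0.13.0 and later).
SELECT Col1,
       concat_ws(',', collect_list(CAST(Col2 AS STRING))) AS Col2
FROM your_table
GROUP BY Col1;
```

Neither function guarantees the order of elements within a group, so the values may not come out sorted as in the desired output unless they are ordered explicitly.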
