当前位置:   article > 正文

Hive/SparkSQL中Map、Array的基本使用和转换_spark sql map

spark sql map

一、Map

1. 构建

语法: map (key1, value1, key2, value2, …)
说明:根据输入的key和value对构建map类型

  1. --> 1. 一般创建方法
  2. select map('key1_name','张三','key2_age',20) as map_col
  3. -- 结果:
  4. {"key1_name":"张三","key2_age":"20"}
  5. --> 2. 根据SQL查询结果构建map
  6. select map('k_name',name,'k_age',age) as map_col
  7. from
  8. (
  9. select '张三' as name, 23 as age
  10. union select '李四' as name, 24 as age
  11. union select '王五' as name, 25 as age
  12. )
  13. -- 结果:
  14. {"k_age":"25","k_name":"王五"}
  15. {"k_age":"23","k_name":"张三"}
  16. {"k_age":"24","k_name":"李四"}

2. 读取

语法: M[key]
操作类型: M为map类型,key为map中的key值
说明:返回map类型M中,key值为指定值的value值。

  1. with base_tb as (
  2. select map('k_name',name,'k_age',age) as map_col
  3. from
  4. (
  5. select '张三' as name, 23 as age
  6. union select '李四' as name, 24 as age
  7. union select '王五' as name, 25 as age
  8. )
  9. )
  10. select map_col['k_name'] as col from base_tb
  11. -- 结果:
  12. 王五
  13. 张三
  14. 李四

3. 其他常用函数

  1. --> 1.size(Map)函数:可得map的长度。返回值类型:int
  2. select size(map_col) from base_tb;
  3. 结果:
  4. 2
  5. 2
  6. 2
  7. --> 2.map_keys(Map)函数:可得map中所有的key; 返回值类型: array
  8. select map_keys(map_col) from base_tb;
  9. 结果:
  10. ["k_name","k_age"]
  11. ["k_name","k_age"]
  12. ["k_name","k_age"]
  13. --> 3.map_values(Map)函数:可得map中所有的value; 返回值类型: array
  14. select map_values(map_col) from base_tb;
  15. 结果:
  16. ["王五","25"]
  17. ["李四","24"]
  18. ["张三","23"]
  19. --> 4.判断map中是否包含某个key值:这种情况较为常见,多用于一些复杂逻辑判断中;
  20. select array_contains(map_keys(map_col),'k_name') from base_tb;
  21. 结果:
  22. true
  23. true
  24. true

二、Array

1.构建

语法: array(val1, val2, …)
说明:根据输入的参数构建数组array类型

其他:

collect_set()/collect_list()

在 Hive 中想实现按某字段分组,对另外字段进行合并,可通过collect_list()或者collect_set()实现。

collect_set()与collect_list()的区别:
    collect_list()函数 - - 不去重
    collect_set()函数 - - 去重复

  1. --> 1. 一般创建方法
  2. select array('张三','李四','王五') as names1, -- 直接创建
  3. split('张三2,李四2,王五2',',') as names2 -- 通过函数创建
  4. -- 结果:
  5. names1 names2
  6. ["张三","李四","王五"] ["张三2","李四2","王五2"]
  7. --> 2. 根据SQL查询结果构建array
  8. select id,
  9. collect_list(name) as name_list,
  10. collect_set(name) as name_set,
  11. collect_set(age) as age_set
  12. from
  13. (
  14. select 1 as id, '张三' as name, 23 as age
  15. union select 1 as id, '李四' as name, 24 as age
  16. union select 1 as id, '王五' as name, 25 as age
  17. union select 1 as id, '王五' as name, 26 as age
  18. )
  19. group by
  20. id
  21. -- 结果:
  22. id name_list name_set age_set
  23. 1 ["李四","张三","王五","王五"] ["李四","张三","王五"] [24,23,26,25]

2.读取

语法: A[n]
操作类型: A为array类型,n为int类型
说明:返回数组A中的第n个变量值。数组的起始下标为0,越界返回NULL。

  1. with base_tb as (
  2. select id,
  3. collect_list(name) as name_list,
  4. collect_set(name) as name_set,
  5. collect_set(age) as age_set
  6. from
  7. (
  8. select 1 as id, '张三' as name, 23 as age
  9. union select 1 as id, '李四' as name, 24 as age
  10. union select 1 as id, '王五' as name, 25 as age
  11. union select 1 as id, '王五' as name, 26 as age
  12. )
  13. group by
  14. id
  15. )
  16. select name_list,name_list[2],name_list[3] from base_tb
  17. -- 结果:
  18. name_list _col0 _col1
  19. ["张三","王五","王五","李四"] 王五 李四

3.其他常用函数

  1. --> 1.array_max(array):返回数组中的最大值。
  2. select array_max(array(1,3,5)) as c1,
  3. array_max(array('a','b','c')) as c2
  4. +-----+-----+--+
  5. | c1 | c2 |
  6. +-----+-----+--+
  7. | 5 | c |
  8. +-----+-----+--+
  9. --> 2.array_min(array):返回数组中的最小值。
  10. select array_min(array(1,3,5)) as c1,
  11. array_min(array('a','b','c')) as c2
  12. +-----+-----+--+
  13. | c1 | c2 |
  14. +-----+-----+--+
  15. | 1 | a |
  16. +-----+-----+--+
  17. --> 3.array_join(array, delimiter):使用指定的分隔符将数组中的元素连接成一个字符串。与concat_ws函数功能相同,##注意参数顺序##。
  18. select array_join(array(1,3,5),'_') as c1,
  19. concat_ws('#',array('a','b','c')) as c2
  20. +--------+--------+--+
  21. | c1 | c2 |
  22. +--------+--------+--+
  23. | 1_3_5 | a#b#c |
  24. +--------+--------+--+
  25. --> 4.array_sort(array):返回一个按升序排序的数组。
  26. select array_sort(array(1,5,3)) as c1,
  27. array_sort(array('c','b','a')) as c2
  28. +----------+----------------+--+
  29. | c1 | c2 |
  30. +----------+----------------+--+
  31. | [1,3,5] | ["a","b","c"] |
  32. +----------+----------------+--+
  33. --> 5.array_contains(array, value):判断数组中是否包含指定的值。包含返回true,否则返回false。
  34. select array_contains(array(1,2,3), 2) as c1,
  35. array_contains(array('a','b','c'), 'd') as c2
  36. +-------+--------+--+
  37. | c1 | c2 |
  38. +-------+--------+--+
  39. | true | false |
  40. +-------+--------+--+
  41. --> 6.explode(array):炸裂数组(列转行);
  42. select keyid,
  43. cc
  44. from
  45. (
  46. select 1 as keyid,
  47. array('dbc','aef','ghij','aef','ghij','aef','ghij','aef','ghij','aef','ghij') as a
  48. ) t0
  49. lateral view explode(a) e as cc
  50. +-------+--------+--+
  51. keyid cc
  52. 1 dbc
  53. 1 aef
  54. 1 ghij
  55. 1 aef
  56. 1 ghij
  57. 1 aef
  58. 1 ghij
  59. 1 aef
  60. 1 ghij
  61. 1 aef
  62. 1 ghij
  63. --> 7.array_except(array1, array2):在数组array1中去除掉所有数组array2中的元素(数组array1比数组array2多的元素)。
  64. select array_except(array(1,2,3,2), array(2,3)) as c1,
  65. array_except(array('a','b','c'),array('d')) as c2
  66. +------+----------------+--+
  67. | c1 | c2 |
  68. +------+----------------+--+
  69. | [1] | ["a","b","c"] |
  70. +------+----------------+--+
  71. --> 8.array_intersect(array1, array2):返回一个包含所有同时在数组array1和数组array2中的元素的数组(数组array1和数组array2的交集元素)。
  72. select array_intersect(array(1,2,3,2), array(2,3)) as c1,
  73. array_intersect(array('a','b','c'),array('d')) as c2
  74. +--------+-----+--+
  75. | c1 | c2 |
  76. +--------+-----+--+
  77. | [2,3] | [] |
  78. +--------+-----+--+

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/我家自动化/article/detail/794883
推荐阅读
相关标签
  

闽ICP备14008679号