当前位置:   article > 正文

hive根据现有数据表创建分区表,进行动态或静态分区插入数据_hive一个表可以插入静态分区数据和动态分区插入

hive一个表可以插入静态分区数据和动态分区插入

1:现有数据表结构定义:

  1. CREATE TABLE `tab_user`(
  2. `name` string,
  3. `age` int,
  4. `sex` string,
  5. `addr` string)
  6. ROW FORMAT DELIMITED
  7. FIELDS TERMINATED BY ','
  8. LINES TERMINATED BY '\n'
  9. STORED AS INPUTFORMAT
  10. 'org.apache.hadoop.mapred.TextInputFormat'
  11. OUTPUTFORMAT
  12. 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  13. LOCATION
  14. 'hdfs://node:9000/user/hive/warehouse/daxin.db/tab_user'

2:现有数据表的数据:

  1. daxin 18 male beijing
  2. mali 28 female shandong
  3. wangsan 34 male beijing
  4. lisi 45 male liaoning
  5. liwu 58 female beijing
  6. maoliu 43 male anhui
  7. zhouba 62 female beijing

3:对现有用户表数据按照位置信息进行分区,创建新的用户分区表:

  1. CREATE TABLE `user_partition_tab`(
  2. `name` string
  3. `age` int
  4. `sex` string)  PARTITIONED BY(addr STRING);

4:插入用户数据:

insert overwrite table ptab PARTITION (addr)  select name,age,sex,addr from user_partition_tab;

执行上面代码会报错:

FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict

拓展:

对于hive分区表插入数据时候,对于分区字段可以自行指定一个静态字段或者根据分区字段的具体值进行插入分区表,对于前者指定一个分区值的插入则成为静态分区插入,而后者根据分区字段的具体值插入则成为动态分区插入。

a:静态插入:

insert overwrite table ptab PARTITION (addr='qiqihaer')  select name,age,sex from tab_user;

指定分区字段addr的值为qiqihaer,如果表中该分区不存在的话则创建该分区。

b:动态插入:

insert overwrite table ptab PARTITION (addr)  select name,age,sex,addr from tab_user;

该分区字段是根据select出来的具体值进行动态分区,因此就需要开启:set hive.exec.dynamic.partition.mode=nonstrict。

Configuration property

Default

Note

hive.exec.dynamic.partition

true

Needs to be set to true to enable dynamic partition inserts

hive.exec.dynamic.partition.mode

strict

In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions, in nonstrict mode all partitions are allowed to be dynamic

hive.exec.max.dynamic.partitions.pernode

100

Maximum number of dynamic partitions allowed to be created in each mapper/reducer node

hive.exec.max.dynamic.partitions

1000

Maximum number of dynamic partitions allowed to be created in total

hive.exec.max.created.files

100000

Maximum number of HDFS files created by all mappers/reducers in a MapReduce job

hive.error.on.empty.partition

false

Whether to throw an exception if dynamic partition insert generates empty results

hive.exec.dynamic.partition.mode默认是strict,必须制定一个分区进行插入数据,以避免覆盖所有的分区数据;但是如果需要动态分区插入数据就必须设置nonstrict,nonstrict表示不是严格的必须指定一个静态分区,言外之意就是动态分区插入。其他属性容易理解不解释。

参考:https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-DynamicPartitionInserts

 

声明:本文内容由网友自发贡献,不代表【wpsshop博客】立场,版权归原作者所有,本站不承担相应法律责任。如您发现有侵权的内容,请联系我们。转载请注明出处:https://www.wpsshop.cn/w/喵喵爱编程/article/detail/772529
推荐阅读
相关标签
  

闽ICP备14008679号