
Spark: Inserting Data into Hive Using Dynamic Partitions (Spark SQL dynamic insert into multiple partitions)


A Spark SQL insert into Hive fails when executing the following statement:

spark.sql("INSERT INTO default.test_table_partition partition(province,city) SELECT xxx,xxx md5(province),md5(city)  FROM test_table")

It fails with the error below, because the partitions need to be inserted dynamically:

Exception in thread "main" org.apache.spark.SparkException: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:314)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:66)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:61)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:77)
	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:183)
	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:183)
	at org.apache.spark.sql.Dataset$$anonfun$54.apply(Dataset.scala:2841)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2840)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
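
The message is explicit: under strict mode, at least one partition column must be given a static value. If the target values are known up front, one workaround (a sketch only; the literal 'guangdong' and the WHERE clause are hypothetical, not from the original job) is to pin province and leave only city dynamic:

// Sketch: province is static, so strict mode is satisfied; only city is dynamic.
// 'guangdong' is a hypothetical example value.
spark.sql("INSERT INTO default.test_table_partition partition(province='guangdong', city) " +
  "SELECT xxx, xxx, md5(city) FROM test_table WHERE province = 'guangdong'")

The more general fix is to switch dynamic partitioning to nonstrict mode.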

Add the following to the Spark configuration:

.config("hive.exec.dynamici.partition",true)
.config("hive.exec.dynamic.partition.mode","nonstrict")

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  // .master("local[2]")
  .appName("WeiBoAccount-Verified")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("hive.exec.dynamic.partition", true)              // enable dynamic partitioning
  .config("hive.exec.dynamic.partition.mode", "nonstrict")  // no static partition column required
  .enableHiveSupport()
  .getOrCreate()
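
If the SparkSession has already been created elsewhere, the same two settings can also be applied at runtime through SQL SET statements before the insert runs (a minimal sketch):

// Apply the settings on an existing Hive-enabled SparkSession.
spark.sql("SET hive.exec.dynamic.partition=true")
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")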

 

Related parameter notes:

  1. hive.exec.dynamic.partition — whether dynamic partitioning is enabled: true (on) or false (off). Default: false.
  2. hive.exec.dynamic.partition.mode — the dynamic-partition mode once enabled, either strict or nonstrict. strict requires at least one static partition column; nonstrict does not. Each has its own trade-offs.
  3. hive.exec.max.dynamic.partitions — the maximum number of dynamic partitions allowed, 1000 by default; it can be raised manually (see the sketch after this list).
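
If an insert produces more than the default 1000 dynamic partitions, the limit can be raised at runtime. A sketch, with 5000 as an arbitrary example value; note that Hive also enforces a per-node limit, hive.exec.max.dynamic.partitions.pernode (default 100), which may need raising as well:

// Raise the global and per-node dynamic-partition ceilings.
// The values 5000 and 1000 are arbitrary examples, not recommendations.
spark.sql("SET hive.exec.max.dynamic.partitions=5000")
spark.sql("SET hive.exec.max.dynamic.partitions.pernode=1000")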

 
