
2021-09-30 CDH spark-submit job: "Number of dynamic partitions created is 1083, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions"

[2021-09-30 08:22:01,451] {ssh.py:141} INFO - 21/09/30 16:22:01 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1036, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1036.

From the exception message, it is clear that the dynamic-partition limit was set too low,
so the following options were added to the spark-submit command:

--conf "hive.exec.max.dynamic.partitions=2048"
--conf "hive.exec.max.dynamic.partitions.pernode=512"

Resubmitting the job then produced these warnings:

[2021-09-30 08:45:36,601] {ssh.py:141} INFO - Warning: Ignoring non-spark config property: hive.exec.max.dynamic.partitions=2048
[2021-09-30 08:45:36,601] {ssh.py:141} INFO -
[2021-09-30 08:45:36,602] {ssh.py:141} INFO - Warning: Ignoring non-spark config property: hive.exec.max.dynamic.partitions.pernode=512

After some research, it turned out that on CDH the correct way to pass these settings is:

--conf "spark.hadoop.hive.exec.max.dynamic.partitions=2048"
--conf "spark.hadoop.hive.exec.max.dynamic.partitions.pernode=512"

That is, each Hive property must be prefixed with spark.hadoop. so that spark-submit forwards it to the Hadoop/Hive configuration instead of discarding it as a non-Spark property.
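Putting it together, a full submit command might look like the sketch below. The master/deploy-mode flags, class name, and jar path are placeholders for illustration; only the two --conf lines are the actual fix:

```shell
# Hypothetical job details; substitute your own class and jar.
# Hive properties must carry the "spark.hadoop." prefix, otherwise
# spark-submit warns "Ignoring non-spark config property" and drops them.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf "spark.hadoop.hive.exec.max.dynamic.partitions=2048" \
  --conf "spark.hadoop.hive.exec.max.dynamic.partitions.pernode=512" \
  --class myspark.warehouse.DriveEvent \
  drive-event.jar
```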

Reference: https://community.cloudera.com/t5/Support-Questions/Unable-to-set-hive-exec-max-dynamic-partitions-while/td-p/294658

Resubmit the job and check whether the same error appears again.

On other platforms, the only option may be to modify hive-site.xml. While researching, I saw a Stack Overflow post saying that since Spark 2.x, setting Hive properties via SET in the Spark CLI may not take effect; instead, add the property to hive-site.xml:

<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>2048</value>
</property>

Then restart the HiveServer2 and Hive history processes.
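After restarting, the new limit can be checked from a Hive client. A minimal sketch, assuming beeline is available and substituting your own HiveServer2 JDBC URL (the host and port below are placeholders):

```shell
# Placeholder JDBC URL; replace with your HiveServer2 host and port.
# Hive's "SET <property>;" echoes the property's current value, so this
# confirms whether the hive-site.xml change actually took effect.
beeline -u "jdbc:hive2://your-hiveserver2:10000" \
  -e "SET hive.exec.max.dynamic.partitions;"
```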

Reference: https://stackoverflow.com/questions/40506484/cannot-change-hive-exec-max-dynamic-partitions-in-spark

An issue filed with the Spark community matches this problem closely:
https://issues.apache.org/jira/browse/SPARK-19881

The full log is as follows:

[2021-09-30 08:21:59,319] {ssh.py:141} INFO - 21/09/30 16:21:59 INFO yarn.Client: Application report for application_1632616543267_0198 (state: RUNNING)
[2021-09-30 08:22:00,322] {ssh.py:141} INFO - 21/09/30 16:22:00 INFO yarn.Client: Application report for application_1632616543267_0198 (state: RUNNING)
[2021-09-30 08:22:01,333] {ssh.py:141} INFO - 21/09/30 16:22:01 INFO yarn.Client: Application report for application_1632616543267_0198 (state: FINISHED)
21/09/30 16:22:01 INFO yarn.Client: 
	 client token: N/A
	 diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1036, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1036.;
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108)
	at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:934)
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadDynamicPartitions(ExternalCatalogWithListener.scala:189)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:205)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
	at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:651)
	at myspark.warehouse.DriveEvent$.main(DriveEvent.scala:139)
	at myspark.warehouse.DriveEvent.main(DriveEvent.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1036, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1036.
	at org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:1885)
	at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1918)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.sql.hive.client.Shim_v2_1.loadDynamicPartitions(HiveShim.scala:1169)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:804)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
	at org.apache.spar
[2021-09-30 08:22:01,341] {ssh.py:141} INFO - k.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:802)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:946)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:934)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:934)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
	... 24 more

	 ApplicationMaster host: bd.vn0038.jmrh.com
	 ApplicationMaster RPC port: 33815
	 queue: root.users.hdfs
	 start time: 1632967076249
	 final status: FAILED
	 tracking URL: http://bd.vn0038.jmrh.com:8088/proxy/application_1632616543267_0198/
	 user: hdfs
[2021-09-30 08:22:01,451] {ssh.py:141} INFO - 21/09/30 16:22:01 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1036, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1036.;
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108)
	
[2021-09-30 08:22:01,456] {ssh.py:141} INFO - at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:934)
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadDynamicPartitions(ExternalCatalogWithListener.scala:189)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:205)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
	at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
[2021-09-30 08:22:01,459] {ssh.py:141} INFO - 	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
	at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
[2021-09-30 08:22:01,460] {ssh.py:141} INFO - 	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:651)
	at myspark.warehouse.DriveEvent$.main(DriveEvent.scala:139)
	
[2021-09-30 08:22:01,461] {ssh.py:141} INFO - at myspark.warehouse.DriveEvent.main(DriveEvent.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	
[2021-09-30 08:22:01,464] {ssh.py:141} INFO - at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1036, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1036.
	at org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:1885)
	
[2021-09-30 08:22:01,468] {ssh.py:141} INFO - at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1918)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.sql.hive.client.Shim_v2_1.loadDynamicPartitions(HiveShim.scala:1169)
[2021-09-30 08:22:01,470] {ssh.py:141} INFO - 
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:804)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802)
	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
[2021-09-30 08:22:01,473] {ssh.py:141} INFO - 
	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
	at org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:802)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:946)
	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:934)
[2021-09-30 08:22:01,474] {ssh.py:141} INFO - 	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:934)
	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
	... 24 more

[2021-09-30 08:22:01,477] {ssh.py:141} INFO - Exception in thread "main" 
[2021-09-30 08:22:01,480] {ssh.py:141} INFO - org.apache.spark.SparkException: Application application_1632616543267_0198 finished with failed status
[2021-09-30 08:22:01,481] {ssh.py:141} INFO - 
[2021-09-30 08:22:01,484] {ssh.py:141} INFO - 	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1158)
[2021-09-30 08:22:01,487] {ssh.py:141} INFO - 	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1606)
[2021-09-30 08:22:01,488] {ssh.py:141} INFO - 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
[2021-09-30 08:22:01,491] {ssh.py:141} INFO - 
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
[2021-09-30 08:22:01,493] {ssh.py:141} INFO - 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[2021-09-30 08:22:01,495] {ssh.py:141} INFO - 21/09/30 16:22:01 INFO util.ShutdownHookManager: Shutdown hook called
[2021-09-30 08:22:01,497] {ssh.py:141} INFO - 21/09/30 16:22:01 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2ab0bc1c-ec16-4079-bbf3-51623bb7ef3f
[2021-09-30 08:22:01,526] {ssh.py:141} INFO - 21/09/30 16:22:01 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ec2c88dd-9781-46fb-8874-ac5ecddc6e2b
[2021-09-30 08:22:02,445] {taskinstance.py:1462} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/ssh/operators/ssh.py", line 168, in execute
    raise AirflowException(f"error running cmd: {self.command}, error: {error_msg}")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1164, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1282, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1312, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/ssh/operators/ssh.py", line 171, in execute
    raise AirflowException(f"SSH operator error: {str(e)}")