[2021-09-30 08:22:01,451] {ssh.py:141} INFO - 21/09/30 16:22:01 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1036, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1036.
From the exception message it is clear that the dynamic partition limit is set too low.
So I added the following to the spark-submit command:
--conf "hive.exec.max.dynamic.partitions=2048"
--conf "hive.exec.max.dynamic.partitions.pernode=512"
After resubmitting the job, it reported:
[2021-09-30 08:45:36,601] {ssh.py:141} INFO - Warning: Ignoring non-spark config property: hive.exec.max.dynamic.partitions=2048
[2021-09-30 08:45:36,601] {ssh.py:141} INFO -
[2021-09-30 08:45:36,602] {ssh.py:141} INFO - Warning: Ignoring non-spark config property: hive.exec.max.dynamic.partitions.pernode=512
So I did some digging and found that on CDH the correct format for these options is:
--conf "spark.hadoop.hive.exec.max.dynamic.partitions=2048" \
--conf "spark.hadoop.hive.exec.max.dynamic.partitions.pernode=512" \
That is, the property names must be prefixed with spark.hadoop.
Resubmit the job and see whether the same error shows up again.
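For illustration, a complete spark-submit invocation with the corrected options might look roughly like this; the main class is taken from the job's stack trace in the log below, while the jar path and resource settings are placeholders rather than values from the original job:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class myspark.warehouse.DriveEvent \
  --conf "spark.hadoop.hive.exec.max.dynamic.partitions=2048" \
  --conf "spark.hadoop.hive.exec.max.dynamic.partitions.pernode=512" \
  --num-executors 4 \
  --executor-memory 4g \
  /path/to/drive-event.jar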
On other platforms it seems the only option is to modify hive-site.xml directly. While looking this up I also saw a Stack Overflow post saying that since Spark 2.x, setting Hive properties with SET from the Spark CLI may not take effect. Add the property to hive-site.xml:
<property>
  <name>hive.exec.max.dynamic.partitions</name>
  <value>2048</value>
</property>
Then restart the HiveServer2 and Hive history processes.
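After the restart, one way to confirm that HiveServer2 has picked up the new limit is to print the property from beeline; the JDBC URL below is a placeholder for your own HiveServer2 address:

beeline -u "jdbc:hive2://your-hiveserver2-host:10000" -e "set hive.exec.max.dynamic.partitions;"

If the change took effect, this prints hive.exec.max.dynamic.partitions=2048.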
Reference: https://stackoverflow.com/questions/40506484/cannot-change-hive-exec-max-dynamic-partitions-in-spark
A Spark JIRA issue that matches this problem closely:
https://issues.apache.org/jira/browse/SPARK-19881
The full log is below:
[2021-09-30 08:21:59,319] {ssh.py:141} INFO - 21/09/30 16:21:59 INFO yarn.Client: Application report for application_1632616543267_0198 (state: RUNNING) [2021-09-30 08:22:00,322] {ssh.py:141} INFO - 21/09/30 16:22:00 INFO yarn.Client: Application report for application_1632616543267_0198 (state: RUNNING) [2021-09-30 08:22:01,333] {ssh.py:141} INFO - 21/09/30 16:22:01 INFO yarn.Client: Application report for application_1632616543267_0198 (state: FINISHED) 21/09/30 16:22:01 INFO yarn.Client: client token: N/A diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1036, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1036.; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108) at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:934) at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadDynamicPartitions(ExternalCatalogWithListener.scala:189) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:205) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99) at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104) at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102) at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363) at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:651) at myspark.warehouse.DriveEvent$.main(DriveEvent.scala:139) at myspark.warehouse.DriveEvent.main(DriveEvent.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1036, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1036. 
at org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:1885) at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1918) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.sql.hive.client.Shim_v2_1.loadDynamicPartitions(HiveShim.scala:1169) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:804) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266) at org.apache.spar [2021-09-30 08:22:01,341] {ssh.py:141} INFO - k.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:802) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:946) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:934) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:934) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99) ... 24 more ApplicationMaster host: bd.vn0038.jmrh.com ApplicationMaster RPC port: 33815 queue: root.users.hdfs start time: 1632967076249 final status: FAILED tracking URL: http://bd.vn0038.jmrh.com:8088/proxy/application_1632616543267_0198/ user: hdfs [2021-09-30 08:22:01,451] {ssh.py:141} INFO - 21/09/30 16:22:01 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1036, which is more than 1000. 
To solve this try to set hive.exec.max.dynamic.partitions to at least 1036.; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108) [2021-09-30 08:22:01,456] {ssh.py:141} INFO - at org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:934) at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadDynamicPartitions(ExternalCatalogWithListener.scala:189) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:205) at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99) at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104) at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102) at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115) [2021-09-30 08:22:01,459] {ssh.py:141} INFO - at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194) at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364) at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363) [2021-09-30 08:22:01,460] {ssh.py:141} INFO - at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:651) at myspark.warehouse.DriveEvent$.main(DriveEvent.scala:139) [2021-09-30 08:22:01,461] {ssh.py:141} INFO - at myspark.warehouse.DriveEvent.main(DriveEvent.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [2021-09-30 08:22:01,464] {ssh.py:141} INFO - at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:673) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 1036, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 1036. 
at org.apache.hadoop.hive.ql.metadata.Hive.getValidPartitionsInPath(Hive.java:1885) [2021-09-30 08:22:01,468] {ssh.py:141} INFO - at org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:1918) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.sql.hive.client.Shim_v2_1.loadDynamicPartitions(HiveShim.scala:1169) [2021-09-30 08:22:01,470] {ssh.py:141} INFO - at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:804) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221) [2021-09-30 08:22:01,473] {ssh.py:141} INFO - at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266) at org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:802) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:946) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:934) [2021-09-30 08:22:01,474] {ssh.py:141} INFO - at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:934) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99) ... 
24 more [2021-09-30 08:22:01,477] {ssh.py:141} INFO - Exception in thread "main" [2021-09-30 08:22:01,480] {ssh.py:141} INFO - org.apache.spark.SparkException: Application application_1632616543267_0198 finished with failed status [2021-09-30 08:22:01,481] {ssh.py:141} INFO - [2021-09-30 08:22:01,484] {ssh.py:141} INFO - at org.apache.spark.deploy.yarn.Client.run(Client.scala:1158) [2021-09-30 08:22:01,487] {ssh.py:141} INFO - at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1606) [2021-09-30 08:22:01,488] {ssh.py:141} INFO - at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195) [2021-09-30 08:22:01,491] {ssh.py:141} INFO - at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926) [2021-09-30 08:22:01,493] {ssh.py:141} INFO - at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [2021-09-30 08:22:01,495] {ssh.py:141} INFO - 21/09/30 16:22:01 INFO util.ShutdownHookManager: Shutdown hook called [2021-09-30 08:22:01,497] {ssh.py:141} INFO - 21/09/30 16:22:01 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-2ab0bc1c-ec16-4079-bbf3-51623bb7ef3f [2021-09-30 08:22:01,526] {ssh.py:141} INFO - 21/09/30 16:22:01 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-ec2c88dd-9781-46fb-8874-ac5ecddc6e2b [2021-09-30 08:22:02,445] {taskinstance.py:1462} ERROR - Task failed with exception Traceback (most recent call last): File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/ssh/operators/ssh.py", line 168, in execute raise AirflowException(f"error running cmd: {self.command}, error: {error_msg}") During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1164, in _run_raw_task self._prepare_and_execute_task_with_callbacks(context, task) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1282, in _prepare_and_execute_task_with_callbacks result = self._execute_task(context, task_copy) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1312, in _execute_task result = task_copy.execute(context=context) File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/ssh/operators/ssh.py", line 171, in execute raise AirflowException(f"SSH operator error: {str(e)}")