赞
踩
来自官方的经典的 spark 架构图如下:
上述架构图,从进程的角度来讲,有四个角色/组件:
ERROR : FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 256f3dc9-c1a3-49f3-be2c-9ab81a8dd518_1: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error "Error: Could not create the Java Virtual Machine.","Error: A fatal exception has occurred. Program will exit."
INFO : Completed executing command(queryId=hive_20240620154413_1ad26fe2-f2d5-4252-a609-b3b8d4ce2822); Time taken: 0.517 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 256f3dc9-c1a3-49f3-be2c-9ab81a8dd518_1: java.lang.RuntimeException: spark-submit process failed with exit code 1 and error "Error: Could not create the Java Virtual Machine.","Error: A fatal exception has occurred. Program will exit." (state=42000,code=30041)
# hive on spark 相关配置-/run/cloudera-scm-agent/process/5666-hive-HIVESERVER2/spark-defaults.conf 包含: spark.master=yarn spark.submit.deployMode=cluster spark.authenticate=true spark.driver.log.dfsDir=/user/spark/driverLogs spark.driver.log.persistToDfs.enabled=true spark.dynamicAllocation.enabled=true spark.dynamicAllocation.executorIdleTimeout=60 spark.dynamicAllocation.minExecutors=1 spark.dynamicAllocation.schedulerBacklogTimeout=1 spark.eventLog.enabled=true spark.io.encryption.enabled=false spark.network.crypto.enabled=false spark.serializer=org.apache.spark.serializer.KryoSerializer spark.shuffle.service.enabled=true spark.shuffle.service.port=7337 spark.ui.enabled=true spark.ui.killEnabled=true spark.lineage.log.dir=/var/log/spark/lineage spark.lineage.enabled=true spark.eventLog.dir=hdfs://ns1/user/spark/applicationHistory spark.yarn.historyServer.address=http://uf30-3:18088 spark.yarn.jars=local:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/jars/*,local:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/hive/* spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native spark.yarn.config.gatewayPath=/opt/cloudera/parcels spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../.. spark.yarn.historyServer.allowTracking=true spark.yarn.appMasterEnv.MKL_NUM_THREADS=1 spark.executorEnv.MKL_NUM_THREADS=1 spark.yarn.appMasterEnv.OPENBLAS_NUM_THREADS=1 spark.executorEnv.OPENBLAS_NUM_THREADS=1 spark.extraListeners=com.cloudera.spark.lineage.NavigatorAppListener spark.sql.queryExecutionListeners=com.cloudera.spark.lineage.NavigatorQueryListener # hs2中,hive on spark 相关日志: 2024-06-20 09:43:30,902 INFO org.apache.hive.spark.client.SparkClientImpl: [HiveServer2-Background-Pool: Thread-151785]: Loading spark defaults configs from: file:/run/cloudera-scm-agent/process/5666-hive-HIVESERVER2/spark-defaults.conf 2024-06-20 09:43:30,908 INFO org.apache.hive.spark.client.SparkClientImpl: [HiveServer2-Background-Pool: Thread-151785]: Running client driver with argv: /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/bin/spark-submit --executor-cores 4 --executor-memory 2147483648b --principal hive/uf30-1@CDH.COM --keytab hive.keytab --jars /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-kryo-registrator-2.1.1-cdh6.3.2.jar --properties-file /tmp/spark-submit.6442647368541171349.properties --class org.apache.hive.spark.client.RemoteDriver /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2.jar --remote-host uf30-1 --remote-port 54208 --remote-driver-conf hive.spark.client.future.timeout=60000 --remote-driver-conf hive.spark.client.connect.timeout=1000 --remote-driver-conf hive.spark.client.server.connect.timeout=900000 --remote-driver-conf hive.spark.client.channel.log.level=null --remote-driver-conf hive.spark.client.rpc.max.size=52428800 --remote-driver-conf hive.spark.client.rpc.threads=8 --remote-driver-conf hive.spark.client.secret.bits=256 --remote-driver-conf hive.spark.client.rpc.server.address=null --remote-driver-conf hive.spark.client.rpc.server.port=null 2024-06-20 09:43:31,887 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=900000 2024-06-20 09:43:31,887 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8 2024-06-20 09:43:31,887 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: Warning: Ignoring non-spark config property: hive.spark.client.future.timeout=60000 2024-06-20 09:43:31,887 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000 2024-06-20 09:43:31,887 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256 2024-06-20 09:43:31,888 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800 2024-06-20 09:43:32,059 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:32 WARN spark.SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead. 2024-06-20 09:43:32,059 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:32 WARN spark.SparkConf: The configuration key 'spark.yarn.driver.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.driver.memoryOverhead' instead. 2024-06-20 09:43:33,044 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO yarn.Client: Kerberos credentials: principal = hive/uf30-1@CDH.COM, keytab = hive.keytab 2024-06-20 09:43:33,531 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm71 2024-06-20 09:43:33,585 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers 2024-06-20 09:43:33,698 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO conf.Configuration: resource-types.xml not found 2024-06-20 09:43:33,698 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'. 2024-06-20 09:43:33,719 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (20480 MB per container) 2024-06-20 09:43:33,720 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO yarn.Client: Will allocate AM container, with 2560 MB memory including 512 MB overhead 2024-06-20 09:43:33,720 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO yarn.Client: Setting up container launch context for our AM 2024-06-20 09:43:33,724 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO yarn.Client: Setting up the launch environment for our AM container 2024-06-20 09:43:33,745 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO yarn.Client: Preparing resources for our AM container 2024-06-20 09:43:33,805 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO yarn.Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN Secure Distributed Cache. 2024-06-20 09:43:33,810 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:33 INFO yarn.Client: Uploading resource file:/run/cloudera-scm-agent/process/5666-hive-HIVESERVER2/hive.keytab -> hdfs://ns1/user/hive/.sparkStaging/application_1716544959017_1620/hive.keytab 2024-06-20 09:43:34,091 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:34 INFO yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-exec-2.1.1-cdh6.3.2.jar -> hdfs://ns1/user/hive/.sparkStaging/application_1716544959017_1620/hive-exec-2.1.1-cdh6.3.2.jar 2024-06-20 09:43:34,434 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:34 INFO yarn.Client: Uploading resource file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/jars/hive-kryo-registrator-2.1.1-cdh6.3.2.jar -> hdfs://ns1/user/hive/.sparkStaging/application_1716544959017_1620/hive-kryo-registrator-2.1.1-cdh6.3.2.jar 2024-06-20 09:43:34,771 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:34 INFO yarn.Client: Uploading resource file:/tmp/spark-6592d710-76fe-4804-9e14-6fa37e26747c/__spark_conf__4167604981797776583.zip -> hdfs://ns1/user/hive/.sparkStaging/application_1716544959017_1620/__spark_conf__.zip 2024-06-20 09:43:34,852 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:34 INFO spark.SecurityManager: Changing view acls to: hive 2024-06-20 09:43:34,853 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:34 INFO spark.SecurityManager: Changing modify acls to: hive 2024-06-20 09:43:34,854 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:34 INFO spark.SecurityManager: Changing view acls groups to: 2024-06-20 09:43:34,855 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:34 INFO spark.SecurityManager: Changing modify acls groups to: 2024-06-20 09:43:34,856 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:34 INFO spark.SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(hive); groups with view permissions: Set(); users with modify permissions: Set(hive); groups with modify permissions: Set() 2024-06-20 09:43:34,893 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:34 INFO conf.HiveConf: Found configuration file file:/etc/hive/conf.cloudera.hive/hive-site.xml 2024-06-20 09:43:35,013 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO security.YARNHadoopDelegationTokenManager: Attempting to login to KDC using principal: hive/uf30-1@CDH.COM 2024-06-20 09:43:35,017 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO security.YARNHadoopDelegationTokenManager: Successfully logged into KDC. 2024-06-20 09:43:35,028 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO security.HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-350949835_1, ugi=hive/uf30-1@CDH.COM (auth:KERBEROS)]] with renewer yarn/uf30-1@CDH.COM 2024-06-20 09:43:35,056 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO hdfs.DFSClient: Created token for hive: HDFS_DELEGATION_TOKEN owner=hive/uf30-1@CDH.COM, renewer=yarn, realUser=, issueDate=1718847815048, maxDate=1719452615048, sequenceNumber=489988, masterKeyId=1416 on ha-hdfs:ns1 2024-06-20 09:43:35,059 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO security.HadoopFSDelegationTokenProvider: getting token for: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-350949835_1, ugi=hive/uf30-1@CDH.COM (auth:KERBEROS)]] with renewer hive/uf30-1@CDH.COM 2024-06-20 09:43:35,060 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO hdfs.DFSClient: Created token for hive: HDFS_DELEGATION_TOKEN owner=hive/uf30-1@CDH.COM, renewer=hive, realUser=, issueDate=1718847815056, maxDate=1719452615056, sequenceNumber=489989, masterKeyId=1416 on ha-hdfs:ns1 2024-06-20 09:43:35,106 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO security.HadoopFSDelegationTokenProvider: Renewal interval is 86400044 for token HDFS_DELEGATION_TOKEN 2024-06-20 09:43:35,168 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 WARN conf.HiveConf: HiveConf of name hive.enforce.bucketing does not exist 2024-06-20 09:43:35,206 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO yarn.Client: Submitting application application_1716544959017_1620 to ResourceManager 2024-06-20 09:43:35,456 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO impl.YarnClientImpl: Submitted application application_1716544959017_1620 2024-06-20 09:43:35,461 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO yarn.Client: Application report for application_1716544959017_1620 (state: ACCEPTED) 2024-06-20 09:43:35,468 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO yarn.Client: 2024-06-20 09:43:35,469 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: client token: Token { kind: YARN_CLIENT_TOKEN, service: } 2024-06-20 09:43:35,469 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: diagnostics: AM container is launched, waiting for AM container to Register with RM 2024-06-20 09:43:35,469 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: ApplicationMaster host: N/A 2024-06-20 09:43:35,469 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: ApplicationMaster RPC port: -1 2024-06-20 09:43:35,469 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: queue: root.users.dap 2024-06-20 09:43:35,469 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: start time: 1718847815223 2024-06-20 09:43:35,469 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: final status: UNDEFINED 2024-06-20 09:43:35,469 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: tracking URL: http://uf30-3:8088/proxy/application_1716544959017_1620/ 2024-06-20 09:43:35,469 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: user: hive 2024-06-20 09:43:35,474 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO util.ShutdownHookManager: Shutdown hook called 2024-06-20 09:43:35,476 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c92c59a3-55ce-4aa1-9463-6ed42b4ddd99 2024-06-20 09:43:35,481 INFO org.apache.hive.spark.client.SparkClientImpl: [spark-submit-stderr-redir-HiveServer2-Background-Pool: Thread-151785]: 24/06/20 09:43:35 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-6592d710-76fe-4804-9e14-6fa37e26747c 2024-06-20 09:43:35,865 INFO org.apache.hive.spark.client.SparkClientImpl: [Driver]: Child process (spark-submit) exited successfully. 2024-06-20 09:43:40,157 INFO org.apache.hive.spark.client.SparkClientImpl: [HiveServer2-Background-Pool: Thread-151785]: Successfully connected to Remote Spark Driver at: uf30-1:38642 2024-06-20 09:43:48,053 INFO org.apache.hive.spark.client.SparkClientImpl: [Spark-Driver-RPC-Handler-0]: Received Spark job ID: 0 for client job c111431c-5a57-4bb4-9357-07d0e5948d79
服务端参数 hive.spark.client.server.connect.timeout:默认90秒:Timeout for handshake between Hive client and remote Spark driver. Checked by both processes.
服务端参数 hive.spark.client.future.timeout: 默认 60秒:Timeout for requests from Hive client to remote Spark driver.
客户端参数 hive.spark.client.connect.timeout:默认一秒:Timeout for remote Spark driver in connecting back to Hive client
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。