
Integrating Spark with Hadoop

For setting up the Hadoop environment, see the article "hadoop3.2.2 cluster setup".

Environment

CentOS 7, jdk1.8.0_311, scala-2.12.15, zookeeper-3.6.3, hadoop3.2.2, spark-3.2.1-bin-hadoop3.2

Spark configuration

  1. Edit ${SPARK_HOME}/conf/spark-defaults.conf and add the following:
spark.serializer                   org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled             true
spark.eventLog.dir                 hdfs://vmcluster/spark-history
spark.eventLog.compress            true
spark.yarn.historyServer.address   node-3:18080
spark.history.ui.port              18080
spark.history.fs.logDirectory      hdfs://vmcluster/spark-history
spark.history.retainedApplications 10
spark.history.fs.update.interval   5s

Note: rename spark-defaults.conf.template to spark-defaults.conf.
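
The history server can only list applications whose event logs land in the directory it reads, so spark.eventLog.dir and spark.history.fs.logDirectory should point to the same location, as they do in the settings above. A minimal sketch of a check, assuming the whitespace-separated "key value" layout shown above; the helper names conf_value and same_log_dir are hypothetical and not part of Spark:

```shell
# conf_value FILE KEY: print the value of KEY from a spark-defaults.conf style
# file made of whitespace-separated "key value" lines. Hypothetical helper.
conf_value() {
  awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# same_log_dir FILE: succeed only if the directory that applications write
# event logs to is set and equals the directory the history server reads.
same_log_dir() {
  write_dir=$(conf_value "$1" spark.eventLog.dir)
  read_dir=$(conf_value "$1" spark.history.fs.logDirectory)
  [ -n "$write_dir" ] && [ "$write_dir" = "$read_dir" ]
}

# Usage sketch:
#   same_log_dir "${SPARK_HOME}/conf/spark-defaults.conf" || echo "log dirs differ"
```

The directory also has to exist in HDFS before applications or the history server use it, for example by running hdfs dfs -mkdir -p hdfs://vmcluster/spark-history once.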

  2. Edit ${SPARK_HOME}/conf/spark-env.sh and add the following:
export JAVA_HOME=/home/bigdata/env/jdk1.8.0_311
export SCALA_HOME=/home/bigdata/env/scala-2.12.15
export SPARK_HOME=/home/bigdata/env/spark-3.2.1-bin-hadoop3.2
export SPARK_CONF_DIR=${SPARK_HOME}/conf
export HADOOP_HOME=/home/bigdata/env/hadoop-3.2.2
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

Note: rename spark-env.sh.template to spark-env.sh.
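
Spark on YARN locates the ResourceManager and HDFS through the client-side configuration files in HADOOP_CONF_DIR (or YARN_CONF_DIR), so it is worth confirming that the directory really contains them. A minimal sketch, assuming the usual three file names; the list and the helper name missing_hadoop_confs are my assumptions, not something the original setup prescribes:

```shell
# missing_hadoop_confs DIR: print the name of each expected Hadoop client
# config file that is absent from DIR. Prints nothing when all are present.
missing_hadoop_confs() {
  for name in core-site.xml hdfs-site.xml yarn-site.xml; do
    [ -f "$1/$name" ] || echo "$name"
  done
}

# Usage sketch:
#   missing_hadoop_confs "${HADOOP_HOME}/etc/hadoop"
```

An empty result means spark-submit --master yarn should be able to read the cluster addresses from that directory.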

Starting the history server

start-history-server.sh
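
Once the script has run, one way to confirm that the history server is up is to query its REST API, which Spark serves under /api/v1 on the UI port. A small sketch that builds the URL from the host and port configured earlier (node-3 and 18080); the helper name history_api_url is hypothetical, and the curl call is left as a comment because it needs the running server:

```shell
# history_api_url HOST PORT: print the URL of the history server endpoint
# that lists the applications it knows about.
history_api_url() {
  echo "http://$1:$2/api/v1/applications"
}

# Usage sketch (requires the history server to be reachable):
#   curl -s "$(history_api_url node-3 18080)"
```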

Testing

Submit the SparkPi example that ships with Spark as a test, using the following command:

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--num-executors 1 \
--executor-memory 512m \
--executor-cores 1 \
--queue bigdata \
${SPARK_HOME}/examples/jars/spark-examples*.jar \
100
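
In cluster mode the driver runs inside the YARN ApplicationMaster, so --driver-memory 1g does not translate into a 1024 MB container: Spark adds a memory overhead on top. A rough sketch of the arithmetic, assuming the Spark 3.2 defaults (overhead is 10% of the requested memory with a floor of 384 MB); the function name am_container_mb is hypothetical:

```shell
# am_container_mb DRIVER_MB: estimate the container size in MB requested for
# the ApplicationMaster when the driver runs inside it (cluster mode).
# Assumes default overhead settings: max(10% of driver memory, 384 MB).
am_container_mb() {
  overhead=$(( $1 / 10 ))
  if [ "$overhead" -lt 384 ]; then
    overhead=384
  fi
  echo $(( $1 + overhead ))
}

am_container_mb 1024   # prints 1408
```

This matches the "Will allocate AM container, with 1408 MB memory including 384 MB overhead" line that yarn.Client reports in the console log. YARN may still round the request up to its own allocation increment.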

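Note: set the SPARK_HOME system environment variable for Spark.
Because the job is submitted in cluster mode, the result is not printed to the console. The console log looks like this: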

2022-03-16 10:43:41,387 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-03-16 10:43:41,784 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
2022-03-16 10:43:42,334 INFO conf.Configuration: resource-types.xml not found
2022-03-16 10:43:42,335 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-03-16 10:43:42,357 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2022-03-16 10:43:42,358 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
2022-03-16 10:43:42,358 INFO yarn.Client: Setting up container launch context for our AM
2022-03-16 10:43:42,359 INFO yarn.Client: Setting up the launch environment for our AM container
2022-03-16 10:43:42,367 INFO yarn.Client: Preparing resources for our AM container
2022-03-16 10:43:42,487 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2022-03-16 10:43:43,802 INFO yarn.Client: Uploading resource file:/tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82/__spark_libs__7226558732161014901.zip -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/__spark_libs__7226558732161014901.zip
2022-03-16 10:43:56,526 INFO yarn.Client: Uploading resource file:/home/bigdata/env/spark-3.2.1-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.2.1.jar -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/spark-examples_2.12-3.2.1.jar
2022-03-16 10:43:57,009 INFO yarn.Client: Uploading resource file:/tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82/__spark_conf__3589752284083344005.zip -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/__spark_conf__.zip
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing view acls to: bigdata
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing modify acls to: bigdata
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing view acls groups to: 
2022-03-16 10:43:57,204 INFO spark.SecurityManager: Changing modify acls groups to: 
2022-03-16 10:43:57,204 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bigdata); groups with view permissions: Set(); users  with modify permissions: Set(bigdata); groups with modify permissions: Set()
2022-03-16 10:43:57,254 INFO yarn.Client: Submitting application application_1647396476966_0002 to ResourceManager
2022-03-16 10:43:57,515 INFO impl.YarnClientImpl: Submitted application application_1647396476966_0002
2022-03-16 10:43:58,520 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:43:58,522 INFO yarn.Client: 
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.bigdata
         start time: 1647398637277
         final status: UNDEFINED
         tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
         user: bigdata
2022-03-16 10:43:59,527 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:00,537 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:01,548 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:02,555 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:03,557 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:04,562 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:05,564 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:06,574 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:07,588 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:08,595 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:09,605 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:09,605 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: server1
         ApplicationMaster RPC port: 44451
         queue: root.bigdata
         start time: 1647398637277
         final status: UNDEFINED
         tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
         user: bigdata
2022-03-16 10:44:10,617 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:11,630 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:12,643 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:13,653 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:14,658 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:15,667 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:16,709 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:17,722 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:18,727 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:19,730 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:20,737 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:21,749 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:22,752 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:23,760 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:24,782 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:25,791 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:26,793 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:27,803 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:28,809 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:29,822 INFO yarn.Client: Application report for application_1647396476966_0002 (state: FINISHED)
2022-03-16 10:44:29,823 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: server1
         ApplicationMaster RPC port: 44451
         queue: root.bigdata
         start time: 1647398637277
         final status: SUCCEEDED
         tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
         user: bigdata
2022-03-16 10:44:29,843 INFO util.ShutdownHookManager: Shutdown hook called
2022-03-16 10:44:29,844 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82
2022-03-16 10:44:29,848 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-35dc976c-c371-4888-acc8-25e3a44d60a5
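
Since the driver output stays in the YARN container logs, the computed value of Pi has to be fetched from there. A minimal sketch that pulls the application ID out of the console log above; the helper name extract_app_id is hypothetical, and the commented yarn logs call assumes log aggregation is enabled on the cluster:

```shell
# extract_app_id: read spark-submit console output on stdin and print the
# first YARN application ID found (application_<timestamp>_<sequence>).
extract_app_id() {
  grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Usage sketch (needs the cluster, so it is left commented out):
#   app_id=$(extract_app_id < submit.log)
#   yarn logs -applicationId "$app_id" | grep 'Pi is roughly'
```

Here submit.log stands for a file holding the saved console output of spark-submit; SparkPi prints its result as a line starting with "Pi is roughly".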

YARN web UI

[Figure: three screenshots of the YARN web UI]

Jumping from the YARN web UI to the Spark web UI

[Figure: two screenshots of the Spark web UI reached from the YARN web UI]

The setup is fairly simple, so I won't go into further detail.
