
Spark Learning (2): Building a Spark High Availability Cluster


1. Download the Spark installation package

Official download page: http://spark.apache.org/downloads.html
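If you prefer the command line, the same release can also be pulled from the Apache archive (the URL below assumes the spark-2.3.0 / Hadoop 2.7 build used in this article):

[potter@potter2 ~]$ wget https://archive.apache.org/dist/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz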


2. Spark installation

2.1 Upload and extract

[potter@potter2 ~]$ tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz -C apps/

2.2 Modify the configuration files

(1) Go to the directory that holds the configuration files:

/home/potter/apps/spark-2.3.0-bin-hadoop2.7/conf

[potter@potter2 conf]$ ll
total 36
-rw-r--r-- 1 potter potter  996 Feb 23 03:42 docker.properties.template
-rw-r--r-- 1 potter potter 1105 Feb 23 03:42 fairscheduler.xml.template
-rw-r--r-- 1 potter potter 2025 Feb 23 03:42 log4j.properties.template
-rw-r--r-- 1 potter potter 7801 Feb 23 03:42 metrics.properties.template
-rw-r--r-- 1 potter potter  865 Feb 23 03:42 slaves.template
-rw-r--r-- 1 potter potter 1292 Feb 23 03:42 spark-defaults.conf.template
-rwxr-xr-x 1 potter potter 4221 Feb 23 03:42 spark-env.sh.template
(2) Modify spark-env.sh

Copy spark-env.sh.template to spark-env.sh and append the following configuration to the end of the file:

[potter@potter2 conf]$ cp spark-env.sh.template spark-env.sh
[potter@potter2 conf]$ vi spark-env.sh

export JAVA_HOME=/usr/local/java/jdk1.8.0_73
#export SCALA_HOME=/usr/share/scala
export HADOOP_HOME=/home/potter/apps/hadoop-2.7.5
export HADOOP_CONF_DIR=/home/potter/apps/hadoop-2.7.5/etc/hadoop
export SPARK_WORKER_MEMORY=500m
export SPARK_WORKER_CORES=1
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=potter2:2181,potter3:2181,potter4:2181,potter5:2181 -Dspark.deploy.zookeeper.dir=/spark"
Note:

#export SPARK_MASTER_IP=hadoop1: this setting must be commented out.

The Spark parameters used here may differ from what you would use elsewhere; they are kept small to suit a personal machine. If the worker memory is set too large, the machines will run very slowly.

Explanation:

-Dspark.deploy.recoveryMode=ZOOKEEPER    # The state of the whole cluster is maintained through ZooKeeper, and recovery of the cluster state is also done through ZooKeeper. In other words, ZooKeeper provides Spark's HA: if the active Master dies, the standby Master must read the whole cluster state from ZooKeeper and restore the state of all Workers, Drivers, and Applications before it can become the new active Master.

-Dspark.deploy.zookeeper.url=potter2:2181,potter3:2181,potter4:2181,potter5:2181    # List every machine that runs ZooKeeper and could become the active Master. (I used four machines, so four are listed.)

-Dspark.deploy.zookeeper.dir=/spark 

What is the difference between this dir and the dataDir in ZooKeeper's zoo.cfg?

-Dspark.deploy.zookeeper.dir is the znode under which Spark stores its metadata, that is, the running state of Spark jobs, whereas dataDir in zoo.cfg is the local filesystem directory where the ZooKeeper server itself keeps its snapshots and transaction logs.

ZooKeeper holds all of the Spark cluster's state information, including every Worker, every Application, and every Driver, so that if the active Master fails, the standby Master can recover this state and take over.
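Once the cluster is up (section 3), you can confirm that the Master is actually persisting its recovery data under this znode. A minimal sketch using the zkCli.sh client that ships with ZooKeeper; the child znode names shown are what a Spark 2.3 standalone Master typically creates and may vary:

[potter@potter2 ~]$ zkCli.sh -server potter2:2181
[zk: potter2:2181(CONNECTED) 0] ls /spark
[leader_election, master_status]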
(3) Copy slaves.template to slaves
[potter@potter2 conf]$ cp slaves.template slaves
[potter@potter2 conf]$ vi slaves

Add the following content (one Worker hostname per line):

potter2
potter3
potter4
potter5
(4) Distribute the installation to the other nodes
[potter@potter2 apps]$ scp -r spark-2.3.0-bin-hadoop2.7/ potter3:$PWD
[potter@potter2 apps]$ scp -r spark-2.3.0-bin-hadoop2.7/ potter4:$PWD
[potter@potter2 apps]$ scp -r spark-2.3.0-bin-hadoop2.7/ potter5:$PWD

2.3 Configure environment variables

[potter@potter2 ~]$ vi .bashrc
export SPARK_HOME=/home/potter/apps/spark-2.3.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin

Save the file and make it take effect immediately:

[potter@potter2 ~]$ source .bashrc
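A quick sanity check that the new PATH entry is picked up (the exact path depends on your installation):

[potter@potter2 ~]$ which spark-submit
/home/potter/apps/spark-2.3.0-bin-hadoop2.7/bin/spark-submit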

2.4 Configure spark-defaults.conf

Spark defaults to local mode, so the cluster master URL is set explicitly here.
Go to /home/potter/apps/spark-2.3.0-bin-hadoop2.7/conf

Copy spark-defaults.conf from the template:

[potter@potter2 conf]$ cp spark-defaults.conf.template spark-defaults.conf
[potter@potter2 conf]$ vi spark-defaults.conf
# This is useful for setting default environmental settings.
# Example:
spark.master spark://potter2:7077,potter3:7077,potter4:7077,potter5:7077
# spark.eventLog.enabled true
# spark.eventLog.dir hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

Distribute it to the other nodes:
[potter@potter2 conf]$ scp -r spark-defaults.conf potter3:$PWD
[potter@potter2 conf]$ scp -r spark-defaults.conf potter4:$PWD
[potter@potter2 conf]$ scp -r spark-defaults.conf potter5:$PWD
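Once the cluster is running (section 3), spark-submit and spark-shell will default to this spark.master URL. As a quick end-to-end test, the bundled SparkPi example can be submitted against the HA master list (the examples jar path assumes the standard Spark 2.3.0 binary distribution):

[potter@potter2 ~]$ spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://potter2:7077,potter3:7077,potter4:7077,potter5:7077 \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.3.0.jar 10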

3. Start the cluster

3.1 Start the ZooKeeper cluster first

Run this on every node:

[potter@potter2 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/potter/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... already running as process 3703.
[potter@potter2 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/potter/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower

3.2 Start the HDFS cluster

Running it on any one node is enough:

[potter@potter2 ~]$ start-dfs.sh

3.3 Then start the Spark cluster

[potter@potter2 ~]$ cd apps/spark-2.3.0-bin-hadoop2.7/sbin/
[potter@potter2 sbin]$ ./start-all.sh
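Note that start-all.sh starts a Master only on the node where it is run, plus one Worker on every host listed in slaves. For the ZooKeeper-based HA to have something to fail over to, a standby Master must be started by hand on at least one other candidate node, for example:

[potter@potter3 ~]$ cd apps/spark-2.3.0-bin-hadoop2.7/sbin/
[potter@potter3 sbin]$ ./start-master.sh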

3.4 Check the processes

[potter@potter2 sbin]$ jps
6464 Master
6528 Worker
6561 Jps
3909 NameNode
3703 QuorumPeerMain
5047 NodeManager
4412 DFSZKFailoverController
4204 JournalNode
4014 DataNode
[potter@potter3 conf]$ jps
4609 Jps
3441 DataNode
3284 QuorumPeerMain
4581 Worker
3879 NodeManager
3576 JournalNode
3372 NameNode
3676 DFSZKFailoverController
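To verify failover, a simple test (assuming a standby Master was started on another node as described in 3.3) is to kill the active Master and watch the standby take over; the PID below is the Master shown by jps on potter2 above:

[potter@potter2 ~]$ kill -9 6464

After a minute or two, the standby Master's web UI (http://potter3:8080 in this layout) should switch from STANDBY to ALIVE, with all Workers re-registered.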

                