Official download page: http://spark.apache.org/downloads.html

Extract the tarball and change into the configuration directory:

[potter@potter2 ~]$ tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz -C apps/
[potter@potter2 ~]$ cd /home/potter/apps/spark-2.3.0-bin-hadoop2.7/conf
[potter@potter2 conf]$ ll
total 36
-rw-r--r-- 1 potter potter  996 Feb 23 03:42 docker.properties.template
-rw-r--r-- 1 potter potter 1105 Feb 23 03:42 fairscheduler.xml.template
-rw-r--r-- 1 potter potter 2025 Feb 23 03:42 log4j.properties.template
-rw-r--r-- 1 potter potter 7801 Feb 23 03:42 metrics.properties.template
-rw-r--r-- 1 potter potter  865 Feb 23 03:42 slaves.template
-rw-r--r-- 1 potter potter 1292 Feb 23 03:42 spark-defaults.conf.template
-rwxr-xr-x 1 potter potter 4221 Feb 23 03:42 spark-env.sh.template
Copy spark-env.sh.template to spark-env.sh and append the following configuration at the end of the file:
[potter@potter2 conf]$ cp spark-env.sh.template spark-env.sh
[potter@potter2 conf]$ vi spark-env.sh

export JAVA_HOME=/usr/local/java/jdk1.8.0_73
#export SCALA_HOME=/usr/share/scala
export HADOOP_HOME=/home/potter/apps/hadoop-2.7.5
export HADOOP_CONF_DIR=/home/potter/apps/hadoop-2.7.5/etc/hadoop
export SPARK_WORKER_MEMORY=500m
export SPARK_WORKER_CORES=1
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=potter2:2181,potter3:2181,potter4:2181,potter5:2181 -Dspark.deploy.zookeeper.dir=/spark"
Note: the line `#export SPARK_MASTER_IP=hadoop1` must stay commented out; in an HA setup there is no single fixed Master. The worker memory/core values above may differ from what you would use elsewhere: they are kept small on purpose for a personal machine, since too large a SPARK_WORKER_MEMORY makes the machine very slow.

About the SPARK_DAEMON_JAVA_OPTS settings:

- `-Dspark.deploy.recoveryMode=ZOOKEEPER`: the entire cluster's state is maintained in, and recovered from, ZooKeeper. This is Spark's HA configuration: if the active Master dies, the standby Master must read the full cluster state from ZooKeeper and restore the state of every Worker, every Driver, and every Application before it can become the active Master.
- `-Dspark.deploy.zookeeper.url=potter2:2181,potter3:2181,potter4:2181,potter5:2181`: list every ZooKeeper node that any Master candidate may need to reach (four machines are used here, so four are listed).
- `-Dspark.deploy.zookeeper.dir=/spark`: the znode under which Spark saves its recovery metadata, i.e. job and cluster state. How does this differ from `dataDir` in ZooKeeper's zoo.cfg? `dataDir` is the local filesystem directory where the ZooKeeper server itself stores its data, while `spark.deploy.zookeeper.dir` is a path inside the ZooKeeper namespace. Under it ZooKeeper keeps all of the Spark cluster's state information: all Workers, all Applications, and all Drivers.
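Once the HA cluster is up, you can confirm that the Masters are really writing their state under the configured znode. A minimal sketch using ZooKeeper's own CLI, assuming the quorum from this tutorial is running (adjust the host if needed); this only works against a live cluster:

```shell
# Inspect the Spark recovery znode configured via
# -Dspark.deploy.zookeeper.dir=/spark (hosts are this tutorial's).
zkCli.sh -server potter2:2181 <<'EOF'
ls /
ls /spark
EOF
```

An empty or missing `/spark` znode before any Master has started is normal; it is created once the cluster runs with ZooKeeper recovery enabled.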
[potter@potter2 conf]$ cp slaves.template slaves
[potter@potter2 conf]$ vi slaves
Add the worker hostnames:
potter2
potter3
potter4
potter5
Distribute the configured Spark directory to the other nodes:

[potter@potter2 apps]$ scp -r spark-2.3.0-bin-hadoop2.7/ potter3:$PWD
[potter@potter2 apps]$ scp -r spark-2.3.0-bin-hadoop2.7/ potter4:$PWD
[potter@potter2 apps]$ scp -r spark-2.3.0-bin-hadoop2.7/ potter5:$PWD
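The three scp invocations above can be collapsed into a loop. A minimal sketch: the `distribute` helper below is hypothetical, not part of Spark, and the `echo` only prints the command that would run; on a real cluster with passwordless SSH, replace the `echo` with the commented-out `scp` line:

```shell
# Hypothetical helper: push a directory to several hosts.
# echo shows what would run; swap in the real scp when SSH keys are set up.
distribute() {
  src=$1; shift
  for host in "$@"; do
    echo "scp -r $src $host:$PWD"
    # scp -r "$src" "$host:$PWD"
  done
}

distribute spark-2.3.0-bin-hadoop2.7/ potter3 potter4 potter5
```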
Add Spark to the environment:

[potter@potter2 ~]$ vi .bashrc
export SPARK_HOME=/home/potter/apps/spark-2.3.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin
Save, then make it take effect immediately:
[potter@potter2 ~]$ source .bashrc
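A quick way to check that the current shell now sees the new variables. This is only a sketch; the default path below matches this tutorial's layout, so adjust it for your own install:

```shell
# Check whether $SPARK_HOME/bin is on the PATH after sourcing .bashrc.
SPARK_HOME=${SPARK_HOME:-/home/potter/apps/spark-2.3.0-bin-hadoop2.7}
case ":$PATH:" in
  *":$SPARK_HOME/bin:"*) spark_on_path=yes ;;
  *)                     spark_on_path=no  ;;
esac
echo "spark bin on PATH: $spark_on_path"
```

If it prints `no`, re-run `source ~/.bashrc` in the shell you are working in.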
Create spark-defaults.conf from its template:
[potter@potter2 conf]$ cp spark-defaults.conf.template spark-defaults.conf
[potter@potter2 conf]$ vi spark-defaults.conf
# This is useful for setting default environmental settings.

# Example:
spark.master                     spark://potter2:7077,potter3:7077,potter4:7077,potter5:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer
# spark.driver.memory              5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
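With `spark.master` pointing at all four candidate Masters, a driver tries each URL in turn and attaches to whichever Master is currently active. As a hedged illustration (it needs the running cluster, and the jar path is assumed from this 2.3.0 install), the bundled SparkPi example can be submitted against the full HA list:

```shell
# Submit the bundled SparkPi example against the HA Master list;
# the driver locates the active Master among the four URLs.
spark-submit \
  --master spark://potter2:7077,potter3:7077,potter4:7077,potter5:7077 \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/examples/jars/spark-examples_2.11-2.3.0.jar 10
```

Since `spark.master` is already set in spark-defaults.conf, the `--master` flag can also be omitted; it is shown here to make the HA URL format explicit.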
Copy the file to the other nodes as well:

[potter@potter2 conf]$ scp -r spark-defaults.conf potter3:$PWD
[potter@potter2 conf]$ scp -r spark-defaults.conf potter4:$PWD
[potter@potter2 conf]$ scp -r spark-defaults.conf potter5:$PWD
Start ZooKeeper; this must be run on every node:
[potter@potter2 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /home/potter/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Starting zookeeper ... already running as process 3703.
[potter@potter2 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/potter/apps/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower
Start HDFS; running this on any one node is enough:
[potter@potter2 ~]$ start-dfs.sh
Start the Spark cluster from Spark's own sbin directory (invoking it as ./start-all.sh avoids accidentally running Hadoop's start-all.sh, which may also be on the PATH):

[potter@potter2 ~]$ cd apps/spark-2.3.0-bin-hadoop2.7/sbin/
[potter@potter2 sbin]$ ./start-all.sh
[potter@potter2 sbin]$ jps
6464 Master
6528 Worker
6561 Jps
3909 NameNode
3703 QuorumPeerMain
5047 NodeManager
4412 DFSZKFailoverController
4204 JournalNode
4014 DataNode

[potter@potter3 conf]$ jps
4609 Jps
3441 DataNode
3284 QuorumPeerMain
4581 Worker
3879 NodeManager
3576 JournalNode
3372 NameNode
3676 DFSZKFailoverController
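To exercise the HA setup, a standby Master can be started on a second node and observed taking over when the active one dies. A sketch against a live cluster, using standard Spark sbin scripts (8080 is the default Master web UI port; exact UI wording may vary by version):

```shell
# On potter3: start a standby Master alongside the existing Worker.
"$SPARK_HOME"/sbin/start-master.sh

# Query each Master's web UI; one should report ALIVE, the other STANDBY.
curl -s http://potter2:8080 | grep -o 'ALIVE\|STANDBY'
curl -s http://potter3:8080 | grep -o 'ALIVE\|STANDBY'

# Kill the ALIVE Master's process and re-run the curl checks: the standby
# reads the cluster state back from the /spark znode and becomes ALIVE
# (recovery can take a minute or two).
```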