- hadoop series (1): Hadoop cluster installation
- hadoop series (2): HA high-availability deployment
- hadoop series (3): HDFS shell operations and common API operations
- hadoop series (4): HDFS working mechanism, MapReduce, and the YARN flow and core principles
- hadoop series (5): Source code analysis
Table of contents (excerpt)

1.1 Download the ZooKeeper package zookeeper-3.4.6.tar.gz
2.2.3 Modify the slaves file, which lists the DataNode hosts
2.3 Start the JournalNode on the three machines node1, node2, node4
Prerequisite: the master node (the NameNode host) must have passwordless SSH login to the other slave nodes
Environment

| OS | ZooKeeper | Hadoop |
| --- | --- | --- |
| CentOS 7.5 (four machines) | 3.4.6 (three-node ensemble) | 2.8.5 |
| HOST | IP | NameNode | JN | SNN | DN | ZKFC | ZK |
| --- | --- | --- | --- | --- | --- | --- | --- |
| node01 | 192.168.1.201 | √ | √ | | | √ | |
| node02 | 192.168.1.202 | √ | √ | | √ | √ | √ |
| node04 | 192.168.1.204 | | √ | | √ | | √ |
| node05 | 192.168.1.205 | | | | √ | | √ |

(The SNN column stays empty: under HA the standby NameNode takes over the SecondaryNameNode's checkpointing role.)
Deployment steps:

Prepare the ZooKeeper data directory; on node2 run:
```bash
cd /usr/local/bigdata
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
tar -zxvf zookeeper-3.4.6.tar.gz
mkdir zk
cd zk/
# myid must match this host's server.N id in zoo.cfg; node2 is server.1
echo 1 > myid
# On the other nodes, write the myid that matches their server.N entry
```

In zookeeper-3.4.6/conf/zoo.cfg, set the data directory and the ensemble members (the ids here must match each host's myid):

```
dataDir=/usr/local/bigdata/zk
server.1=192.168.1.202:2888:3888
server.4=192.168.1.204:2888:3888
server.5=192.168.1.205:2888:3888
```
Copy the extracted directory to node4 and node5, and create the corresponding zk data directory (with its myid) on each:
```bash
# From /usr/local/bigdata on node2, copy ZooKeeper to the other nodes
scp -r zookeeper-3.4.6 node4:`pwd`
scp -r zookeeper-3.4.6 node5:`pwd`

# On node4, inside /usr/local/bigdata/zk, create myid (matches server.4)
echo 4 > myid

# On node5, inside /usr/local/bigdata/zk, create myid (matches server.5)
echo 5 > myid
```
Start ZooKeeper on each of node2, node4, and node5:

```bash
/usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh start /usr/local/bigdata/zookeeper-3.4.6/conf/zoo.cfg
```
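To confirm the ensemble is healthy, zkServer.sh can report each node's role (one node should be leader, the others followers); a quick check, assuming the same install path on every node:

```bash
# Run on each of node2, node4, node5
/usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh status /usr/local/bigdata/zookeeper-3.4.6/conf/zoo.cfg
```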
Modify hadoop-env.sh:

```bash
export JAVA_HOME=/usr/local/jdk1.8.0_301
export HADOOP_CONF_DIR=/usr/local/bigdata/hadoop-ha/etc/hadoop
```

Then create the local storage directories for the NameNode and DataNode:

```bash
mkdir -p /home/hadoop/ha/hdfs/data
mkdir -p /home/hadoop/ha/hdfs/name
```
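These directories must exist on every node that stores HDFS data. Since the guide already requires passwordless SSH from the master, one way to create them remotely (hostnames assumed to resolve as in the tables above) is:

```bash
# Create the HDFS storage directories on the remaining nodes
for h in node2 node4 node5; do
  ssh "$h" "mkdir -p /home/hadoop/ha/hdfs/data /home/hadoop/ha/hdfs/name"
done
```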
core-site.xml:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>

  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node2:2181,node4:2181,node5:2181</value>
  </property>

  <property>
    <name>hadoop.home.dir</name>
    <value>file:/usr/local/bigdata/hadoop-ha</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/ha/hdfs</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
```
hdfs-site.xml:

```xml
<configuration>

  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/ha/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/ha/hdfs/data</value>
  </property>

  <property>
    <name>dfs.hosts</name>
    <value>/usr/local/bigdata/hadoop-ha/etc/hadoop/slaves</value>
  </property>

  <!-- One logical nameservice mapped to several NameNodes -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>

  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>node1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>node2:8020</value>
  </property>

  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>node1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>node2:50070</value>
  </property>

  <!-- qjournal: where the JournalNodes run, and where they store edits on disk -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node1:8485;node2:8485;node4:8485/mycluster</value>
  </property>

  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/bigdata/journal/data</value>
  </property>

  <!-- Proxy class and fencing method for HA failover; fencing relies on passwordless SSH -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <!-- must point at the SSH private key, not authorized_keys -->
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>

  <!-- Enable HA with automatic failover (starts the ZKFCs) -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

</configuration>
```
slaves (one DataNode host per line):

```
node5
node2
node4
```
Distribute the modified Hadoop configuration (from /usr/local/bigdata) to the other nodes:

```bash
scp -r hadoop-ha/ node2:`pwd`
scp -r hadoop-ha/ node4:`pwd`
scp -r hadoop-ha/ node5:`pwd`
```
On node1, node2, and node4, start the JournalNode:

```bash
cd /usr/local/bigdata/hadoop-ha/sbin
./hadoop-daemon.sh start journalnode
```

Run jps on all three machines to confirm a JournalNode process is present, and check the log file for errors: hadoop-hadoop-journalnode-node*.log
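A quick way to spot-check all three hosts from node1, assuming the hostnames and the default $HADOOP_HOME/logs directory used throughout this guide:

```bash
# Verify the JournalNode process and peek at its log on each host
for h in node1 node2 node4; do
  echo "== $h =="
  ssh "$h" "jps | grep JournalNode"
  ssh "$h" "tail -n 20 /usr/local/bigdata/hadoop-ha/logs/hadoop-*-journalnode-*.log"
done
```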
On node1, format the NameNode:

```bash
cd /usr/local/bigdata/hadoop-ha/bin
./hdfs namenode -format
```

Since the format has now been run on node1, do not run it on any other machine; it is executed only once, when the cluster is first deployed, and never again.
Start the NameNode on that same machine:

```bash
cd /usr/local/bigdata/hadoop-ha/sbin
./hadoop-daemon.sh start namenode
```

Verify the process with jps.
On the second NameNode machine (node2), sync the metadata over from the active NameNode:

```bash
cd /usr/local/bigdata/hadoop-ha/bin
./hdfs namenode -bootstrapStandby
```
On node1, initialize the HA state in ZooKeeper:

```bash
./hdfs zkfc -formatZK
```

Looking at ZooKeeper's data afterwards, a new znode for the cluster will have appeared.
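ZKFC registers its state under /hadoop-ha/&lt;nameservice&gt; by default, so the new znode can be inspected with the ZooKeeper CLI (paths assume the defaults used in this guide):

```bash
/usr/local/bigdata/zookeeper-3.4.6/bin/zkCli.sh -server node2:2181
# then, inside the CLI:
# ls /hadoop-ha        -> should list [mycluster]
```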
Start HDFS from node1:

```bash
cd /usr/local/bigdata/hadoop-ha/sbin
./start-dfs.sh
```
Verify HDFS:

http://192.168.1.201:50070/dfshealth.html#tab-overview
http://192.168.1.202:50070/dfshealth.html#tab-overview

node1 (192.168.1.201) is active
node2 (192.168.1.202) is standby
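The same information is available from the command line via hdfs haadmin (nn1/nn2 are the NameNode ids configured in hdfs-site.xml above):

```bash
# Reports "active" or "standby" for each NameNode
/usr/local/bigdata/hadoop-ha/bin/hdfs haadmin -getServiceState nn1
/usr/local/bigdata/hadoop-ha/bin/hdfs haadmin -getServiceState nn2
```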
Use HDFS to create a directory and upload a file; the directory first (upload sketch below):

```bash
./hdfs dfs -mkdir -p /usr/test
```
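For the upload half of that step, any local file works; a sketch with a hypothetical test file:

```bash
# hello.txt is just an example name for a local file
./hdfs dfs -put ./hello.txt /usr/test
./hdfs dfs -ls /usr/test
```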
To stop Hadoop, just run ./stop-dfs.sh.
ResourceManager and NodeManager

The role layout now gains two columns (* marks where the YARN daemons run):

| HOST | IP | NameNode | JN | SNN | DN | ZKFC | ZK | ResourceManager | NodeManager |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| node01 | 192.168.1.201 | √ | √ | | | √ | | | |
| node02 | 192.168.1.202 | √ | √ | | √ | √ | √ | | * |
| node04 | 192.168.1.204 | | √ | | √ | | √ | * | * |
| node05 | 192.168.1.205 | | | | √ | | √ | * | * |
On the master node node1, edit:

- mapred-site.xml
- yarn-site.xml
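The post does not reproduce the two files' contents here; a minimal sketch of what they would contain for the layout above, assuming RM ids rm1/rm2 on node4/node5 and the cluster id cluster1 (both names are illustrative, not taken from the original):

```xml
<!-- mapred-site.xml: run MapReduce on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

<!-- yarn-site.xml: ResourceManager HA backed by the ZooKeeper ensemble -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node4</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node5</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node2:2181,node4:2181,node5:2181</value>
  </property>
</configuration>
```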
Distribute the configuration files:

```bash
scp mapred-site.xml yarn-site.xml node2:`pwd`
scp mapred-site.xml yarn-site.xml node4:`pwd`
scp mapred-site.xml yarn-site.xml node5:`pwd`
```

The NodeManagers, like the DataNodes, are controlled by the slaves file.
Start YARN:

```bash
./sbin/start-yarn.sh
```

Note that no ResourceManager process survives on node1, and its log shows errors: the ResourceManagers we actually want run on node4 and node5, and once those come up, the configuration (which does not list node1) causes the node1 process to be killed.
Start the ResourceManager on node4 and node5:

```bash
./sbin/yarn-daemon.sh start resourcemanager
```
Verify:

- jps on node4 and node5 should show the resourcemanager process.
- Check the election data stored in ZooKeeper.
- Open http://192.168.1.204:8088/cluster/nodes and click About.
- http://192.168.1.205:8088/cluster/cluster
- node4 is shown as the active node; node5 is shown as STARTED.
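Two command-line spot checks, assuming the rm1/rm2 ids sketched above and ZooKeeper's default election path /yarn-leader-election:

```bash
# HA state of each ResourceManager
/usr/local/bigdata/hadoop-ha/bin/yarn rmadmin -getServiceState rm1
/usr/local/bigdata/hadoop-ha/bin/yarn rmadmin -getServiceState rm2

# The leader-election znode in ZooKeeper
/usr/local/bigdata/zookeeper-3.4.6/bin/zkCli.sh -server node2:2181
# inside the CLI: ls /yarn-leader-election
```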
Validate with the official demo:

```bash
cd /usr/local/bigdata/hadoop-ha/share/hadoop/mapreduce

/usr/local/bigdata/hadoop-ha/bin/hdfs dfs -mkdir -p /data/in
/usr/local/bigdata/hadoop-ha/bin/hdfs dfs -mkdir -p /data/out

# testWordCount.txt is the local sample file to be counted
/usr/local/bigdata/hadoop-ha/bin/hdfs dfs -put ./testWordCount.txt /data/in
/usr/local/bigdata/hadoop-ha/bin/hdfs dfs -ls /data/in
```
Run the job, then inspect the output:

```bash
/usr/local/bigdata/hadoop-ha/bin/hadoop jar hadoop-mapreduce-examples-2.8.5.jar wordcount /data/in /data/out/result

/usr/local/bigdata/hadoop-ha/bin/hdfs dfs -ls /data/out/result
/usr/local/bigdata/hadoop-ha/bin/hdfs dfs -cat /data/out/result/part-r-00000
```
Setup complete. To shut the cluster down:

node01: stop-dfs.sh
node01: stop-yarn.sh (stops the NodeManagers)
node04, node05: yarn-daemon.sh stop resourcemanager
node02, node04, node05: zkServer.sh stop
To start it again:

```bash
# 1. Start the ZooKeeper ensemble on node2, node4, node5
/usr/local/bigdata/zookeeper-3.4.6/bin/zkServer.sh start /usr/local/bigdata/zookeeper-3.4.6/conf/zoo.cfg

# node01:
/usr/local/bigdata/hadoop-ha/sbin/start-dfs.sh

# node01:
/usr/local/bigdata/hadoop-ha/sbin/start-yarn.sh

# node04, node05:
source /home/hadoop/.bash_profile

cd /usr/local/bigdata/hadoop-ha/sbin/

./yarn-daemon.sh start resourcemanager
```