I first came across Hadoop in 2010 and worked on Hadoop-related projects until 2012. After 2012, for various reasons, that technical background slowly faded. Picking Hadoop up again today, I looked through the 2.4 source: it provides many more APIs, HA now has a proper solution, and YARN should keep maturing as well. Enough preamble; let's test HDFS HA.
The core problem of HDFS HA is keeping the metadata of the active and standby NameNodes in sync. Earlier solutions included AvatarNode and others. For the shared storage you can use NFS, BookKeeper, and so on; here we use the JournalNode-based quorum journal, mainly because it is simple to configure.
Virtual machines: three, as follows:
Host | Role | IP |
master1 | NameNode (active), JournalNode, ZooKeeper | 192.168.6.171 |
master2 | NameNode (standby), JournalNode, ZooKeeper | 192.168.6.172 |
datanode1 | DataNode, JournalNode, ZooKeeper | 192.168.6.173 |
Software versions: hadoop-2.4.1, zookeeper-3.4.6.
Download hadoop-2.4.1 and unpack it; unpack zookeeper as well.
Step 1: configure the ZooKeeper ensemble.
Under the unpacked zookeeper folder, rename conf/zoo_sample.cfg to zoo.cfg.
Edit the configuration:
dataDir=/cw/zookeeper/ — I set it to /cw/zookeeper/; make sure this directory exists.
Append the ensemble configuration at the end of the file:
server.1=192.168.6.171:2888:3888
server.2=192.168.6.172:2888:3888
server.3=192.168.6.173:2888:3888
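Putting the pieces together, a minimal zoo.cfg for this cluster might look like the following sketch (tickTime, initLimit, syncLimit and clientPort are the defaults shipped in zoo_sample.cfg; only dataDir and the server.* lines were changed):

```
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/cw/zookeeper/
server.1=192.168.6.171:2888:3888
server.2=192.168.6.172:2888:3888
server.3=192.168.6.173:2888:3888
```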
Copy the modified zookeeper folder to the other two machines:
scp -r zookeeper-3.4.6 root@192.168.6.172:/cw/
scp -r zookeeper-3.4.6 root@192.168.6.173:/cw/
Next, write each machine's myid file.
On 192.168.6.171 run:
echo "1" >> /cw/zookeeper/myid
On 192.168.6.172 run:
echo "2" >> /cw/zookeeper/myid
On 192.168.6.173 run:
echo "3" >> /cw/zookeeper/myid
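Instead of typing the number by hand on each machine, the myid can be derived from the server.N lines in zoo.cfg by matching the host's own IP. This is a hypothetical helper, demonstrated here against a sample config written to a scratch directory; on a real node you would point ZOO_CFG at /cw/zookeeper-3.4.6/conf/zoo.cfg, ZK_DATA_DIR at /cw/zookeeper, and MY_IP at the host's address:

```shell
# Demo setup: a sample zoo.cfg in a scratch dir (assumption for this sketch;
# use the real paths on an actual node).
tmp=$(mktemp -d)
ZOO_CFG="$tmp/zoo.cfg"
ZK_DATA_DIR="$tmp/zookeeper"
MY_IP="192.168.6.172"
cat > "$ZOO_CFG" <<'EOF'
server.1=192.168.6.171:2888:3888
server.2=192.168.6.172:2888:3888
server.3=192.168.6.173:2888:3888
EOF
mkdir -p "$ZK_DATA_DIR"
# Match MY_IP against the server.N lines and take N as the myid.
myid=$(awk -v ip="$MY_IP" -F'=' '/^server\./ {
  n = $1;    sub(/^server\./, "", n)   # the N in server.N
  host = $2; sub(/:.*/, "", host)      # strip the :2888:3888 ports
  if (host == ip) print n
}' "$ZOO_CFG")
echo "$myid" > "$ZK_DATA_DIR/myid"
cat "$ZK_DATA_DIR/myid"   # prints 2 for this sample
```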
Start ZooKeeper by running the following on each machine:
./zkServer.sh start
Once all three are up, check the logs to confirm they started OK, or run ./zkServer.sh status to see each node's state (one node should report leader and the other two follower).
---------------------------------------------------- Hadoop setup ----------------------------------------------------
Now configure the Hadoop parameters.
hadoop-env.sh: mainly set the JAVA_HOME path.
core-site.xml is configured as follows:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://myhadoop</value>
<!-- myhadoop is the nameservice ID -->
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>192.168.6.171:2181,192.168.6.172:2181,192.168.6.173:2181</value>
</property>
</configuration>
Edit hdfs-site.xml:
<configuration>
<property>
<name>dfs.nameservices</name>
<value>myhadoop</value>
<!-- matches the nameservice ID used in fs.defaultFS -->
<description>
Comma-separated list of nameservices;
the same as fs.defaultFS in core-site.xml.
</description>
</property>
<property>
<name>dfs.ha.namenodes.myhadoop</name>
<value>nn1,nn2</value>
<!-- the ID of each NameNode in this nameservice -->
<description>
The prefix for a given nameservice, contains a comma-separated
list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.myhadoop.nn1</name>
<value>192.168.6.171:8020</value>
<description>
RPC address for NameNode nn1 of myhadoop
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.myhadoop.nn2</name>
<value>192.168.6.172:8020</value>
<description>
RPC address for NameNode nn2 of myhadoop
</description>
</property>
<property>
<name>dfs.namenode.http-address.myhadoop.nn1</name>
<value>192.168.6.171:50070</value>
<description>
The address and the base port where the dfs namenode1 web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.http-address.myhadoop.nn2</name>
<value>192.168.6.172:50070</value>
<description>
The address and the base port where the dfs namenode2 web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.servicerpc-address.myhadoop.nn1</name>
<value>192.168.6.171:53310</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.myhadoop.nn2</name>
<value>192.168.6.172:53310</value>
</property>
The next section configures the corresponding storage directories.
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///cw/hadoop/name</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table(fsimage). If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://192.168.6.171:8485;192.168.6.172:8485;192.168.6.173:8485/hadoop-journal</value>
<description>A directory on shared storage between the multiple namenodes
in an HA cluster. This directory will be written by the active and read
by the standby in order to keep the namespaces synchronized. This directory
does not need to be listed in dfs.namenode.edits.dir above. It should be
left empty in a non-HA cluster.
</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///cw/hadoop/data</value>
<description>Determines where on the local filesystem a DFS data node
should store its blocks. If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
<description>
Whether automatic failover is enabled. See the HDFS High
Availability documentation for details on automatic HA
configuration.
</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/cw/hadoop/journal/</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.myhadoop</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/yarn/.ssh/id_rsa</value>
<description>the location of the stored SSH private key</description>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>1000</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>8</value>
</property>
</configuration>
All directories referenced above must be created by hand; otherwise exceptions will be thrown.
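The local directories from hdfs-site.xml can be created in one go. A minimal sketch; BASE defaults to a scratch directory for this demo, and on the real nodes you would set BASE=/cw/hadoop to match the config above:

```shell
# Create every local directory referenced in hdfs-site.xml
# (name table, datanode blocks, and the JournalNode edits dir).
BASE="${BASE:-$(mktemp -d)/hadoop}"
mkdir -p "$BASE/name" "$BASE/data" "$BASE/journal"
ls "$BASE"
```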
Once the configuration is complete, distribute the configured hadoop directory to every node in the cluster, and create the required directories on each node as well.
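The distribution step can be scripted the same way as for zookeeper. A dry-run sketch; HADOOP_DIR is an assumed unpack location, and the leading `echo` prints the commands instead of running them (drop it to actually copy):

```shell
# Dry-run: print the scp commands that would push the configured
# hadoop tree to the other two nodes. HADOOP_DIR is an assumption.
HADOOP_DIR="${HADOOP_DIR:-/cw/hadoop-2.4.1}"
for h in 192.168.6.172 192.168.6.173; do
  echo scp -r "$HADOOP_DIR" "root@$h:/cw/"
done
```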
Now format the HA state znode in ZooKeeper by running: ./hdfs zkfc -formatZK
After that, start the ZKFailoverController, which monitors the health of the active and standby NameNodes:
./hadoop-daemon.sh start zkfc — starting it on the two NameNode machines is enough.
Next, start the shared storage system, the JournalNodes (they must be running before the NameNode is formatted, because the shared edits directory lives on them).
On each JN node run: ./hadoop-daemon.sh start journalnode
Next, on the primary NN format the filesystem: ./hdfs namenode -format
When that finishes, start the primary NN: ./hadoop-daemon.sh start namenode
On the standby NN, first sync the metadata from the primary by running: ./hdfs namenode -bootstrapStandby
Once the sync completes, start the standby NN: ./hadoop-daemon.sh start namenode
Since ZooKeeper has already elected one node as active, there is no need to set it by hand. If you do want to switch the active NN manually, you can run:
./hdfs haadmin -transitionToActive nn1
(with automatic failover enabled, haadmin refuses a manual transition unless you add the --forcemanual flag).
Start all the DataNodes.
Open 192.168.6.171:50070 and 192.168.6.172:50070 in a browser; one NN should report active and the other standby.
You can run a few HDFS shell commands to verify that the cluster works.
Now kill the NN process on the active node (the pid comes from jps):
kill -9 135415
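The pid 135415 above was read from jps output. A sketch of extracting it automatically; the jps output below is illustrative sample data, and on the real node you would run `NN_PID=$(jps | awk '$2 == "NameNode" {print $1}')` followed by `kill -9 "$NN_PID"`:

```shell
# Extract the NameNode pid from (sample) jps output so it can be killed.
jps_output='135415 NameNode
2345 JournalNode
3456 QuorumPeerMain
4567 DFSZKFailoverController'
NN_PID=$(printf '%s\n' "$jps_output" | awk '$2 == "NameNode" {print $1}')
echo "$NN_PID"   # prints 135415 for this sample
```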
You can see that the standby NN has successfully taken over as active.