A traditional Hadoop cluster has only one NameNode, so if that single NameNode fails, the entire cluster becomes unavailable. This is the well-known "single point of failure" problem. Although a Secondary NameNode exists, it is not a standby for the NameNode. HDFS HA solves this by configuring multiple NameNodes (Active/Standby) as a hot backup within the cluster: if a failure occurs, such as a machine crash, or a machine needs upgrade maintenance, the NameNode role can be switched to another machine quickly.
Current HDFS cluster plan:
| hadoop102 | hadoop103          | hadoop104 |
| --------- | ------------------ | --------- |
| NameNode  | Secondary NameNode |           |
| DataNode  | DataNode           | DataNode  |
The main purpose of HA is to eliminate the NameNode single point of failure, so the HDFS cluster needs to be replanned as follows:
| hadoop102   | hadoop103   | hadoop104   |
| ----------- | ----------- | ----------- |
| NameNode    | NameNode    | NameNode    |
| DataNode    | DataNode    | DataNode    |
| JournalNode | JournalNode | JournalNode |
1. Create a hadoopHA folder under the /opt directory
[leon@hadoop102 ~]$ cd /opt
[leon@hadoop102 opt]$ sudo mkdir hadoopHA
[leon@hadoop102 opt]$ sudo chown leon:leon /opt/hadoopHA # grant the user ownership of the directory
2. Copy hadoop-3.1.3 from /opt/module/ to the /opt/hadoopHA directory, then delete its data and logs directories
[leon@hadoop102 opt]$ cp -r /opt/module/hadoop-3.1.3 /opt/hadoopHA/
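The copy brings along any old runtime state, so it needs cleaning up; a minimal sketch, assuming the default data and logs directory names under the copied tree:
[leon@hadoop102 opt]$ rm -rf /opt/hadoopHA/hadoop-3.1.3/data /opt/hadoopHA/hadoop-3.1.3/logs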
3. Configure core-site.xml
<configuration>
    <!-- Assemble the addresses of the multiple NameNodes into one cluster, mycluster -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>
    <!-- Directory where files generated by Hadoop at runtime are stored -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoopHA/hadoop-3.1.3/data</value>
    </property>
</configuration>
4. Configure hdfs-site.xml (note: in the ssh section at the end, change the username to your own)
<configuration>
    <!-- NameNode data storage directory -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.tmp.dir}/name</value>
    </property>
    <!-- DataNode data storage directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.tmp.dir}/data</value>
    </property>
    <!-- JournalNode data storage directory -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>${hadoop.tmp.dir}/jn</value>
    </property>
    <!-- Logical name of the fully distributed cluster -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>
    <!-- The NameNodes that make up the cluster -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2,nn3</value>
    </property>
    <!-- RPC addresses of the NameNodes -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop102:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop103:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn3</name>
        <value>hadoop104:8020</value>
    </property>
    <!-- HTTP addresses of the NameNodes -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop102:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop103:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn3</name>
        <value>hadoop104:9870</value>
    </property>
    <!-- Where the NameNode metadata (shared edit log) is stored on the JournalNodes -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
    </property>
    <!-- Proxy class the client uses to determine which NameNode is Active -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- Fencing method, so that only one NameNode serves requests at any time -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <!-- sshfence requires passwordless ssh key login -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/leon/.ssh/id_rsa</value>
    </property>
</configuration>
5. Distribute the configured hadoopHA directory to the other nodes
[leon@hadoop102 opt]$ xsync hadoopHA
6. Change HADOOP_HOME in the environment file to point at the HA copy (do this on all three machines)
[leon@hadoop102 ~]$ sudo vim /etc/profile.d/my_env.sh
#HADOOP_HOME
export HADOOP_HOME=/opt/hadoopHA/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
[leon@hadoop102 ~]$ source /etc/profile
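A quick sanity check that the shell now resolves the HA installation and picks up the new configuration (expected values come from the files edited above; open a fresh shell if an old Hadoop path still wins in PATH):
[leon@hadoop102 ~]$ which hdfs # expect /opt/hadoopHA/hadoop-3.1.3/bin/hdfs
[leon@hadoop102 ~]$ hdfs getconf -confKey fs.defaultFS # expect hdfs://mycluster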
7. Start a JournalNode on each node
[leon@hadoop102 ~]$ hdfs --daemon start journalnode
[leon@hadoop103 ~]$ hdfs --daemon start journalnode
[leon@hadoop104 ~]$ hdfs --daemon start journalnode
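Before formatting anything, it is worth confirming the JournalNodes are actually up; on each node, jps should list a JournalNode process:
[leon@hadoop102 ~]$ jps | grep JournalNode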
8. Format nn1 on hadoop102 and start it
[leon@hadoop102 ~]$ hdfs namenode -format
[leon@hadoop102 ~]$ hdfs --daemon start namenode
9. Sync nn1's metadata to nn2 and nn3, then start them
[leon@hadoop103 ~]$ hdfs namenode -bootstrapStandby
[leon@hadoop104 ~]$ hdfs namenode -bootstrapStandby
[leon@hadoop103 ~]$ hdfs --daemon start namenode
[leon@hadoop104 ~]$ hdfs --daemon start namenode
10. Start a DataNode on each node, then manually switch nn1 to Active
[leon@hadoop102 ~]$ hdfs --daemon start datanode
[leon@hadoop103 ~]$ hdfs --daemon start datanode
[leon@hadoop104 ~]$ hdfs --daemon start datanode
[leon@hadoop102 ~]$ hdfs haadmin -transitionToActive nn1
11. Check whether nn1 is Active
[leon@hadoop102 ~]$ hdfs haadmin -getServiceState nn1
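nn2 and nn3 can be checked the same way and should report standby:
[leon@hadoop102 ~]$ hdfs haadmin -getServiceState nn2
[leon@hadoop102 ~]$ hdfs haadmin -getServiceState nn3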
Manual failover still needs an administrator to run the switch. To make failover automatic, add Zookeeper and ZKFC (ZKFailoverController) to the plan:

| hadoop102   | hadoop103   | hadoop104   |
| ----------- | ----------- | ----------- |
| NameNode    | NameNode    | NameNode    |
| DataNode    | DataNode    | DataNode    |
| JournalNode | JournalNode | JournalNode |
| Zookeeper   | Zookeeper   | Zookeeper   |
| ZKFC        | ZKFC        | ZKFC        |
Add the following to hdfs-site.xml:
<!-- Enable automatic failover for the NameNode -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
And add the following to core-site.xml:
<!-- zkServer addresses that zkfc connects to -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>
Distribute the updated configuration files:
[leon@hadoop102 etc]$ xsync hadoop/ # run in the /opt/hadoopHA/hadoop-3.1.3/etc directory
[leon@hadoop102 ~]$ stop-dfs.sh # stop all HDFS services first
[leon@hadoop102 ~]$ zkServer.sh start # start the Zookeeper cluster on all three nodes
[leon@hadoop103 ~]$ zkServer.sh start
[leon@hadoop104 ~]$ zkServer.sh start
[leon@hadoop102 ~]$ hdfs zkfc -formatZK # initialize the HA state in Zookeeper
[leon@hadoop102 ~]$ start-dfs.sh # start HDFS again; with automatic failover enabled, this also starts the ZKFCs
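A rough sketch of verifying automatic failover, assuming nn1 is currently Active (the pid below is a placeholder taken from jps output):
[leon@hadoop102 ~]$ jps | grep NameNode # note the NameNode pid
[leon@hadoop102 ~]$ kill -9 <NameNode-pid>
[leon@hadoop102 ~]$ hdfs haadmin -getServiceState nn2 # one of nn2/nn3 should now report active
[leon@hadoop102 ~]$ hdfs --daemon start namenode # bring nn1 back; it rejoins as Standby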
Next, configure YARN HA with three ResourceManagers. Edit yarn-site.xml:

<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <!-- Cluster id shared by the ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster-yarn1</value>
    </property>
    <!-- Logical list of ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2,rm3</value>
    </property>
    <!-- ========== rm1 ========== -->
    <!-- Hostname of rm1 -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop102</value>
    </property>
    <!-- Web UI address of rm1 -->
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop102:8088</value>
    </property>
    <!-- Internal communication address of rm1 -->
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>hadoop102:8032</value>
    </property>
    <!-- Address AMs use to request resources from rm1 -->
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>hadoop102:8030</value>
    </property>
    <!-- Address NodeManagers connect to on rm1 -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>hadoop102:8031</value>
    </property>
    <!-- ========== rm2 ========== -->
    <!-- Hostname of rm2 -->
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop103</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop103:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>hadoop103:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>hadoop103:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>hadoop103:8031</value>
    </property>
    <!-- ========== rm3 ========== -->
    <!-- Hostname of rm3 -->
    <property>
        <name>yarn.resourcemanager.hostname.rm3</name>
        <value>hadoop104</value>
    </property>
    <!-- Web UI address of rm3 -->
    <property>
        <name>yarn.resourcemanager.webapp.address.rm3</name>
        <value>hadoop104:8088</value>
    </property>
    <!-- Internal communication address of rm3 -->
    <property>
        <name>yarn.resourcemanager.address.rm3</name>
        <value>hadoop104:8032</value>
    </property>
    <!-- Address AMs use to request resources from rm3 -->
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm3</name>
        <value>hadoop104:8030</value>
    </property>
    <!-- Address NodeManagers connect to on rm3 -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm3</name>
        <value>hadoop104:8031</value>
    </property>
    <!-- Address of the Zookeeper cluster -->
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
    </property>
    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <!-- Store the ResourceManager state in the Zookeeper cluster -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <!-- Environment variable inheritance -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
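The updated yarn-site.xml needs to reach every node before YARN starts; assuming the same xsync helper used earlier in this post:
[leon@hadoop102 etc]$ xsync hadoop/ # again from the /opt/hadoopHA/hadoop-3.1.3/etc directory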
[leon@hadoop102 ~]$ start-yarn.sh # start on any ResourceManager node, e.g. hadoop102 or hadoop103
[leon@hadoop102 ~]$ yarn rmadmin -getServiceState rm1
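rm2 and rm3 can be checked the same way; exactly one ResourceManager should report active:
[leon@hadoop102 ~]$ yarn rmadmin -getServiceState rm2
[leon@hadoop102 ~]$ yarn rmadmin -getServiceState rm3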
The final cluster layout:

| hadoop102       | hadoop103       | hadoop104       |
| --------------- | --------------- | --------------- |
| NameNode        | NameNode        | NameNode        |
| DataNode        | DataNode        | DataNode        |
| JournalNode     | JournalNode     | JournalNode     |
| Zookeeper       | Zookeeper       | Zookeeper       |
| ZKFC            | ZKFC            | ZKFC            |
| ResourceManager | ResourceManager | ResourceManager |
| NodeManager     | NodeManager     | NodeManager     |
That's all for today. This post is only a quick walkthrough of installing and configuring a Hadoop HA cluster, built on top of an existing Hadoop cluster. I've been too lazy: the HA post is done, but the basic Hadoop installation and deployment one still isn't written /(ㄒoㄒ)/~~, I'll fill that hole later. If this article helped you, a like, favorite, and follow would be appreciated (bushi). Thanks, comrades!