Two pitfalls encountered when setting up Hadoop HA

Pitfall 1: One of the NameNodes fails to start

2021-03-11 21:16:30,478 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [192.168.98.166:8485, 192.168.98.167:8485, 192.168.98.168:8485]. Skipping.
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:471)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:278)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1508)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1532)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:652)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)

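Some time after it starts, the NameNode dies. By default, if the JournalNodes are still not up about 10 s after the NameNode starts (the IPC client defaults: maxRetries=10, sleepTime=1000 ms), the error above is reported.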
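To check whether this is the cause, it can help to see how many JournalNodes are accepting connections before starting the NameNode. The following is a minimal sketch, not part of the original setup: the helper names `quorum_needed` and `count_reachable` are hypothetical, the addresses are the ones from the log above, and it assumes bash with `/dev/tcp` support and the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Majority needed among n JournalNodes: floor(n/2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Count how many host:port endpoints accept a TCP connection within 2 seconds.
count_reachable() {
    local up=0 endpoint host port
    for endpoint in "$@"; do
        host=${endpoint%%:*}
        port=${endpoint##*:}
        if timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
            up=$(( up + 1 ))
        fi
    done
    echo "$up"
}

# JournalNode addresses taken from the log above.
journalnodes=(192.168.98.166:8485 192.168.98.167:8485 192.168.98.168:8485)
need=$(quorum_needed "${#journalnodes[@]}")
have=$(count_reachable "${journalnodes[@]}")
echo "reachable JournalNodes: $have, quorum needs: $need"
```

If fewer JournalNodes are reachable than the quorum requires, the NameNode will most likely hit the timeout shown in the log.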

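Fix:

Add the following configuration to hdfs-site.xml (these ipc.* properties are normally set in core-site.xml; the NameNode reads both files, so either location should work):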

<!-- Adjust the ipc parameters from core-site.xml to avoid a ConnectException when connecting to the JournalNode service -->
<property>
    <name>ipc.client.connect.max.retries</name>
    <value>100</value>
    <description>Indicates the number of retries a client will make to establish a server connection.</description>
</property>
<property>
    <name>ipc.client.connect.retry.interval</name>
    <value>10000</value>
    <description>Indicates the number of milliseconds a client will wait for before retrying to establish a server connection.</description>
</property>
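As a rough sanity check on these values, the longest time a client keeps retrying a refused connection is approximately the retry count multiplied by the retry interval. The sketch below only does that arithmetic; `retry_window_seconds` is a hypothetical helper, and the estimate ignores the time each connection attempt itself takes.

```shell
#!/usr/bin/env bash
# Approximate retry window in seconds:
# ipc.client.connect.max.retries * ipc.client.connect.retry.interval (ms) / 1000
retry_window_seconds() {
    local max_retries=$1 interval_ms=$2
    echo $(( max_retries * interval_ms / 1000 ))
}

retry_window_seconds 10 1000      # defaults mentioned above
retry_window_seconds 100 10000    # values from the configuration above
```

With the defaults this gives about 10 seconds; with the values above it gives about 1000 seconds, which leaves the JournalNodes far more time to come up.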

Pitfall 2: Automatic failover fails

Hadoop HA cluster: the NameNode cannot fail over automatically (switch to active)

While learning automatic HA configuration, I followed the official Hadoop documentation, and in the end all nodes started normally.

After killing the active NameNode with kill -9 <pid>, the remaining Standby NameNodes did not switch to Active automatically. Moreover, after restarting the killed NameNode, all NameNodes could end up in Standby state, with none of them Active.

Checking the log ${HADOOP_HOME}/logs/hadoop-root-zkfc-hadoop2.log showed the following error (拒绝连接 is the localized form of "Connection refused"):

2020-01-03 19:21:13,636 WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at hadoop3/192.168.233.13:8020 standby (unable to connect)
java.net.ConnectException: Call From hadoop2/192.168.233.12 to hadoop3:8020 failed on connection exception: java.net.ConnectException: 拒绝连接; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
.........................

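This happens because the fuser program is missing, so fencing cannot be carried out. The fencing method had already been configured in hdfs-site.xml as the official documentation describes: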

<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>

Solution: install the psmisc package, which provides the fuser program (it must be installed on every machine):

yum -y install psmisc
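Before restarting, it may be worth confirming that fuser is actually on the PATH of each NameNode host, since sshfence relies on it to kill the process listening on the NameNode port. The snippet below is a small sketch of such a check, meant to be run on each machine; `has_command` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return success if the given command can be found on the PATH.
has_command() {
    command -v "$1" >/dev/null 2>&1
}

if has_command fuser; then
    echo "fuser found at: $(command -v fuser)"
else
    echo "fuser is missing; install the psmisc package"
fi
```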

Then restart the whole HA cluster and test again; the problem is resolved.
