zookeeper delete yarn rmstore

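CDH 6.2 still does not fix the YARN active/standby split-brain problem caused by expired ZooKeeper registration information.
Excerpted from another author's article: https://sukbeta.github.io/zookeeper-delete-yarn-rmstore/

Error log (the exact log may differ; the symptom is that a manual active/standby switch fails):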

2019-06-16 21:07:39,030 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=yarn OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=Users [yarn] and members of the groups [<users>] are allowed
2019-06-16 21:07:39,030 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
  at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:124)
  at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:812)
  at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:417)
  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
  at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
  at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:122)
  ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFencedException: RMStateStore has been fenced
  at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:204)
  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:577)
  at org.apache.hadoop.service.Abstr

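Solution:

We need to delete YARN's rmstore data in ZooKeeper and then restart YARN.

However, deleting the rmstore data from ZooKeeper runs into a problem: YARN attaches its own ACL when it registers, so a direct delete fails.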
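For context, the ACL entry that appears in the `getAcl` output later in this article uses ZooKeeper's `digest` scheme, where the stored id has the form `username:base64(sha1(username:password))`. The sketch below reproduces that encoding to show why the entry cannot simply be guessed or reused from a plain zkCli session. The function name `zk_digest_id` and the credentials are hypothetical and only serve as an illustration.

```python
import base64
import hashlib


def zk_digest_id(username: str, password: str) -> str:
    """Return the id string ZooKeeper stores for a digest ACL entry."""
    raw = f"{username}:{password}".encode("utf-8")
    # SHA-1 yields 20 bytes, which base64-encode to 28 characters.
    hashed = base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
    return f"{username}:{hashed}"


if __name__ == "__main__":
    # Hypothetical credentials, not the ones YARN actually uses.
    print(zk_digest_id("rm1.example.com", "not-the-real-password"))
```

Without the password behind the digest, a client cannot authenticate as that identity, which is why the workaround here changes the ACL instead.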

We can work around this by setting a new ACL, as follows:

cd $ZOOKEEPER_HOME/bin
./zkCli.sh   # connect to the ZooKeeper ensemble Hadoop is configured to use; from a remote client, add -server IP:port
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper, hadoop-ha, hbase, rmstore]
[zk: localhost:2181(CONNECTED) 1] rmr /rmstore
Authentication is not valid : /rmstore/ZKRMStateRoot/RMVersionNode

Let's look at the ACL on this node:

[zk: localhost:2181(CONNECTED) 2] getAcl /rmstore/ZKRMStateRoot
'world,'anyone
: rwa
'digest,'shining-namenode01.host.com:yelhKlz39YVCV9p4NTModoBq9fw=
: cd
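The output above can be read mechanically: each identity line is followed by a line of permission letters (c = create, d = delete, r = read, w = write, a = admin). A minimal sketch, assuming the two-line-per-entry format shown here, that lists which identity holds which permission. The helper names `parse_get_acl` and `identities_with` are hypothetical.

```python
def parse_get_acl(output: str) -> dict:
    """Map each ACL identity ("scheme:id") to its permission letters."""
    acls = {}
    identity = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("'"):
            # Identity line, e.g. 'world,'anyone
            scheme, _, ident = line.partition(",")
            identity = scheme.strip("'") + ":" + ident.strip("'")
        elif line.startswith(":") and identity is not None:
            # Permission line, e.g. ": rwa"
            acls[identity] = line[1:].strip()
            identity = None
    return acls


def identities_with(acls: dict, permission: str) -> list:
    """Return the identities whose permission string contains the letter."""
    return [who for who, perms in acls.items() if permission in perms]


SAMPLE = """\
'world,'anyone
: rwa
'digest,'shining-namenode01.host.com:yelhKlz39YVCV9p4NTModoBq9fw=
: cd
"""

if __name__ == "__main__":
    acls = parse_get_acl(SAMPLE)
    print("can delete:", identities_with(acls, "d"))
    print("can admin: ", identities_with(acls, "a"))
```

Only the digest identity holds `d`, so an unauthenticated session cannot delete children of this node. `world:anyone` does hold `a`, which is what allows the ACL to be changed.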

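Because world:anyone holds the admin (a) permission on this node, we can reset the ACL without authenticating and then delete the node: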

[zk: localhost:2181(CONNECTED) 3] setAcl /rmstore/ZKRMStateRoot world:anyone:rwcda
cZxid = 0x10000001b
ctime = Mon May 27 14:58:45 CST 2019
mZxid = 0x10000001b
mtime = Mon May 27 14:58:45 CST 2019
pZxid = 0x10016efd3
cversion = 380191
dataVersion = 0
aclVersion = 5
ephemeralOwner = 0x0
dataLength = 0
numChildren = 5
[zk: localhost:2181(CONNECTED) 4] getAcl /rmstore/ZKRMStateRoot                   
'world,'anyone
: cdrwa
[zk: localhost:2181(CONNECTED) 6] rmr /rmstore/ZKRMStateRoot
[zk: localhost:2181(CONNECTED) 7] ls /
[zookeeper, hadoop-ha, hbase, rmstore]
[zk: localhost:2181(CONNECTED) 8] rmr /rmstore
[zk: localhost:2181(CONNECTED) 9] ls /
[zookeeper, hadoop-ha, hbase]

Then restart YARN so that it registers in ZooKeeper again.
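The zkCli steps above can be condensed into a short command list. The sketch below only builds the command strings and does not talk to ZooKeeper. The function name `cleanup_commands` and its parameters are hypothetical. The `delete_cmd` parameter exists because ZooKeeper 3.5 and later deprecate `rmr` in favour of `deleteall`; check which one your zkCli version accepts.

```python
def cleanup_commands(store_root="/rmstore",
                     state_root="ZKRMStateRoot",
                     delete_cmd="rmr"):
    """Return the zkCli commands used in this article, in order."""
    fenced = f"{store_root}/{state_root}"
    return [
        f"getAcl {fenced}",                     # inspect the current ACL
        f"setAcl {fenced} world:anyone:rwcda",  # open the node up to everyone
        f"{delete_cmd} {fenced}",               # remove the state root
        f"{delete_cmd} {store_root}",           # remove the now-empty parent
    ]


if __name__ == "__main__":
    for command in cleanup_commands():
        print(command)
```

The paths default to the ones shown in this article. Adjust them if your cluster uses a different state-store parent path.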
