
NodeManager and ResourceManager fail to start in a CDH environment (hadoop yarn port: Address already in use: bind)

1. Possible causes of NodeManager failing to start

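1.1 Possibly, other NodeManagers were added to the cluster while this NodeManager was stopped, so this NodeManager fails a check at startup (according to the stack trace, the failure happens while opening the NodeManager's LevelDB recovery state store)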

Error messages you may see:
org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/xxxxx.sst

2022-05-05 11:24:11,415 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000003.sst
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000003.sst
	at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:281)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:354)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000003.sst
	at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
	at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
	at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:1517)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1504)
	at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:342)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
	... 5 more


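Solution: on the machine running this NodeManager, delete everything in the /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state directory. Do this while the NodeManager is stopped; it discards the saved state the NodeManager would otherwise use to recover its containers after a restart.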

rm -rf /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/*
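Because `rm -rf` on the state directory cannot be undone, a more cautious variant is to copy the directory aside first and only then empty it. The sketch below assumes the NodeManager role has already been stopped (for example from Cloudera Manager); the function name `reset_nm_state` and the `/tmp` backup location are hypothetical choices made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up the NodeManager recovery state, then empty it.
# Run only while the NodeManager role is stopped (e.g. from Cloudera Manager).
# The backup root (/tmp by default) is an assumption made for this example.
reset_nm_state() {
  local state_dir="${1:-/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state}"
  local backup_root="${2:-/tmp}"
  local backup_dir
  backup_dir="${backup_root}/yarn-nm-state.bak.$(date +%Y%m%d%H%M%S)"

  if [ ! -d "$state_dir" ]; then
    echo "no such directory: $state_dir" >&2
    return 1
  fi
  # cp -a keeps modes and timestamps (and ownership when run as root);
  # stop here if the copy fails so nothing is deleted without a backup
  cp -a "$state_dir" "$backup_dir" || return 1
  # Delete the contents but keep the directory, like rm -rf "$state_dir"/*
  # (unlike the glob, this also removes hidden files)
  find "$state_dir" -mindepth 1 -delete
  echo "$backup_dir"
}
```

Called without arguments, `reset_nm_state` targets the default path used above and prints where the backup went, so the old `.sst` files are still available if you need them later.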

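Before starting it, check whether port 8041 is in use; if the command below prints nothing, the port is free. If the port is held by a NodeManager process, kill that process. If it is held by some other process, find out which one it is and either stop it or move the NodeManager to another port: search the configuration for yarn.nodemanager.address and change its default port.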

lsof -i:8041
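The check-then-decide procedure above can be sketched as a small shell helper. It assumes `lsof` and `ps` are installed and that it runs as root (otherwise `lsof` may not see processes owned by the yarn user); the function names `classify_cmdline` and `port_owners` are hypothetical. It recognizes a NodeManager by the main class `org.apache.hadoop.yarn.server.nodemanager.NodeManager` seen in the stack trace above, and it only reports owners; it never kills anything itself.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for deciding what to do about a busy port.
# Assumes lsof and ps are available and the script runs as root.

# Classify a process command line; the NodeManager JVM runs the main class
# org.apache.hadoop.yarn.server.nodemanager.NodeManager.
classify_cmdline() {
  case "$1" in
    *org.apache.hadoop.yarn.server.nodemanager.NodeManager*) echo "nodemanager" ;;
    "") echo "none" ;;
    *) echo "other" ;;
  esac
}

# Print "<pid> <kind>" for every process listening on the given TCP port.
# Prints nothing when the port is free (or when lsof is not installed).
port_owners() {
  local port="$1" pid
  for pid in $(lsof -t -iTCP:"$port" -sTCP:LISTEN 2>/dev/null); do
    echo "$pid $(classify_cmdline "$(ps -o args= -p "$pid")")"
  done
}
```

Run `port_owners 8041`: a line ending in `nodemanager` is the leftover process that the step above says to kill, while `other` means you should identify that process before stopping it or changing yarn.nodemanager.address.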


1.2 The startup port may already be in use

Error messages you may see:
java.net.BindException: Address already in use;

INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [master01:8041] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException

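Solution: as in section 1.1, check whether the port configured in yarn.nodemanager.address is in use.
Then restart the corresponding NodeManager from the CDH (Cloudera Manager) UI.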
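To see which address failed to bind without reading through the whole log, you can extract the `host:port` pairs from the `Problem binding to [...]` messages. This is a minimal sketch; `bind_failures` is a hypothetical helper and assumes the message format shown in the error above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the unique host:port pairs that a YARN role
# failed to bind, based on "Problem binding to [host:port]" log messages.
bind_failures() {
  local log="$1"
  grep -o 'Problem binding to \[[^]]*]' "$log" \
    | sed 's/^Problem binding to \[\(.*\)]$/\1/' \
    | sort -u
}
```

For a NodeManager log containing the error above, `bind_failures /var/log/hadoop-yarn/hadoop-cmf-yarn-NODEMANAGER-slave01.log.out` prints `master01:8041`, which tells you which port to check.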

2. ResourceManager fails to start

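The errors I ran into here were all port-in-use errors like the one below (8031 is, by default, the ResourceManager's resource-tracker port, yarn.resourcemanager.resource-tracker.address); the fix is the same as described above.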

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [master01:8031] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
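Since both roles fail for the same reason, it can help to check the usual YARN ports on a host in one pass. The port numbers and property names below are common Hadoop/CDH defaults, listed here as an assumption; your cluster may override them, so confirm the actual *.address values in Cloudera Manager. Run it while the role is stopped, so that a port reported as IN USE points to a leftover or foreign process. The `/dev/tcp` trick is bash-specific, and `check_yarn_ports` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical check of common YARN ports. The numbers are typical
# Hadoop/CDH defaults (an assumption); confirm them in Cloudera Manager.
YARN_PORTS="
8030 yarn.resourcemanager.scheduler.address
8031 yarn.resourcemanager.resource-tracker.address
8032 yarn.resourcemanager.address
8033 yarn.resourcemanager.admin.address
8088 yarn.resourcemanager.webapp.address
8040 yarn.nodemanager.localizer.address
8041 yarn.nodemanager.address
8042 yarn.nodemanager.webapp.address
"

# Succeeds if something accepts TCP connections on host:port (bash /dev/tcp).
listening() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Print "<port> <state> <property>" for each entry in YARN_PORTS.
check_yarn_ports() {
  local host="${1:-127.0.0.1}" port prop state
  while read -r port prop; do
    [ -z "$port" ] && continue   # skip the blank first/last lines
    if listening "$host" "$port"; then state="IN USE"; else state="free"; fi
    printf '%-5s %-8s %s\n' "$port" "$state" "$prop"
  done <<< "$YARN_PORTS"
}
```

Because the roles above bind to the host name (for example `master01:8031`), you may want to pass that name, as in `check_yarn_ports master01`; probing a remote or firewalled address this way can hang, which is why the default is localhost.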

3. How to view CDH logs

3.1 View the service's logs in the web UI. Sometimes this does not work because of problems with the service itself.


3.2 View the log files on the server

Log in to the machine running the service whose logs you want to read.

# Most log files of services installed by CDH are here
[root@slave01 ~]# cd /var/log/
[root@slave01 log]# ll
total 3160
......
drwxrwxr-x  3 hdfs         hadoop             4096 May  5 14:29 hadoop-hdfs
drwxrwxr-x  3 yarn         hadoop             4096 May  5 14:31 hadoop-yarn
......
# The NodeManager discussed above belongs to YARN, so go into hadoop-yarn
[root@slave01 log]# cd hadoop-yarn/
[root@slave01 hadoop-yarn]# ll
total 2868
-rw-r--r-- 1 yarn yarn   2925441 May  5 14:31 hadoop-cmf-yarn-NODEMANAGER-slave01.log.out
-rw-r--r-- 1 yarn yarn         0 May  4 12:59 SecurityAuth-yarn.audit
drwxr-xr-x 2 yarn hadoop    4096 May  4 14:11 stacks

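In /var/log/hadoop-yarn you will find a log file with NODEMANAGER in its name (*NODEMANAGER*). Read it to see exactly why the error was thrown, then fix that specific cause. Logs for other services can be found the same way, by looking for the log file named after the corresponding service.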
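These logs can be several megabytes, so it can help to jump straight to the error-looking lines. The sketch below is a heuristic: matching on `ERROR`, `FATAL`, `Exception` and `Caused by` is an assumption about what is worth reading, and `recent_errors` is a hypothetical name. Note that both failures in this article were logged at INFO level but still match, because their lines contain "Exception".

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the last N error-looking lines (with line
# numbers) from a CDH role log. The match pattern is a heuristic.
recent_errors() {
  local log="$1" n="${2:-20}"
  grep -nE 'ERROR|FATAL|Exception|Caused by' "$log" | tail -n "$n"
}
```

For example, `recent_errors /var/log/hadoop-yarn/hadoop-cmf-yarn-NODEMANAGER-slave01.log.out 30` prints the last 30 matching lines, and the line numbers let you open the file at the start of the relevant stack trace.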

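Statement: this article was contributed by a user and does not represent the position of the wpsshop blog; copyright belongs to the original author, and this site assumes no legal liability for it. If you find infringing content, please contact us. When reposting, please cite the source: https://www.wpsshop.cn/w/正经夜光杯/article/detail/865822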