Possible error message:
org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/xxxxx.sst
2022-05-05 11:24:11,415 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000003.sst
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000003.sst
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:281)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:354)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:869)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:942)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 2 missing files; e.g.: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/000003.sst
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:1517)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:1504)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:342)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    ... 5 more
Solution: on the machine running that NodeManager, delete everything under the /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state directory.
rm -rf /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/*
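A more cautious variant of the command above is to move the state files aside rather than delete them, so they can be inspected or restored later. The following is only a sketch: backup_nm_state is a hypothetical helper (not part of Hadoop or CDH), and it assumes the NodeManager is already stopped and that you have write access to the parent directory.

```shell
#!/usr/bin/env bash
# Sketch: move the NodeManager recovery state aside instead of deleting it.
# backup_nm_state is a hypothetical helper, not a Hadoop or CDH command.
backup_nm_state() {
  local state_dir="${1%/}"
  local backup_dir
  backup_dir="${state_dir}.bak.$(date +%Y%m%d%H%M%S)"

  if [ ! -d "$state_dir" ]; then
    echo "no such directory: $state_dir" >&2
    return 1
  fi

  mkdir -p "$backup_dir" || return 1
  # Move every top-level entry (including dotfiles) into the backup directory.
  find "$state_dir" -mindepth 1 -maxdepth 1 -exec mv {} "$backup_dir"/ \;
  # Print where the old state went so it can be removed once the NodeManager is healthy.
  echo "$backup_dir"
}

# Usage (assumed path, taken from the error message above):
#   backup_nm_state /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state
```

Either way, the NodeManager starts with an empty state store, so it cannot recover containers that were running before the restart.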
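Before starting the NodeManager, check whether port 8041 is already in use. No output means the port is free. If the port is held by a NodeManager process, kill that process. If some other process holds it, find out which one it is, then either stop it or give the NodeManager a different port: search for the yarn.nodemanager.address setting and change the default port there.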
lsof -i:8041
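To decide whether the process holding the port is a NodeManager or something else, the lsof output can be reduced to PID and command name. This is a minimal sketch: summarize_port_owner is a hypothetical helper, and it assumes the usual lsof column order (COMMAND first, PID second) with a single header row.

```shell
#!/usr/bin/env bash
# Sketch: reduce `lsof -i:PORT` output (read from stdin) to unique "PID COMMAND" pairs.
# summarize_port_owner is a hypothetical helper for illustration.
summarize_port_owner() {
  # NR > 1 skips the lsof header row; column 1 is COMMAND, column 2 is PID.
  awk 'NR > 1 { print $2, $1 }' | sort -u
}

# Usage (needs lsof; run as root to see processes owned by other users):
#   owners=$(lsof -i:8041 | summarize_port_owner)
#   if [ -z "$owners" ]; then echo "port 8041 is free"; else echo "$owners"; fi
#
# A NodeManager usually shows up as "java", so check the full command line of a PID:
#   ps -o args= -p <PID> | grep NodeManager
```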
Possible error message:
java.net.BindException: Address already in use;
INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [master01:8041] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
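Solution: see section 1.1 of this article and check whether the port configured in yarn.nodemanager.address is already in use.
Then restart the affected NodeManager from the CDH web UI.
The errors I ran into here are shown below. They are all port-in-use errors, so the solution described earlier applies.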
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [master01:8031] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
Log in to the machine that hosts the service you want to inspect.
# Log files for services installed by CDH are mostly located here
[root@slave01 ~]# cd /var/log/
[root@slave01 log]# ll
total 3160
......
drwxrwxr-x 3 hdfs hadoop 4096 May 5 14:29 hadoop-hdfs
drwxrwxr-x 3 yarn hadoop 4096 May 5 14:31 hadoop-yarn
......
# The NodeManager discussed above belongs to YARN, so go into hadoop-yarn
[root@slave01 log]# cd hadoop-yarn/
[root@slave01 hadoop-yarn]# ll
total 2868
-rw-r--r-- 1 yarn yarn 2925441 May 5 14:31 hadoop-cmf-yarn-NODEMANAGER-slave01.log.out
-rw-r--r-- 1 yarn yarn 0 May 4 12:59 SecurityAuth-yarn.audit
drwxr-xr-x 2 yarn hadoop 4096 May 4 14:11 stacks
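In /var/log/hadoop-yarn there is a log file with NODEMANAGER in its name. Open it to see exactly why the error was thrown, then fix that specific cause. Logs for other services can be found the same way, by looking for the service name in the log file name.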
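Since the NodeManager log here is already several megabytes, it can help to pull out only the most recent lines that mention an error. The sketch below is one way to do that. show_nm_errors is a hypothetical helper, and the pattern ERROR|FATAL|Exception is an assumption about what is worth matching, chosen because the failures quoted above are logged at INFO level but contain the word Exception.

```shell
#!/usr/bin/env bash
# Sketch: print the last N log lines that look like errors, with their line numbers.
# show_nm_errors is a hypothetical helper, not part of CDH.
show_nm_errors() {
  local log_file="$1"
  local count="${2:-20}"   # how many matching lines to show (default 20)
  grep -nE 'ERROR|FATAL|Exception' "$log_file" | tail -n "$count"
}

# Usage (file name taken from the directory listing above):
#   show_nm_errors /var/log/hadoop-yarn/hadoop-cmf-yarn-NODEMANAGER-slave01.log.out
```

The line numbers printed by grep -n make it easy to jump to the full stack trace in the log file afterwards.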