
Some clues about tasks stuck in Terminating on k8s

failed to exit within 30 seconds of signal 15 - using the force

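Origin of the problem:

On the cluster I work with, Deployments and Jobs in k8s frequently end up with pods stuck in Terminating. Tracing this back, the cause turned out to be a process inside the container, on the node hosting the pod (Ubuntu 16.04.6), that could not be killed. That process was in the D state (uninterruptible sleep), which makes it hard to deal with.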
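To check whether a node currently has processes in the D state, one option is to read the state field from `/proc/<pid>/stat`. The following is a minimal sketch, not the author's method; the function names `parse_stat` and `find_d_state_processes` are made up for illustration, and it assumes a Linux node with `/proc` mounted.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx].strip())
    comm = stat_text[open_idx + 1:close_idx]
    # The first field after the closing parenthesis is the process state.
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def find_d_state_processes(proc_root="/proc"):
    """List (pid, comm) for processes currently in state D."""
    results = []
    if not os.path.isdir(proc_root):
        return results
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat_path = os.path.join(proc_root, entry, "stat")
        try:
            with open(stat_path, encoding="utf-8", errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited or the file was unreadable
        if state == "D":
            results.append((pid, comm))
    return results


if __name__ == "__main__":
    for pid, comm in find_d_state_processes():
        print(pid, comm)
```

A process can pass through the D state briefly during normal I/O, so a single hit is not conclusive. A process that shows up on repeated runs is the more likely culprit.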

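The logs on the node hosting the pod show the following:

1. A large number of OOM records

2. syslog (and dmesg as well) frequently reports SLUB errors. (From what I found online, this message comes from a system bug, but it is not the origin of the problem described in this article.)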

SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)

3. The docker logs show:

Aug 26 21:12:22 n002 dockerd[1632]: time="2020-08-26T21:12:22.358239959+08:00" level=info msg="Container b39ef98d452cd825cd6ab4e07767b5e8091d055e75e9a7b96ba83ba9c4ac2089 failed to exit within 30 seconds of signal 15 - using the force"
level=info msg="Container b39ef98d452c failed to exit within 10 seconds of kill - trying direct SIGKILL"

4. A large number of NFS retry messages in dmesg:

kernel: [1032289.079654] nfs: server 10.32.0.10 not responding, still trying
kernel: [1032289.079664] nfs: server 10.32.0.10 not responding, still trying
kernel: [1032289.151627] nfs: server 10.32.0.10 not responding, still trying
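The log signatures above can be pulled out of syslog or dmesg output in one pass, which helps line up the containers that refused to exit with the NFS servers that stopped responding. This is a rough sketch: the regular expressions are derived only from the sample lines in this article, and the name `summarize` is hypothetical. Other docker or kernel versions may word these messages differently.

```python
import re

# Matches dockerd messages such as
# "Container <id> failed to exit within 30 seconds of signal 15 - using the force"
# and "Container <id> failed to exit within 10 seconds of kill - ...".
FORCE_KILL_RE = re.compile(
    r"Container (?P<cid>[0-9a-f]{12,64}) failed to exit within "
    r"(?P<secs>\d+) seconds of (?P<what>signal \d+|kill)"
)
# Matches kernel messages such as
# "nfs: server 10.32.0.10 not responding, still trying".
NFS_RE = re.compile(r"nfs: server (?P<server>\S+) not responding")


def summarize(lines):
    """Return (stuck container short IDs, NFS server -> message count)."""
    stuck_containers = set()
    nfs_counts = {}
    for line in lines:
        match = FORCE_KILL_RE.search(line)
        if match:
            # Normalize to the 12-character short ID so long and short
            # forms of the same container collapse into one entry.
            stuck_containers.add(match.group("cid")[:12])
        match = NFS_RE.search(line)
        if match:
            server = match.group("server")
            nfs_counts[server] = nfs_counts.get(server, 0) + 1
    return stuck_containers, nfs_counts


if __name__ == "__main__":
    import sys

    # Example: journalctl -u docker | python3 this_script.py
    containers, servers = summarize(sys.stdin)
    print("containers that did not exit:", sorted(containers))
    print("unresponsive NFS servers:", servers)
```

If the same time window shows both stuck containers and an unresponsive NFS server, it is worth checking whether the D-state process is blocked on I/O against that NFS mount.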

 

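Update, 2020-12-13 01:00:50: adding some references here. Off to bed for now; I will fill in more later. Feel free to get in touch with questions.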

https://k8s.imroc.io/avoid/handle-cgroup-oom-in-userspace-with-oom-guard/

https://k8s.imroc.io/troubleshooting/pod/slow-terminating/

https://www.cnblogs.com/jmliao/p/11322804.html

 

 
