
Viewing and Clearing GPU Memory

Clearing GPU Memory

If a deep learning training run is interrupted, the GPU memory it occupied is often not released. This is a record of the fix, kept here for future reference.

The symptom is an error like this:

tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

1. Check whether the problem has occurred: nvidia-smi

  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 384.130                Driver Version: 384.130                    |
  |-------------------------------+----------------------+----------------------+
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |===============================+======================+======================|
  |   0  TITAN V             Off  | 00000000:01:00.0  On |                  N/A |
  | 39%   53C    P2    36W / 250W | 11959MiB / 12055MiB  |      0%      Default |
  +-------------------------------+----------------------+----------------------+

  +-----------------------------------------------------------------------------+
  | Processes:                                                       GPU Memory |
  |  GPU       PID   Type   Process name                             Usage      |
  |=============================================================================|
  |    0      1017      G   /usr/lib/xorg/Xorg                           298MiB |
  |    0      1834      G   /opt/teamviewer/tv_bin/TeamViewer              6MiB |
  |    0      2045      G   compiz                                       177MiB |
  |    0      4118      G   ...-token=D609226DD6A56AEBB70B08FB7BC10F2E    78MiB |
  |    0      4603      G   ...uest-channel-token=11061898972785214487    59MiB |
  |    0     16481      C   python3                                      418MiB |
  |    0     16537      C   python3                                    10916MiB |
  +-----------------------------------------------------------------------------+
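Besides the full table, nvidia-smi can print just the fields of interest as CSV, which is handy for scripting. A minimal sketch (these query options are standard nvidia-smi flags, though the exact field names available can vary with driver version):

  # Overall memory usage per GPU, as CSV
  nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv

  # Compute processes (the "C" entries above) with their PIDs and memory usage
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv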

2. Process 16537 (the python3 job holding 10916MiB) turns out to be the culprit, so kill it:

 kill -9 16537
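If the training process has already died but the memory still shows as used, the offending PID sometimes no longer appears in the nvidia-smi process list at all. In that case, listing which processes still hold the NVIDIA device files can reveal the leftover workers. The sketch below uses fuser for this; it is my addition rather than part of the original note, and the PIDs it prints should be checked by hand before killing anything, since it also matches Xorg and other legitimate GPU users:

  # List every process that still has /dev/nvidia* open
  sudo fuser -v /dev/nvidia*

  # After verifying a PID really is a stale training worker:
  # sudo kill -9 <pid>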

3. Monitor the GPU (the 3 means refresh every 3 seconds):

watch -n 3 nvidia-smi
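Two close variants, in case they are handy: watch's -d flag highlights the values that changed between refreshes, and nvidia-smi has its own built-in loop mode:

  # Refresh every 3 seconds and highlight changed values
  watch -n 3 -d nvidia-smi

  # nvidia-smi's own loop mode (Ctrl+C to stop)
  nvidia-smi -l 3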

4. Monitor CPU and memory:

 top -d 1

 free -m 
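If a longer-running record is needed rather than an interactive view, a small loop that appends free -m output to a file also works. This is just a sketch I added, with an arbitrary log path:

  # Append a timestamped memory snapshot every 5 seconds (hypothetical log path)
  while true; do
      { date; free -m; echo; } >> /tmp/mem_usage.log
      sleep 5
  done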

5. Clear the system page cache:

  sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'

  sudo sh -c 'echo 2 > /proc/sys/vm/drop_caches'

  sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
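The value written selects what gets dropped: 1 frees the page cache, 2 frees reclaimable dentries and inodes, and 3 frees both. This only releases host RAM used for caching and has no effect on GPU memory. It is also common to run sync first so dirty pages are written back before the cache is dropped; a sketch of that combination (the sync step is my addition, not part of the original post):

  # Flush dirty pages to disk, then drop page cache, dentries and inodes
  sync
  sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'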
