
Configuring Ubuntu 16.04 + CUDA 10.1 + cuDNN 7 in Docker: a detailed guide


1. Install Docker

  • Remove any old Docker versions installed from the official apt repositories:
sudo apt-get remove docker docker-engine docker-ce docker.io
  • Update the apt package index:
sudo apt-get update
  • Install the packages that let apt use a repository over HTTPS:
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
  • Add Docker's official GPG key:
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
  • Set up the stable repository:
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
  • Update the apt package index again:
sudo apt-get update
  • Install the latest Docker CE:
sudo apt-get install -y docker-ce
  • Check whether the Docker service is running:
systemctl status docker
  • If it is not running, start it:
sudo systemctl start docker
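
To make the add-apt-repository step less opaque, here is a minimal sketch of how the repository line is put together. $(lsb_release -cs) expands to the release codename, which is xenial on Ubuntu 16.04. The helper name docker_repo_line is made up for illustration and is not part of Docker or apt.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker or apt): builds the apt source
# line that add-apt-repository receives in the step above.
docker_repo_line() (
    codename="$1"    # e.g. the output of `lsb_release -cs`
    printf 'deb [arch=amd64] https://download.docker.com/linux/ubuntu %s stable\n' "$codename"
)

# On Ubuntu 16.04 the codename is "xenial".
docker_repo_line xenial
```

If the codename is wrong or empty, apt-get update will fail to find the repository, so it can be worth printing the line once before adding it.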

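2. Install nvidia-docker

Out of the box, Docker containers can only run CPU programs. To use the host's GPUs from a container, install nvidia-docker, which is provided by NVIDIA.

Official repository: https://github.com/NVIDIA/nvidia-docker

With Docker 19.03 or later, nvidia-docker itself is not needed; installing nvidia-container-toolkit is enough.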
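
The 19.03 threshold can be checked in a script. The sketch below assumes GNU sort with the -V (version sort) option; version_ge is a hypothetical helper name, and the version strings are sample values rather than output from a real host.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal
# to version $2. Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() (
    lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$lowest" = "$2" ]
)

# Sample values; replace the first argument with your own Docker version.
for v in 18.09.7 19.03.5; do
    if version_ge "$v" 19.03; then
        echo "$v: nvidia-container-toolkit is enough"
    else
        echo "$v: nvidia-docker is needed"
    fi
done
```

On a real host the version string could come from `docker version --format '{{.Server.Version}}'`.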

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
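
The first command sources /etc/os-release and concatenates two of its variables to pick the right repository list. A small sketch of what it evaluates to, using a sample file that mimics Ubuntu 16.04 (distribution_of is a hypothetical helper, and the sample file is an assumption, not a copy of a real system file):

```shell
#!/bin/sh
# Hypothetical helper: prints $ID$VERSION_ID from an os-release style file.
# The ( ... ) body runs in a subshell, so the sourced variables do not
# leak into the caller's environment.
distribution_of() (
    . "$1"
    printf '%s%s\n' "$ID" "$VERSION_ID"
)

# Sample file with the two fields that matter here.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
ID=ubuntu
VERSION_ID="16.04"
EOF

distribution_of "$sample"    # the string used in the nvidia-docker.list URL
rm -f "$sample"
```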

Test the installation. This step pulls an image from the official Docker registry.

# Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
# Start a GPU enabled container on two GPUs
docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi
# Starting a GPU enabled container on specific GPUs
docker run --gpus '"device=1,2"' nvidia/cuda:10.0-base nvidia-smi
docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:10.0-base nvidia-smi
# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi
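
The nested quoting in '"device=1,2"' is easy to get wrong: as far as I understand it, the inner double quotes must reach Docker literally so that the comma is read as part of the device list rather than as a separator between options. A minimal sketch that builds that value from a list of GPU indices or UUIDs (gpu_device_arg is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: joins GPU indices or UUIDs with commas and wraps the
# result in literal double quotes, giving the value passed to --gpus above.
gpu_device_arg() (
    IFS=,                        # "$*" joins arguments with the first IFS character
    printf '"device=%s"\n' "$*"
)

gpu_device_arg 1 2               # prints the value including the double quotes
```

It could then be used as `docker run --gpus "$(gpu_device_arg 1 2)" nvidia/cuda:10.0-base nvidia-smi`, which passes the same argument as the hand-quoted form.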

If GPU information is printed, the installation succeeded.

Tue Apr 24 18:58:50 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   53C    P5    27W / 280W |      0MiB / 11177MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

 

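Pulling from the official registry is very slow from mainland China (skip this part if you have unrestricted access), so configure a domestic registry mirror.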

sudo vim /etc/docker/daemon.json

When opened, the file looks like this:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Change it to the following (this uses an Aliyun mirror, which I have tested; other mirrors include https://registry.docker-cn.com and http://hub-mirror.c.163.com):

{
    "registry-mirrors": ["https://3laho3y3.mirror.aliyuncs.com"],
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
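
A misplaced quote or comma in daemon.json keeps the Docker daemon from starting, and the new mirror only takes effect after Docker is restarted. The sketch below checks the file first. It assumes python3 is installed (its bundled json.tool module does the parsing); json_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given file parses as JSON.
# Assumes python3 is available; `python3 -m json.tool` ships with Python.
json_ok() {
    python3 -m json.tool "$1" >/dev/null 2>&1
}

# Path to check; defaults to the file edited above.
conf="${1:-/etc/docker/daemon.json}"

if json_ok "$conf"; then
    echo "valid JSON: $conf"
    # sudo systemctl restart docker    # uncomment to apply the new settings
else
    echo "invalid or missing JSON: $conf"
fi
```

After the restart, `docker info` should list the mirror under Registry Mirrors.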

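3. Download the nvidia/cuda Ubuntu image

Docker Hub: https://hub.docker.com/

Search for nvidia/cuda on the site.

Open the Tags tab and find 10.1-cudnn7-devel-ubuntu16.04 (it includes the Ubuntu system libraries, CUDA 10.1 and cuDNN 7). If you do not want the system libraries, pick a different image.

Pull the image:

sudo docker pull nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04

When the download finishes, run docker images to check that the image is listed.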

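Because the image can be large, you may need to raise the size limits of Docker's local image storage. This is configured in docker.service.

docker.service usually lives in /usr/lib/systemd/system/, but on my test machine it was in /lib/systemd/system/ instead, so watch out for that.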
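
Since the location varies, a quick way to avoid guessing is to probe the usual systemd directories. This is only a sketch: find_unit is a hypothetical helper, the directory list is my assumption of the common locations, and the optional second argument (a root prefix) exists only so the function can be tried on a scratch tree.

```shell
#!/bin/sh
# Hypothetical helper: prints the first existing path of a unit file among
# the usual systemd directories. $2 is an optional root prefix for testing.
find_unit() (
    unit="$1"
    root="${2:-}"
    for dir in /etc/systemd/system /lib/systemd/system /usr/lib/systemd/system; do
        if [ -f "$root$dir/$unit" ]; then
            printf '%s\n' "$dir/$unit"
            exit 0               # leaves only the function's subshell
        fi
    done
    exit 1
)

find_unit docker.service || echo "docker.service not found in the usual directories"
```

On a host where Docker is installed, `systemctl show -p FragmentPath docker` reports the path systemd actually loaded.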

Open docker.service:

# cat /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=http://docs.docker.com
After=network.target
Wants=docker-storage-setup.service
Requires=docker-cleanup.timer
[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/run/containers/registries.conf
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
Environment=DOCKER_HTTP_HOST_COMPAT=1
Environment=PATH=/usr/libexec/docker:/usr/bin:/usr/sbin
ExecStart=/usr/bin/dockerd-current \
          --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current \
          --default-runtime=docker-runc \
          --exec-opt native.cgroupdriver=systemd \
          --userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
          $OPTIONS \
          $DOCKER_STORAGE_OPTIONS \
          $DOCKER_NETWORK_OPTIONS \
          $ADD_REGISTRY \
          $BLOCK_REGISTRY \
          $INSECURE_REGISTRY \
          $REGISTRIES
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TimeoutStartSec=0
Restart=on-abnormal
MountFlags=slave
KillMode=process
[Install]
WantedBy=multi-user.target

Change the container size

[Service]
...
ExecStart=/usr/bin/dockerd \
    --storage-driver devicemapper --storage-opt dm.loopdatasize=100G --storage-opt dm.loopmetadatasize=10G --storage-opt dm.fs=ext4 --storage-opt dm.basesize=30G
...
# Docker's total storage is capped at 100G and each container at 30G

After the change, reload the unit files and restart Docker:

systemctl daemon-reload
# Restart Docker
service docker restart
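
An alternative to editing docker.service in place, which is not what this article does, is a systemd drop-in file that overrides only ExecStart and survives package upgrades. The sketch below just prints such a file. The dockerd options are the ones from the listing above; the file name storage.conf and the helper name dropin_body are arbitrary choices of mine, and the ExecStart line should really start from whatever your own unit file uses.

```shell
#!/bin/sh
# Hypothetical helper: prints a drop-in that replaces ExecStart.
# The empty "ExecStart=" line clears the value inherited from docker.service;
# without it a non-oneshot service would end up with two ExecStart entries,
# which systemd refuses.
dropin_body() {
    cat <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd \
    --storage-driver devicemapper \
    --storage-opt dm.loopdatasize=100G \
    --storage-opt dm.loopmetadatasize=10G \
    --storage-opt dm.fs=ext4 \
    --storage-opt dm.basesize=30G
EOF
}

# Print the file. On a real host it would be saved as
# /etc/systemd/system/docker.service.d/storage.conf, followed by
# `systemctl daemon-reload` and a restart of Docker.
dropin_body
```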

Change the Docker image storage path

sudo docker info
Output:
Containers: 1
 Running: 0
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: xfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version: (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: df5c38a9167e87f53a9894d77c0950e178a745e7 (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: fec3683b971d9c3ef73f284f176672c44b448662 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 3.10.0-862.14.4.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 1
Total Memory: 991.7 MiB
Name: fuqiang
ID: F2MD:SKQC:HSZG:LN7H:L3KI:7SN2:JHRP:HMQI:3KK2:4RTO:TPTJ:UCYZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Registries: docker.io (secure)

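The line Docker Root Dir: /var/lib/docker shows the default storage location for images and container instances. When images are large, this directory often runs out of space and has to be moved.

Target location for images: /home/docker

Stop the Docker service:

systemctl stop docker

Migrate the data (cp -a, unlike cp -r, preserves the ownership and permissions that Docker's files rely on):

sudo mkdir -p /home/docker
sudo cp -a /var/lib/docker/. /home/docker/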
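
Before pointing Docker at the new directory it is worth confirming that the copy is complete. A minimal sketch, tried here on scratch directories rather than the real Docker root; migrate_dir is a hypothetical helper, and it uses cp -a so that ownership, permissions, symlinks and timestamps are kept.

```shell
#!/bin/sh
# Hypothetical helper: copies the contents of $1 into $2, then verifies.
migrate_dir() (
    src="$1"
    dst="$2"
    mkdir -p "$dst" || exit 1
    # "$src/." copies the contents, so no nested src directory is created
    cp -a "$src/." "$dst/" || exit 1
    # diff -r exits non-zero if any file is missing or differs
    diff -r "$src" "$dst" >/dev/null
)

# Scratch directories standing in for /var/lib/docker and /home/docker.
scratch="$(mktemp -d)"
mkdir -p "$scratch/src/overlay2"
echo "layer data" > "$scratch/src/overlay2/file"

if migrate_dir "$scratch/src" "$scratch/dst"; then
    echo "copy verified"
fi
rm -rf "$scratch"
```

On a real Docker root the diff step can take a while, and it must be run with the Docker service stopped so the files do not change underneath it.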

 

Add --graph to docker.service (--graph is deprecated since Docker 17.05; on newer versions use --data-root instead):

[Service]
...
ExecStart=/usr/bin/dockerd --graph=your_docker_image_path \
    --storage-driver devicemapper --storage-opt dm.loopdatasize=100G --storage-opt dm.loopmetadatasize=10G --storage-opt dm.fs=ext4 --storage-opt dm.basesize=30G
...

 

Start the Docker service:

systemctl start docker
systemctl status docker

If the service comes up as active, the move succeeded.
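
A service that starts is a good sign, but the more direct check is that Docker Root Dir now points at the new location. The sketch below extracts that value from docker info text; docker_root_dir is a hypothetical helper and the input here is a hand-written sample, not real output.

```shell
#!/bin/sh
# Hypothetical helper: reads `docker info` text on stdin and prints the
# value of the "Docker Root Dir" line. Leading indentation is tolerated
# because newer Docker versions indent this line.
docker_root_dir() {
    sed -n 's/^[[:space:]]*Docker Root Dir:[[:space:]]*//p'
}

# Sample input; on a real host use: sudo docker info | docker_root_dir
printf 'Name: fuqiang\nDocker Root Dir: /home/docker\n' | docker_root_dir
```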

 

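4. Display GUI applications from a Docker container on the Ubuntu host

This uses the network method, and the host needs an X server.

A. On the host

Find the host's IP address:
$ ifconfig    ## assume it is xxx.xxx.xxx.xx
Check the current value of the DISPLAY environment variable (run this in a terminal on the physical display; an SSH session will not do):
$ echo $DISPLAY    ## assume it is :0
Or work it out from the socket file:
$ ll /tmp/.X11-unix/    ## assume it shows X0, which means :0
Install the X server utilities:
$ sudo apt install x11-xserver-utils
$ sudo vim /etc/lightdm/lightdm.conf
Add the following to allow network connections:
[SeatDefaults]
xserver-allow-tcp=true
Restart the X server:
$ sudo systemctl restart lightdm
Allow any client to access the X server:
$ xhost +

B. Inside the Docker container

# export DISPLAY=xxx.xxx.xxx.xx:0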
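
The DISPLAY value set inside the container is simply the host's IP address followed by the host's display number, and X11 over TCP listens on port 6000 plus that number. A small sketch of both calculations; remote_display and x11_port are hypothetical helpers, and 192.0.2.10 is a placeholder address standing in for the real host IP.

```shell
#!/bin/sh
# Hypothetical helpers for the network (TCP) method described above.

# Builds the DISPLAY value a container should export: the host IP plus the
# host's display number, so ":0" on the host becomes "<ip>:0".
remote_display() (
    host_ip="$1"
    local_display="$2"
    printf '%s:%s\n' "$host_ip" "${local_display##*:}"
)

# X11 over TCP uses port 6000 + display number.
x11_port() (
    number="${1##*:}"         # drop everything up to the colon
    number="${number%%.*}"    # drop an optional ".screen" suffix
    echo $((6000 + number))
)

remote_display 192.0.2.10 :0
x11_port :0
```

The port is useful when a firewall sits between the container and the host: if that port is blocked, GUI programs in the container fail to open the display even though DISPLAY is set correctly.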

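Pitfalls encountered:

1. With a custom Ubuntu image, CUDA and cuDNN installed successfully, but calling the cuDNN API from C++ failed. With the downloaded nvidia/cuda image the same calls worked. The cause is unknown.

2. The container ran out of space, so the container size had to be increased.

3. The local image store ran out of space, so it had to be moved. Before switching, copy every file from the source directory to the target directory.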

 

Comments and private messages are welcome.

Source: https://www.wpsshop.cn/w/凡人多烦事01/article/detail/570405