sudo apt-get remove docker docker-engine docker-ce docker.io
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce
systemctl status docker
sudo systemctl start docker
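Out of the box, Docker can only run CPU programs; to use the host's GPUs you need to install nvidia-docker, which is provided by NVIDIA.
Official repository: https://github.com/NVIDIA/nvidia-docker
If the Docker version is 19.03 or later, nvidia-docker is not needed; installing nvidia-container-toolkit is enough.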
- distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
- curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
- curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
-
- sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
- sudo systemctl restart docker
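The 19.03 version rule can be checked from a script. The following is a minimal sketch, assuming GNU `sort` (for `-V`) is available; `version_ge` is a hypothetical helper name, not part of Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is >= version $2.
version_ge() {
    # sort -V orders version strings; if $2 comes out first, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Only query Docker when the client is actually installed.
if command -v docker >/dev/null 2>&1; then
    installed="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
    if [ -n "$installed" ] && version_ge "$installed" "19.03"; then
        echo "Docker $installed: nvidia-container-toolkit is sufficient"
    else
        echo "Docker ${installed:-unknown}: older than 19.03 or daemon unreachable"
    fi
fi
```

If the second message appears, either upgrade Docker or fall back to the older nvidia-docker packages.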
Test whether the installation succeeded; this step pulls an image from the official Docker registry.
- #### Test nvidia-smi with the latest official CUDA image
- docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
-
- # Start a GPU enabled container on two GPUs
- docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi
-
- # Starting a GPU enabled container on specific GPUs
- docker run --gpus '"device=1,2"' nvidia/cuda:10.0-base nvidia-smi
- docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:10.0-base nvidia-smi
-
- # Specifying a capability (graphics, compute, ...) for my container
- # Note this is rarely if ever used this way
- docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi
If GPU information is printed, the installation succeeded.
- Tue Apr 24 18:58:50 2018
- +-----------------------------------------------------------------------------+
- | NVIDIA-SMI 390.25 Driver Version: 390.25 |
- |-------------------------------+----------------------+----------------------+
- | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
- | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
- |===============================+======================+======================|
- | 0 GeForce GTX 108... Off | 00000000:01:00.0 Off | N/A |
- | 0% 53C P5 27W / 280W | 0MiB / 11177MiB | 3% Default |
- +-------------------------------+----------------------+----------------------+
- +-----------------------------------------------------------------------------+
- | Processes: GPU Memory |
- | GPU PID Type Process name Usage |
- |=============================================================================|
- | No running processes found |
- +-----------------------------------------------------------------------------+
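Pulling images from the official registry is very slow (skip the following part if you already have a proxy or VPN), so configure a registry mirror located in China.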
sudo vim /etc/docker/daemon.json
When opened, the file looks like this:
- {
-
- "runtimes":{
- "nvidia":{
- "path":"nvidia-container-runtime",
- "runtimeArgs":[]
- }
- }
-
- }
Change it to the following (this uses an Aliyun mirror, which worked in my own testing; other mirrors include https://registry.docker-cn.com and http://hub-mirror.c.163.com):
- {
- "registry-mirrors":["https://3laho3y3.mirror.aliyuncs.com"],
- "runtimes":{
- "nvidia":{
- "path":"nvidia-container-runtime",
- "runtimeArgs":[]
- }
- }
-
- }
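The new mirror only takes effect after Docker is restarted (`sudo systemctl restart docker`). To confirm the daemon picked it up, the mirror list can be read back from `docker info`. This is a rough sketch: `list_mirrors` is a hypothetical helper, and it assumes the output contains a `Registry Mirrors:` line followed by one indented URL per line, which may differ between Docker versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `docker info` text on stdin and prints the
# URLs listed under the "Registry Mirrors:" heading, one per line.
list_mirrors() {
    awk '
        /^[[:space:]]*Registry Mirrors:/ { in_block = 1; next }
        in_block && /^[[:space:]]*https?:\/\// {
            gsub(/^[[:space:]]+/, "")   # strip the leading indentation
            print
            next
        }
        in_block { in_block = 0 }       # any other line ends the block
    '
}

# Only query the daemon when the docker client is installed.
if command -v docker >/dev/null 2>&1; then
    docker info 2>/dev/null | list_mirrors
fi
```

An empty result suggests that daemon.json was not parsed or that Docker was not restarted.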
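Docker Hub, the official image registry: https://hub.docker.com/
Go to the site and search for nvidia/cuda.
Select Tags and find 10.1-cudnn7-devel-ubuntu16.04 (it includes the Ubuntu system libraries, CUDA 10.1, and cuDNN 7); if you do not want the system libraries, choose a different image.
Pull the image.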
- sudo docker pull nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04
-
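Wait for the download to finish, then run docker images to check that the image is present.
Because the image may be very large, you may need to increase the size of the local Docker image store, which is configured in docker.service.
docker.service is usually in /usr/lib/systemd/system/, but on my test machine it was in /lib/systemd/system/, so watch out for this.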
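Since the location varies between systems, a small search over the usual directories can save some guessing. This is a sketch only: `find_unit_file` is a hypothetical helper, and /etc/systemd/system is added to the list as an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first existing "$dir/$unit" and succeed,
# or fail if the unit file is in none of the given directories.
find_unit_file() {
    local unit="$1"
    shift
    local dir
    for dir in "$@"; do
        if [ -f "$dir/$unit" ]; then
            printf '%s\n' "$dir/$unit"
            return 0
        fi
    done
    return 1
}

# systemd can also report the path directly:
#   systemctl show -p FragmentPath docker.service
find_unit_file docker.service \
    /usr/lib/systemd/system /lib/systemd/system /etc/systemd/system \
    || echo "docker.service not found in the usual directories" >&2
```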
Open docker.service.
- # cat /usr/lib/systemd/system/docker.service
- [Unit]
- Description=Docker Application Container Engine
- Documentation=http://docs.docker.com
- After=network.target
- Wants=docker-storage-setup.service
- Requires=docker-cleanup.timer
-
- [Service]
- Type=notify
- NotifyAccess=all
- EnvironmentFile=-/run/containers/registries.conf
- EnvironmentFile=-/etc/sysconfig/docker
- EnvironmentFile=-/etc/sysconfig/docker-storage
- EnvironmentFile=-/etc/sysconfig/docker-network
- Environment=GOTRACEBACK=crash
- Environment=DOCKER_HTTP_HOST_COMPAT=1
- Environment=PATH=/usr/libexec/docker:/usr/bin:/usr/sbin
- ExecStart=/usr/bin/dockerd-current \
- --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current \
- --default-runtime=docker-runc \
- --exec-opt native.cgroupdriver=systemd \
- --userland-proxy-path=/usr/libexec/docker/docker-proxy-current \
- $OPTIONS \
- $DOCKER_STORAGE_OPTIONS \
- $DOCKER_NETWORK_OPTIONS \
- $ADD_REGISTRY \
- $BLOCK_REGISTRY \
- $INSECURE_REGISTRY\
- $REGISTRIES
- ExecReload=/bin/kill -s HUP $MAINPID
- LimitNOFILE=1048576
- LimitNPROC=1048576
- LimitCORE=infinity
- TimeoutStartSec=0
- Restart=on-abnormal
- MountFlags=slave
- KillMode=process
-
- [Install]
- WantedBy=multi-user.target
Change the container size:
- [Service]
- ...
- ExecStart=/usr/bin/dockerd \
- --storage-driver devicemapper --storage-opt dm.loopdatasize=100G --storage-opt dm.loopmetadatasize=10G --storage-opt dm.fs=ext4 --storage-opt dm.basesize=30G
- ...
-
- Docker's total storage is capped at 100G, and each container at 30G.
After the change, reload the unit files and restart Docker:
- systemctl daemon-reload
-
- # Restart Docker
- service docker restart
Change the Docker image storage path
- sudo docker info
-
- Output:
-
-
- Containers: 1
- Running: 0
- Paused: 0
- Stopped: 1
- Images: 1
- Server Version: 1.13.1
- Storage Driver: overlay2
- Backing Filesystem: xfs
- Supports d_type: true
- Native Overlay Diff: true
- Logging Driver: journald
- Cgroup Driver: systemd
- Plugins:
- Volume: local
- Network: bridge host macvlan null overlay
- Swarm: inactive
- Runtimes: docker-runc runc
- Default Runtime: docker-runc
- Init Binary: /usr/libexec/docker/docker-init-current
- containerd version: (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
- runc version: df5c38a9167e87f53a9894d77c0950e178a745e7 (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
- init version: fec3683b971d9c3ef73f284f176672c44b448662 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
- Security Options:
- seccomp
- WARNING: You're not using the default seccomp profile
- Profile: /etc/docker/seccomp.json
- Kernel Version: 3.10.0-862.14.4.el7.x86_64
- Operating System: CentOS Linux 7 (Core)
- OSType: linux
- Architecture: x86_64
- Number of Docker Hooks: 3
- CPUs: 1
- Total Memory: 991.7 MiB
- Name: fuqiang
- ID: F2MD:SKQC:HSZG:LN7H:L3KI:7SN2:JHRP:HMQI:3KK2:4RTO:TPTJ:UCYZ
- Docker Root Dir: /var/lib/docker
- Debug Mode (client): false
- Debug Mode (server): false
- Registry: https://index.docker.io/v1/
- Experimental: false
- Insecure Registries:
- 127.0.0.0/8
- Live Restore Enabled: false
- Registries: docker.io (secure)
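The line Docker Root Dir: /var/lib/docker shows the default storage location for images and container instances. When images are large, this directory often runs out of space and has to be moved elsewhere.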
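When scripting the move, the root directory can be extracted from the `docker info` text instead of being read by eye. A minimal sketch, where `docker_root_dir` is a hypothetical helper that expects the `Docker Root Dir:` line format shown in the output above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `docker info` text on stdin and prints only
# the path that follows "Docker Root Dir:".
docker_root_dir() {
    sed -n 's/^[[:space:]]*Docker Root Dir:[[:space:]]*//p'
}

# Only query the daemon when the docker client is installed.
# Equivalent one-liner: docker info --format '{{.DockerRootDir}}'
if command -v docker >/dev/null 2>&1; then
    docker info 2>/dev/null | docker_root_dir
fi
```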
Target location for images: /home/docker
Stop the Docker service:
systemctl stop docker
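Migrate the data (-a preserves ownership, permissions, and symlinks, which Docker's layer data depends on):
sudo cp -a /var/lib/docker/ /home/docker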
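Before running the copy, it may be worth confirming that the target filesystem has enough free space. The sketch below is one way to do that; `has_room` is a hypothetical helper, and it compares sizes in kilobytes using `du -sk` and `df -Pk`. The second argument must be a directory that already exists, such as /home.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the filesystem holding directory $2
# has more free space than directory $1 currently occupies.
has_room() {
    local need avail
    need="$(du -sk "$1" 2>/dev/null | awk '{print $1}')"
    avail="$(df -Pk "$2" 2>/dev/null | awk 'NR==2 {print $4}')"
    [ -n "$need" ] && [ -n "$avail" ] && [ "$avail" -gt "$need" ]
}

# Run as root so that du can read everything under /var/lib/docker.
if [ -d /var/lib/docker ] && [ -d /home ]; then
    if has_room /var/lib/docker /home; then
        echo "enough free space under /home"
    else
        echo "not enough free space under /home (or size could not be read)"
    fi
fi
```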
Add --graph to docker.service (newer Docker releases name this flag --data-root):
- [Service]
- ...
- ExecStart=/usr/bin/dockerd --graph=your_docker_image_path \
- --storage-driver devicemapper --storage-opt dm.loopdatasize=100G --storage-opt dm.loopmetadatasize=10G --storage-opt dm.fs=ext4 --storage-opt dm.basesize=30G
- ...
Start the Docker service:
- systemctl start docker
- systemctl status docker
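If the service comes up and its status is active, the move succeeded.
To display GUI windows from the container over the network, the host needs an X server set up: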
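- A. On the host
- Find the host's IP address
- $ ifconfig ## assume it is xxx.xxx.xxx.xx
- Check the current value of the DISPLAY environment variable
- $ echo $DISPLAY (run this on the physical display; it does not work from an SSH terminal) ## assume it is :0
- Or work it out from the socket file:
- $ ll /tmp/.X11-unix/ ## assume it is X0, which corresponds to :0
-
- Install the X server utilities (they provide xhost)
- $ sudo apt install x11-xserver-utils
- $ sudo vim /etc/lightdm/lightdm.conf
- Allow network connections by adding
- [SeatDefaults]
- xserver-allow-tcp=true
- Restart the X server
- $ sudo systemctl restart lightdm
- Allow all users to access the X server
- xhost +
-
-
- B. Inside the Docker container
- # export DISPLAY=xxx.xxx.xxx.xx:0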
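The DISPLAY value used inside the container can be assembled from the two pieces gathered on the host: its IP address and the display number taken from the socket name. A small sketch follows; `display_from_socket` and `make_display` are hypothetical helpers, and 192.0.2.10 is a placeholder documentation address, not a real host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an X11 socket name or path (for example
# /tmp/.X11-unix/X0) into its display suffix (":0").
display_from_socket() {
    local name="${1##*/}"          # drop any directory part
    printf ':%s\n' "${name#X}"     # drop the leading "X"
}

# Hypothetical helper: join a host address and a display suffix.
make_display() {
    printf '%s%s\n' "$1" "$2"
}

# Placeholder host address; replace it with the value reported by ifconfig.
host_ip="192.0.2.10"
suffix="$(display_from_socket /tmp/.X11-unix/X0)"
echo "export DISPLAY=$(make_display "$host_ip" "$suffix")"
```

The printed line can be pasted into the container's shell. With TCP enabled, display :N is served on port 6000+N, so that port must be reachable from the container.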
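Pitfalls I ran into:
1. I built a custom Ubuntu image and installed CUDA and cuDNN successfully, but calling the cuDNN API from C++ failed; with the downloaded nvidia/cuda image the same calls worked. The cause is unknown.
2. The container was too small, so its size had to be increased.
3. The local image store ran out of space and had to be moved. Before switching, copy all files from the source directory to the target directory.
Comments and private messages are welcome.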