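Open-source deep learning libraries each require different versions of CUDA, cuDNN, PyTorch and so on, so it is often easiest to deploy each framework in its own Docker container. This post records the setup on Ubuntu 20.04: installing the NVIDIA driver, installing Docker and nvidia-docker, changing the path where Docker stores its data (containers, images and so on), and adding an Ubuntu user to the docker group.
Switch to root, disable the graphical interface, and disable the third-party graphics driver (nouveau)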
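# Switch to the root user
sudo su
# Disable the graphical interface (boot to a text console)
systemctl set-default multi-user.target
systemctl get-default
# Disable the third-party graphics driver (nouveau):
# open the blacklist file and append the two lines below at the end
vim /etc/modprobe.d/blacklist.conf
blacklist nouveau
options nouveau modeset=0
# Regenerate the initramfs so the blacklist takes effect at boot
update-initramfs -u
# Reboot the system
reboot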
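The manual edit of blacklist.conf can also be scripted. The following is only a sketch, not part of the original procedure: `blacklist_nouveau` is a hypothetical helper name, and it takes the target file as an argument so it can be tried on a scratch file first.

```shell
# Append the nouveau blacklist entries to a modprobe config file.
# Lines that are already present are skipped, so re-running is harmless.
# Hypothetical helper; run as root when targeting /etc/modprobe.d/.
blacklist_nouveau() {
    conf="$1"
    touch "$conf"
    # If the file is non-empty and does not end with a newline, add one,
    # so the appended entries start on their own line.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    for line in "blacklist nouveau" "options nouveau modeset=0"; do
        # -x matches the whole line, -F treats the pattern as a fixed string
        grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}
```

Usage would be `blacklist_nouveau /etc/modprobe.d/blacklist.conf`, followed by `update-initramfs -u` and a reboot as above. After the reboot, `lsmod | grep nouveau` should print nothing if the module is no longer loaded.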
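After logging back in:
sudo su
# Install the driver
chmod +x ./NVIDIA-Linux-x86_64-460.106.00.run
./NVIDIA-Linux-x86_64-460.106.00.run --no-opengl-files
# Choose NO at the first prompt, then press Enter at every remaining prompt
# Verify that the installation succeeded
nvidia-smi
# Enable persistence mode so the driver stays loaded
nvidia-smi -pm 1
# Install CUDA
chmod +x cuda-10.0.run
./cuda-10.0.run --no-opengl-libs
# Answer the prompts in order: accept -> no -> yes -> default -> yes -> no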
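Each CUDA release needs a minimum driver version, so it can be worth comparing the installed driver against that minimum before running the CUDA installer. The sketch below assumes GNU `sort -V` is available (it is on Ubuntu); `version_ge` is a hypothetical helper, and the minimum of 410.48 for CUDA 10.0 is quoted from NVIDIA's release notes as I recall them, so please confirm it against the official compatibility table.

```shell
# Succeeds if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort); hypothetical helper name.
version_ge() {
    # The smaller of the two versions sorts first; $1 >= $2 exactly
    # when that smallest entry equals $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: the driver installed above against the assumed CUDA 10.0 minimum
if version_ge "460.106.00" "410.48"; then
    echo "driver is new enough"
else
    echo "driver is too old"
fi
```

On a machine with the driver installed, the first argument could instead come from `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.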
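Download the packages from the URL below and place them in the docker-ubuntu-20.04.4/docker directory; they can also be downloaded directly from the resource I uploaded.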
# Download the .deb packages from https://download.docker.com/linux/ubuntu/dists/focal/pool/stable/amd64/
cd docker-ubuntu-20.04.4/docker
sudo dpkg -i ./containerd.io_1.6.26-1_amd64.deb \
    ./docker-ce_24.0.7-1~ubuntu.20.04~focal_amd64.deb \
    ./docker-ce-cli_24.0.7-1~ubuntu.20.04~focal_amd64.deb \
    ./docker-buildx-plugin_0.11.2-1~ubuntu.20.04~focal_amd64.deb \
    ./docker-compose-plugin_2.21.0-1~ubuntu.20.04~focal_amd64.deb
sudo systemctl stop docker
sudo systemctl stop docker.socket
# Change where Docker stores its data and limit the size of retained logs
sudo vi /etc/docker/daemon.json
{
  "registry-mirrors": ["https://mirror.ccs.tencentyun.com", "https://registry.docker-cn.com"],
  "data-root": "/home/your/data/docker_data",
  "log-opts": {"max-file": "10", "max-size": "200m"}
}
# Apply the changes
sudo systemctl daemon-reload
sudo systemctl restart docker
# Confirm that Docker was installed successfully
sudo docker ps
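A syntax error in daemon.json (a trailing comma, for instance) will keep the Docker daemon from starting, so it may help to validate the file before restarting. A minimal sketch, assuming `python3` is installed; `check_daemon_json` is a hypothetical helper, not a Docker command.

```shell
# Check that a file parses as JSON, using Python's built-in json.tool.
# Defaults to Docker's daemon.json; hypothetical helper name.
check_daemon_json() {
    file="${1:-/etc/docker/daemon.json}"
    if python3 -m json.tool "$file" > /dev/null 2>&1; then
        echo "OK: $file is valid JSON"
    else
        echo "ERROR: $file is not valid JSON" >&2
        return 1
    fi
}
```

For example, `check_daemon_json && sudo systemctl restart docker` restarts only when the file parses. Once Docker is running again, `sudo docker info --format '{{.DockerRootDir}}'` should print the configured data-root.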
The packages are all included in the resource mentioned above.
sudo systemctl stop docker
sudo systemctl stop docker.socket
cd docker-ubuntu-20.04.4/nvidia-docker
sudo dpkg -i *.deb
# Edit daemon.json
sudo vi /etc/docker/daemon.json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia",
  "registry-mirrors": ["https://mirror.ccs.tencentyun.com", "https://registry.docker-cn.com"],
  "data-root": "/home/your/data/docker_data",
  "log-opts": {"max-file": "10", "max-size": "200m"}
}
# Apply the changes
sudo systemctl daemon-reload
sudo systemctl restart docker
# Copy docker-compose to /usr/bin
cd ..   # back to docker-ubuntu-20.04.4/
sudo chmod +x docker-compose
sudo cp docker-compose /usr/bin/
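Since this second daemon.json replaces the earlier one, it is easy to lose a key such as data-root while editing. The sketch below reads a single top-level key back from the file so the result can be checked by eye. It assumes `python3` is available, and `show_daemon_setting` is a hypothetical helper.

```shell
# Print the value of one top-level key from a JSON file, or "<unset>"
# if the key is missing. Intended for scalar values such as
# "default-runtime" or "data-root"; hypothetical helper name.
show_daemon_setting() {
    key="$1"
    file="${2:-/etc/docker/daemon.json}"
    python3 -c '
import json, sys
with open(sys.argv[1]) as f:
    config = json.load(f)
print(config.get(sys.argv[2], "<unset>"))
' "$file" "$key"
}
```

With the file shown above, `show_daemon_setting default-runtime` should print `nvidia`. After the restart, the running daemon can be compared against it with `sudo docker info --format '{{.DefaultRuntime}}'`.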
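Add the Ubuntu user to the docker group and test nvidia-docker
# Check whether the docker group exists
grep docker /etc/group
# If it does not exist, create it: sudo groupadd docker
# Add the currently logged-in user to the docker group; for example, if the user name is dl:
sudo gpasswd -a dl docker
# Refresh group membership in the current shell
newgrp docker
# Pull an image to test with
docker pull nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04
nvidia-docker run -it --name=nvidia-docker-test --net=host --gpus all --shm-size=16g nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04 bash
# Inside the container, check that nvidia-smi works
nvidia-smi
# Enable automatic restart
docker update --restart=always <container name or ID>
# Disable automatic restart
docker update --restart=no <container name or ID>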
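The `grep docker /etc/group` check above matches any line containing "docker", so a stricter membership test can be useful in scripts. This is only a sketch: `user_in_group` is a hypothetical helper that parses a file in /etc/group format, with the file path as an optional third argument so it can be tried on a copy.

```shell
# Succeeds if USER appears in the member list of GROUP in a file that
# uses the /etc/group format (name:password:gid:member1,member2,...).
# Hypothetical helper name.
user_in_group() {
    user="$1"
    group="$2"
    file="${3:-/etc/group}"
    awk -F: -v u="$user" -v g="$group" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++) {
                if (members[i] == u) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

For example, `user_in_group dl docker && echo "dl can use docker without sudo"`. The restart policy set by `docker update` can be checked in a similar spirit with `docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' <container name or ID>`.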