RedHat 7: Offline Installation of Docker and nvidia-docker2

I. Environment

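In enterprise production environments, servers are often not allowed to connect to the Internet for security reasons, which makes setting up a GPU environment considerably harder. The GPU driver, Docker, nvidia-docker and related components therefore have to be installed offline.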

Environment: RedHat 7.5 (kernel 3.10 or later required)

[root@localhost ~]# cat /etc/redhat-release
RedHat Linux release 7.5.1804 (Core)

II. GPU Driver Installation

1. Check the graphics card information

[root@localhost ~]# lshw -numeric -C display
  *-display
       description: VGA compatible controller
       product: ASPEED Graphics Family [1A03:2000]
       vendor: ASPEED Technology, Inc. [1A03]
       physical id: 0
       bus info: pci@0000:03:00.0
       version: 41
       width: 32 bits
       clock: 33MHz
       capabilities: pm msi vga_controller bus_master cap_list rom
       configuration: driver=ast latency=0
       resources: irq:17 memory:98000000-9bffffff memory:9c000000-9c01ffff ioport:2000(size=128)
  *-display
       description: 3D controller
       product: GP104GL [Tesla P4] [10DE:1BB3]
       vendor: NVIDIA Corporation [10DE]
       physical id: 0
       bus info: pci@0000:3b:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list
       configuration: driver=nvidia latency=0
       resources: iomemory:3aff0-3afef iomemory:3aff0-3afef irq:315 memory:b7000000-b7ffffff memory:3affe0000000-3affefffffff memory:3afff0000000-3afff1ffffff
  *-display
       description: 3D controller
       product: GP104GL [Tesla P4] [10DE:1BB3]
       vendor: NVIDIA Corporation [10DE]
       physical id: 0
       bus info: pci@0000:af:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list
       configuration: driver=nvidia latency=0
       resources: iomemory:3eff0-3efef iomemory:3eff0-3efef irq:318 memory:ed000000-edffffff memory:3effe0000000-3effefffffff memory:3efff0000000-3efff1ffffff

The output shows that this machine has two Tesla P4 GPUs.
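On machines with many cards, counting them by eye is error-prone, so the lshw output can be filtered instead. This is a minimal sketch that assumes the `lshw -numeric -C display` text format shown above, where every NVIDIA card has a `vendor: NVIDIA Corporation` line. `count_nvidia_gpus` is a hypothetical helper name.

```shell
# count_nvidia_gpus (hypothetical helper): reads `lshw -numeric -C display`
# output on stdin and prints how many devices report the NVIDIA vendor.
count_nvidia_gpus() {
  # grep -c prints the number of matching lines (0 when nothing matches)
  grep -c 'vendor: NVIDIA Corporation'
}

# Usage (run as root so lshw sees every device):
#   lshw -numeric -C display | count_nvidia_gpus    # prints 2 on the machine above
```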

Based on the card model, download the driver (here NVIDIA-Linux-x86_64-415.13.run) from https://www.nvidia.cn/Download/index.aspx?lang=cn.

2. Disable the built-in nouveau driver

# First check whether the nouveau driver is loaded (any output means it has not been disabled yet)
lsmod | grep nouveau

Edit the dist-blacklist.conf file:

vim /lib/modprobe.d/dist-blacklist.conf

Comment out the blacklist nvidiafb line:
#blacklist nvidiafb

Add the following two lines:
blacklist nouveau
options nouveau modeset=0
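The three edits above can also be applied non-interactively, which helps when the same change must be repeated on several offline servers. The following is a sketch that assumes GNU sed, as shipped with RHEL 7. `apply_nouveau_blacklist` is a hypothetical helper. It takes the file path as an argument so it can be tried on a copy first. Running it twice leaves the file unchanged.

```shell
# apply_nouveau_blacklist FILE (hypothetical helper)
# 1) comments out an active "blacklist nvidiafb" line
# 2) appends "blacklist nouveau" and "options nouveau modeset=0" unless already present
# Assumes GNU sed (sed -i without a backup suffix), as on RHEL 7.
apply_nouveau_blacklist() {
  conf="$1"
  [ -f "$conf" ] || { echo "not found: $conf" >&2; return 1; }

  # Turn an uncommented "blacklist nvidiafb" line into "#blacklist nvidiafb"
  sed -i 's/^[[:space:]]*blacklist[[:space:]]\{1,\}nvidiafb[[:space:]]*$/#blacklist nvidiafb/' "$conf"

  # Make sure the file ends with a newline before appending to it
  [ -z "$(tail -c 1 "$conf")" ] || printf '\n' >> "$conf"

  for line in 'blacklist nouveau' 'options nouveau modeset=0'; do
    # -x: whole-line match, -F: fixed string, -q: no output
    grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
  done
}

# Usage (as root, after taking a backup):
#   cp -p /lib/modprobe.d/dist-blacklist.conf /lib/modprobe.d/dist-blacklist.conf.bak
#   apply_nouveau_blacklist /lib/modprobe.d/dist-blacklist.conf
```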

3. Rebuild the initramfs image

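# Back up the current image as a .bak file
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
# Regenerate the initramfs image so that it picks up the new blacklist
dracut /boot/initramfs-$(uname -r).img $(uname -r)
# Set the default run level to text mode (the NVIDIA installer cannot run while an X server is active)
systemctl set-default multi-user.target
# Reboot the server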
reboot
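After the reboot, confirm that nouveau really stayed unloaded before running the NVIDIA installer. `lsmod` takes its data from `/proc/modules`, where the module name is the first field of each line, so the sketch below checks that file directly. `module_loaded` is a hypothetical helper. The optional second argument exists only so it can be tested against a sample file.

```shell
# module_loaded NAME [MODULES_FILE] (hypothetical helper)
# Returns 0 if NAME is the first field of some line in MODULES_FILE
# (default /proc/modules, the file lsmod reads), 1 otherwise.
module_loaded() {
  name="$1"
  file="${2:-/proc/modules}"
  awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Usage after the reboot:
#   if module_loaded nouveau; then echo "nouveau is still loaded" >&2; fi
#   systemctl get-default    # should print multi-user.target
```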

4. Install kernel-devel and gcc

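# Although this is an offline installation, a full system installation usually already includes these two packages; if not, download them from https://pkgs.org/
# kernel-devel must match the running kernel exactly, hence the $(uname -r) suffix
yum -y install kernel-devel-$(uname -r) gcc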
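The driver installer compiles a kernel module, so the kernel sources must belong to exactly the running kernel. A server whose kernel was updated without a matching kernel-devel is a common cause of build failures. The following sketch checks for this before the installer runs. It assumes the RHEL layout `/usr/src/kernels/<release>` that kernel-devel installs into. `kernel_src_dir` is a hypothetical helper.

```shell
# kernel_src_dir [RELEASE] [BASE] (hypothetical helper)
# Prints BASE/RELEASE (defaults: the running kernel and /usr/src/kernels)
# and returns 1 with a hint if that directory does not exist.
kernel_src_dir() {
  rel="${1:-$(uname -r)}"
  base="${2:-/usr/src/kernels}"
  dir="$base/$rel"
  if [ ! -d "$dir" ]; then
    echo "missing $dir: install kernel-devel-$rel" >&2
    return 1
  fi
  printf '%s\n' "$dir"
}

# Usage:
#   kernel_src_dir              # e.g. /usr/src/kernels/3.10.0-862.el7.x86_64 on RHEL 7.5
#   command -v gcc >/dev/null || echo "gcc is missing" >&2
```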

5. Run the installer

chmod u+x NVIDIA-Linux-x86_64-415.13.run
./NVIDIA-Linux-x86_64-415.13.run --kernel-source-path=/usr/src/kernels/$(uname -r)
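Once the installer finishes, `nvidia-smi` should list every card. The sketch below reads the output of `nvidia-smi --query-gpu=name --format=csv,noheader`, which prints one line per GPU, and compares that count with the number you expect. `expect_gpu_count` is a hypothetical helper. It reads from stdin so it can be tested without a GPU.

```shell
# expect_gpu_count N (hypothetical helper)
# Reads `nvidia-smi --query-gpu=name --format=csv,noheader` output on stdin
# and returns 0 only if it contains exactly N non-empty lines.
expect_gpu_count() {
  want="$1"
  got=$(grep -c .)
  if [ "$got" -ne "$want" ]; then
    echo "expected $want GPU(s), nvidia-smi reported $got" >&2
    return 1
  fi
}

# Usage on the two-card Tesla P4 machine above:
#   nvidia-smi --query-gpu=name --format=csv,noheader | expect_gpu_count 2
```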

III. Offline Docker Installation

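In an offline environment, yum usually has no Docker repository, and installing from RPMs tends to fail on missing packages: Docker has many dependencies, which are hard to satisfy offline. The approach most likely to succeed offline is the static binary package, available from Index of linux/static/stable/x86_64/ (https://download.docker.com/linux/static/stable/x86_64/). If a version later than docker-18.06.3-ce.tgz is installed, installing nvidia-docker afterwards reports that docker-ce is missing, so it is best to choose an 18.03.x or 18.06.x release.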
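The text above recommends staying on 18.03.x or 18.06.x. A small guard can reject any other archive before it is installed. This minimal sketch assumes the archive keeps the upstream file name pattern `docker-<version>-ce.tgz`. `check_docker_tgz` is a hypothetical helper.

```shell
# check_docker_tgz FILE (hypothetical helper)
# Accepts docker-18.03.*-ce.tgz and docker-18.06.*-ce.tgz, rejects anything else.
check_docker_tgz() {
  case "$(basename "$1")" in
    docker-18.03.*-ce.tgz|docker-18.06.*-ce.tgz) return 0 ;;
    *) echo "not an 18.03.x/18.06.x docker-ce archive: $1" >&2; return 1 ;;
  esac
}

# Usage (tar -tzf lists the archive contents without extracting them):
#   check_docker_tgz ./docker-18.06.3-ce.tgz && tar -tzf ./docker-18.06.3-ce.tgz
```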

1. Completely remove any existing Docker installation

systemctl stop docker    # or: service docker stop
yum remove docker docker-client docker-client-latest docker-common docker-latest docker-latest-logrotate \
	docker-logrotate docker-engine docker-ce docker-ce-cli containerd.io
# Use yum -y remove to uninstall anything the next command still lists
yum list installed| grep docker
rm -rf /var/lib/docker
rm -rf /var/lib/containerd
rm -rf /etc/docker
rm -rf /etc/systemd/system/docker.service
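The manual step of feeding leftover packages back into `yum -y remove` can be scripted. `rpm -qa --qf '%{NAME}\n'` prints one bare package name per line, which is easier to filter than the column layout of `yum list installed`. In this sketch, `docker_related_packages` is a hypothetical helper that matches any name containing docker or containerd. Review its output before removing anything.

```shell
# docker_related_packages (hypothetical helper)
# Reads package names (one per line) on stdin and prints the ones that
# contain "docker" or "containerd", sorted and de-duplicated.
docker_related_packages() {
  grep -E 'docker|containerd' | sort -u
}

# Usage (as root; check the list first, and skip the removal if it is empty):
#   rpm -qa --qf '%{NAME}\n' | docker_related_packages
#   yum -y remove $(rpm -qa --qf '%{NAME}\n' | docker_related_packages)
```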

2. Install Docker

  • Download the installation archive docker-18.06.3-ce.tgz

  • Create the installation script install-docker.sh

    #!/bin/sh
    
    usage(){
      echo "Usage: $0 FILE_NAME_DOCKER_CE_TAR_GZ"
      echo "       $0 docker-17.09.0-ce.tgz"
      echo "Get docker-ce binary from: https://download.docker.com/linux/static/stable/x86_64/"
      echo "eg: wget https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz"
      echo ""
    }
    SYSTEMDDIR=/usr/lib/systemd/system
    SERVICEFILE=docker.service
    DOCKERDIR=/usr/bin
    DOCKERBIN=docker
    SERVICENAME=docker
    
    if [ $# -ne 1 ]; then
      usage
      exit 1
    else
      FILETARGZ="$1"
    fi
    
    if [ ! -f "${FILETARGZ}" ]; then
      echo "Docker binary tgz file does not exist, please check it"
      echo "Get docker-ce binary from: https://download.docker.com/linux/static/stable/x86_64/"
      echo "eg: wget https://download.docker.com/linux/static/stable/x86_64/docker-17.09.0-ce.tgz"
      exit 1
    fi
    
    echo "##unzip : tar xvpf ${FILETARGZ}"
    tar xvpf ${FILETARGZ}
    echo
    
    echo "##binary : ${DOCKERBIN} copy to ${DOCKERDIR}"
    cp -p ${DOCKERBIN}/* ${DOCKERDIR} >/dev/null 2>&1
    which ${DOCKERBIN}
    
    echo "##systemd service: ${SERVICEFILE}"
    echo "##docker.service: create docker systemd file"
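    # Quote the delimiter so $MAINPID is written to the unit literally instead of being expanded (to nothing) by the shell
    cat >${SYSTEMDDIR}/${SERVICEFILE} <<'EOF'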
    [Unit]
    Description=Docker Application Container Engine
    Documentation=http://docs.docker.com
    After=network.target docker.socket
    [Service]
    Type=notify
    EnvironmentFile=-/run/flannel/docker
    WorkingDirectory=/usr/local/bin
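    # systemd only accepts comments on their own lines, never after a value.
    # -H tcp://0.0.0.0:4243 exposes the Docker API on all interfaces without TLS; remove it if not needed.
    # --data-root (the replacement for the deprecated --graph) sets the directory for images and containers.
    ExecStart=/usr/bin/dockerd \
                    -H tcp://0.0.0.0:4243 \
                    -H unix:///var/run/docker.sock \
                    --selinux-enabled=false \
                    --log-opt max-size=1g \
                    --data-root=/data/sys_docker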
    ExecReload=/bin/kill -s HUP $MAINPID
    # Having non-zero Limit*s causes performance problems due to accounting overhead
    # in the kernel. We recommend using cgroups to do container-local accounting.
    LimitNOFILE=infinity
    LimitNPROC=infinity
    LimitCORE=infinity
    # Uncomment TasksMax if your systemd version supports it.
    # Only systemd 226 and above support this version.
    #TasksMax=infinity
    TimeoutStartSec=0
    # set delegate yes so that systemd does not reset the cgroups of docker containers
    Delegate=yes
    # kill only the docker process, not all processes in the cgroup
    KillMode=process
    Restart=on-failure
    [Install]
    WantedBy=multi-user.target
    EOF
    
    echo ""
    
    systemctl daemon-reload
    echo "##Service status: ${SERVICENAME}"
    systemctl status ${SERVICENAME}
    echo "##Service restart: ${SERVICENAME}"
    systemctl restart ${SERVICENAME}
    echo "##Service status: ${SERVICENAME}"
    systemctl status ${SERVICENAME}
    
    echo "##Service enabled: ${SERVICENAME}"
    systemctl enable ${SERVICENAME}
    
    echo "## docker version"
    docker version
    
  • Run the installation

    chmod +x install-docker.sh
    ./install-docker.sh ./docker-18.06.3-ce.tgz
    
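Two mistakes in a generated unit file are easy to miss. An unquoted heredoc silently turns `$MAINPID` into an empty string. And because systemd does not support trailing comments, a `#` left at the end of an `ExecStart=` line is passed to dockerd as an argument. Below is a quick sanity check. It assumes the unit path that install-docker.sh writes to. `check_docker_unit` is a hypothetical helper.

```shell
# check_docker_unit [FILE] (hypothetical helper)
# Default FILE is the path install-docker.sh writes the unit to.
check_docker_unit() {
  unit="${1:-/usr/lib/systemd/system/docker.service}"
  rc=0
  # ExecReload must keep the literal $MAINPID so systemd can substitute it
  grep -q '^ExecReload=.*\$MAINPID' "$unit" || { echo "ExecReload lacks \$MAINPID" >&2; rc=1; }
  # Flag a whitespace-preceded '#' on any line that is not itself a comment
  if grep -n '^[[:space:]]*[^#;[:space:]].*[[:space:]]#' "$unit" >&2; then
    echo "inline '#' found: systemd would treat it as part of the value" >&2
    rc=1
  fi
  return $rc
}

# Usage:
#   check_docker_unit && systemctl daemon-reload && systemctl restart docker
```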

IV. Offline nvidia-docker2 Installation

1. Install nvidia-docker

For convenience, the nvidia-docker installation package and its dependencies have been uploaded; download nvidia-docker2 before installing.

# After extracting the archive, install with rpm
rpm -Uvh *.rpm --nodeps --force

2. Edit the configuration file

Because nvidia-docker starts together with Docker, edit the Docker daemon configuration /etc/docker/daemon.json (create it if it does not exist):

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
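On a fresh binary install, `/etc/docker/daemon.json` usually does not exist yet, so it can be written from a script. This sketch assumes no other daemon settings need to be kept: it backs up and then overwrites an existing file rather than merging into it. It also warns if the runtime binary named in `path` is missing, which usually means the nvidia-container-runtime RPM did not install. `write_nvidia_daemon_json` is a hypothetical helper. The target path is a parameter so it can be tried on a scratch file.

```shell
# write_nvidia_daemon_json [FILE] [RUNTIME] (hypothetical helper)
# Writes the daemon.json shown above, making "nvidia" the default runtime.
write_nvidia_daemon_json() {
  conf="${1:-/etc/docker/daemon.json}"
  runtime="${2:-/usr/bin/nvidia-container-runtime}"

  [ -x "$runtime" ] || echo "warning: $runtime not found or not executable" >&2
  mkdir -p "$(dirname "$conf")"
  # Keep a backup of an existing file instead of merging settings into it
  if [ -f "$conf" ]; then cp -p "$conf" "$conf.bak"; fi

  # Unquoted delimiter on purpose: $runtime is expanded into the JSON
  cat > "$conf" <<EOF
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "$runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}

# Usage (as root):
#   write_nvidia_daemon_json && systemctl restart docker
```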

3. Restart Docker

systemctl restart docker    # or: service docker restart
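After the restart, `docker info` reports which runtime is the default, and with the configuration above it should be nvidia. This sketch checks that from the `docker info` text on stdin, so it can be tested offline. It assumes the `Default Runtime:` line that `docker info` prints in the 18.x releases, possibly indented in other versions. `nvidia_is_default_runtime` is a hypothetical helper.

```shell
# nvidia_is_default_runtime (hypothetical helper)
# Reads `docker info` output on stdin; returns 0 if the default runtime is nvidia.
nvidia_is_default_runtime() {
  grep -q '^[[:space:]]*Default Runtime:[[:space:]]*nvidia[[:space:]]*$'
}

# Usage:
#   docker info 2>/dev/null | nvidia_is_default_runtime && echo "nvidia runtime is the default"
```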

References

1. Offline installation of the graphics driver on CentOS 7.6

2. Install Docker Engine on CentOS

3. Index of linux/static/stable/x86_64/

4. The ultimate way to install the latest Docker offline on RedHat 7.3

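Disclaimer: This article was contributed by a user and does not represent the position of the wpsshop blog. Copyright belongs to the original author, and this site assumes no legal liability. If you find infringing content, please contact us. When reprinting, please cite the source: https://www.wpsshop.cn/w/不正经/article/detail/576227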