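00. CUDA Overview
CUDA uses the parallel processing power of GPUs to accelerate deep learning and other compute-intensive applications.
01. CPU+GPU Cooperative Architecture
02. Deployment Environment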
[docker@lab-250 ~]$ cat /etc/*release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.0 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.0"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.0 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.0:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION=7.0
Red Hat Enterprise Linux Server release 7.0 (Maipo)
Red Hat Enterprise Linux Server release 7.0 (Maipo)
[docker@lab-250 ~]$ uname -r
3.10.0-123.el7.x86_64
[docker@lab-250 ~]$ uname -a
Linux lab-250 3.10.0-123.el7.x86_64 #1 SMP Mon May 5 11:16:57 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Note: the server must have an NVIDIA GPU physically installed.
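Before installing anything, it can help to confirm that the host actually sees an NVIDIA device. A minimal sketch, assuming the pciutils package (which provides lspci) is available; has_nvidia_gpu is a hypothetical helper name, not part of any NVIDIA tool:

```shell
# Hypothetical helper: succeeds if the device listing on stdin mentions NVIDIA.
has_nvidia_gpu() {
    grep -qi 'nvidia'
}

# Usage on a real host (assumes lspci from pciutils is installed):
#   lspci | has_nvidia_gpu && echo "NVIDIA GPU found" || echo "no NVIDIA GPU detected"
```

The helper reads from stdin so it can be checked against saved lspci output as well as a live system.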
03. Download the CUDA Toolkit
https://developer.nvidia.com/cuda-toolkit-archive
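CUDA Toolkit 9.0 (Sept 2017), Online Documentation // This walkthrough uses this version. Download the installer that matches your system; the local (all-in-one) installer package is recommended.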
https://developer.nvidia.com/cuda-toolkit
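Note: in the installation below, the system is RHEL 7.0 but was mistakenly treated as CentOS 7.0, so some RPMs were not installed and had to be downloaded separately. With a correctly matched version, no extra RPM downloads are normally needed.
cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64-rpm # Also used for CentOS 7: CentOS is an open-source distribution built from RHEL 7, so the package name says rhel7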
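To avoid mixing up RHEL and CentOS as described in the note above, the distribution ID can be read straight from the os-release data. A small sketch; os_id is a hypothetical helper name, and it assumes the ID= line follows the format shown in the "cat /etc/*release" output earlier:

```shell
# Hypothetical helper: prints the value of the ID= line from os-release text on stdin.
os_id() {
    # '^ID=' does not match ID_LIKE= or VERSION_ID=, so only the distro ID is printed
    sed -n 's/^ID=//p' | tr -d '"'
}

# Usage on a real host:
#   os_id < /etc/os-release
```

On the machine in these notes this would be expected to print rhel, since the release file contains ID="rhel".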
04. Setup
- Installation Instructions:
- rpm -i cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64.rpm
- yum clean all && yum makecache
- yum install cuda
- Other installation options are available in the form of meta-packages. For example, to install all the library packages, replace "cuda" with the "cuda-libraries-9-0" meta package
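Note: the CUDA installation detects the NVIDIA GPU automatically; there is no need to set the NVIDIA card as the default display adapter beforehand.
Error handling: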
https://mirrors.aliyun.com/epel/7/aarch64/Packages/d/dkms-2.6.1-1.el7.noarch.rpm
https://mirrors.aliyun.com/centos/7.6.1810/os/x86_64/Packages/libvdpau-1.1.1-3.el7.x86_64.rpm
--> Finished Dependency Resolution
Error: Package: 1:nvidia-kmod-384.81-2.el7.x86_64 (cuda-9-0-local)
Requires: dkms
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
[root@lab-250 ~]# rz -E
rz waiting to receive.
[root@lab-250 ~]# rpm -ivh dkms-2.6.1-1.el7.noarch.rpm
warning: dkms-2.6.1-1.el7.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY
error: Failed dependencies:
elfutils-libelf-devel is needed by dkms-2.6.1-1.el7.noarch
[root@lab-250 ~]#
[root@lab-250 ~]# yum install -y elfutils-libelf-devel
Resolving Dependencies
--> Running transaction check
---> Package elfutils-libelf-devel.x86_64 0:0.158-3.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved
[root@lab-250 ~]# rpm -ivh dkms-2.6.1-1.el7.noarch.rpm
warning: dkms-2.6.1-1.el7.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 352c64e5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:dkms-2.6.1-1.el7 ################################# [100%]
[root@lab-250 ~]#
[root@lab-250 ~]# yum install -y cuda
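The transcript above resolves the dependencies in a fixed order: elfutils-libelf-devel first, then the dkms RPM, then cuda. A sketch of that order as one function, assuming the dkms RPM has already been downloaded into the current directory; install_cuda_with_dkms and the RUN switch are hypothetical names introduced here for illustration:

```shell
# Hypothetical wrapper around the three commands from the transcript.
# RUN is an illustrative switch: with RUN=echo the commands are printed, not executed.
install_cuda_with_dkms() {
    ${RUN:-} yum install -y elfutils-libelf-devel &&
    ${RUN:-} rpm -ivh dkms-2.6.1-1.el7.noarch.rpm &&
    ${RUN:-} yum install -y cuda
}

# Dry run (prints the three commands):
#   RUN=echo; install_cuda_with_dkms
# Real run (as root):
#   unset RUN; install_cuda_with_dkms
```

Chaining with && stops at the first failing step, so a missing dependency shows up before yum attempts the cuda install.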
05. Set Environment Variables
/usr/local/cuda-9.0 # default install location
vim /etc/profile
export CUDA_HOME="/usr/local/cuda-9.0"
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
source /etc/profile
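One detail worth noting about the exports above: if LD_LIBRARY_PATH is empty beforehand, the result ends in a trailing ':', which the dynamic loader treats as the current directory. A slightly more defensive variant, as a sketch only; placing it in a separate file such as /etc/profile.d/cuda.sh is an assumption, not something these notes did:

```shell
# Assumption: CUDA 9.0 is in the default install location named above.
export CUDA_HOME="/usr/local/cuda-9.0"
export PATH="$CUDA_HOME/bin:$PATH"
# Append the previous value only when it is non-empty, so no trailing ':' is left behind.
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The ${VAR:+...} expansion produces the ":old-value" suffix only when the variable is set and non-empty.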
[docker@lab-250 ~]$ nvcc -V # verify the environment variables
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
[docker@lab-250 ~]$ nvidia-smi # show this machine's GPU info; the failure below is because the test machine has no GPU installed
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
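The two checks above answer different questions: nvcc shows that the toolkit is on PATH, while nvidia-smi shows whether the driver can reach a GPU. A sketch that reports both separately; check_cuda_stack is a hypothetical helper name, and it assumes nvidia-smi exits non-zero when it cannot communicate with the driver:

```shell
# Hypothetical helper: reports toolkit and driver status on separate lines.
check_cuda_stack() {
    if command -v nvcc >/dev/null 2>&1; then
        echo "toolkit: nvcc found"
    else
        echo "toolkit: nvcc not on PATH"
    fi
    # Assumption: a failed driver connection makes nvidia-smi return a non-zero status.
    if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
        echo "driver: ok"
    else
        echo "driver: unavailable (no GPU, or driver not loaded)"
    fi
}

check_cuda_stack
```

On a machine like the test host in these notes, the first line would be expected to report that nvcc was found and the second that the driver is unavailable.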
References:
https://baijiahao.baidu.com/s?id=1610852365402771191&wfr=spider&for=pc
https://www.jianshu.com/p/34a504af8d51