# To remove CUDA Toolkit
sudo apt-get purge -y "*cuda*"
sudo apt-get purge -y "*cublas*"
sudo apt-get purge -y "*cufft*"
sudo apt-get purge -y "*cufile*"
sudo apt-get purge -y "*curand*"
sudo apt-get purge -y "*cusolver*"
sudo apt-get purge -y "*cusparse*"
sudo apt-get purge -y "*npp*"
sudo apt-get purge -y "*nvjpeg*"
sudo apt-get purge -y "*nvvm*"
sudo apt-get purge -y "nsight*"
# To remove TensorRT and its attachments
sudo apt-get purge -y "libnvinfer*"
sudo apt-get purge -y "nv-tensorrt-local-repo*"
sudo apt-get purge -y "libnvonnxparsers*"
sudo apt-get purge -y "libnvparsers*"
sudo apt-get purge -y graphsurgeon-tf
sudo apt-get purge -y onnx-graphsurgeon
# To remove cuDNN
sudo apt-get purge -y "libcudnn*"
sudo apt-get purge -y "cudnn-local-repo*"
# To remove extra components
sudo apt-get purge -y "*libcufile*"
sudo apt-get purge -y "*gds-tools*"
sudo apt-get purge -y "*nvidia-fs*"
sudo apt-get purge -y "libnccl*"
# To remove Nvidia driver and other components
sudo apt-get purge -y "*nvidia*"
sudo apt-get purge -y "libxnvctrl*"
sudo rm /usr/share/keyrings/cuda*.gpg
sudo rm /usr/share/keyrings/cudnn*.gpg
sudo rm /usr/share/keyrings/nv-tensorrt*.gpg
sudo rm /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt autoremove
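Before rebooting, an optional sanity check (not part of the original procedure) is to list any NVIDIA-related packages that survived the purge:
# Any line beginning with "ii" is a package that is still installed
dpkg -l | grep -iE "cuda|cudnn|nvinfer|nvidia"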
Reboot the machine.
cd ~/Downloads/
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
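Optionally, confirm the NVIDIA repository is now visible to apt; the package name below is just an example, any cuda-toolkit-12-x package from the repo will do:
# Should report a candidate version from developer.download.nvidia.com
apt-cache policy cuda-toolkit-12-3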
The 550-series server driver is compatible up to CUDA 12.4:
sudo apt-get install -y nvidia-driver-550-server
Or install the latest non-server driver, which is compatible with CUDA 12.5:
sudo apt install -y nvidia-driver-555
1. Enable experimental support for GeForce and Quadro SKUs as follows (no longer required for NVIDIA driver 545.29.02 and later):
echo "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" | sudo tee /etc/modprobe.d/nvidia-gsp.conf
2. (Alternative) Install the open-kernel driver (from CUDA 12.2 onward, the open-kernel driver is required to use NVIDIA GPUDirect Storage):
sudo apt-get install -y nvidia-driver-550-server-open
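Whichever driver variant was installed, a quick sanity check after rebooting is to confirm the driver loads and reports the expected version:
# Should print the driver version and list the GPU(s)
nvidia-smi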
sudo apt-get -y install cuda-toolkit-12-3
echo -e '\nexport LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}\nexport PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' |tee -a ~/.bashrc
source ~/.bashrc
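To confirm the toolkit install and the PATH changes took effect (open a new shell or source ~/.bashrc first):
# Should report release 12.3 and resolve to /usr/local/cuda/bin/nvcc
nvcc --version
which nvcc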
[Note] Docker Desktop on Ubuntu does not yet support GPUs, so do not install Docker Desktop (see: how to uninstall it).
Install the NVIDIA Container Toolkit (official guide: Installing with Apt) as follows:
1. (First time only) Add the nvidia-container-toolkit apt repository. Skip this step if the repository has already been added.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
2. Install nvidia-container-toolkit:
sudo apt-get install -y nvidia-container-toolkit
3. Configure the Docker runtime and restart Docker:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Tip: this step may not work for a snap-installed Docker. If so, remove the snap Docker with sudo snap remove --purge docker and reinstall Docker via apt.
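As an end-to-end check that containers can see the GPU (the ubuntu image here is just an example; the toolkit injects the driver libraries into any image):
# Should print the same table as nvidia-smi on the host
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi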
Official guide: docs.nvidia.com/deeplearning/cudnn/install-guide
TensorFlow / cuDNN / CUDA version compatibility table
sudo apt-get install zlib1g
Download the cuDNN deb package and install it (only needed when CUDA was installed by some other method; skip this step if CUDA was installed from the network deb repository).
1. cuDNN official download page
2. Enable the local repository:
sudo dpkg -i cudnn-local-repo-$distro-8.x.x.x_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-*/cudnn-local-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
Install cuDNN v8:
# Runtime (required)
sudo apt-get install -y libcudnn8
# Development environment (optional)
sudo apt-get install -y libcudnn8-dev
sudo apt-get install -y libcudnn8-samples
Alternatively, install cuDNN v9 for CUDA 12:
sudo apt-get -y install cudnn9-cuda-12 # runtime and development environment
sudo apt-get -y install libcudnn9-samples
In general, deep learning libraries select the required cuDNN version automatically; to manually set the system's default cuDNN version, run sudo update-alternatives --config libcudnn.
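A lightweight check, before building the samples below, is to confirm the cuDNN runtime library is visible to the dynamic linker:
# Should list libcudnn shared libraries
ldconfig -p | grep libcudnn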
cp -r /usr/src/cudnn_samples_v8/ /tmp
cd /tmp/cudnn_samples_v8/mnistCUDNN
make clean && make
./mnistCUDNN
cd - >/dev/null
If Test passed! is printed, the installation succeeded.
If CUDA was installed via the network deb (network) method, do not install the tensorrt meta-package and related libraries; instead pin the v8 packages explicitly:
# Required (with these two installed, TensorFlow 2.15/2.16 can at least call TensorRT; note that TensorFlow builds targeting CUDA 11.8 may not work!)
sudo apt-get install -y libnvinfer8 libnvinfer-plugin8
# Optional
# sudo apt-get install -y libnvinfer-lean8 libnvinfer-vc-plugin8 libnvinfer-dispatch8
# sudo apt-get install -y libnvparsers8 libnvonnxparsers8
[Tip] onnx-graphsurgeon is best installed with pip rather than the apt package; see onnx-graphsurgeon on PyPI for details.
[Tip] Because apt-installed tensorrt does not support installing multiple versions side by side, see the official TensorRT installation guide for other installation methods such as pip.
# This installs the latest TensorRT (v10.0)
sudo apt-get install tensorrt
sudo apt-get install python3-libnvinfer-dev
If CUDA was installed by any other method, download TensorRT 8.x (pick the deb package matching your CUDA major version, e.g. "for Ubuntu 22.04 and CUDA 12.0 and 12.1 DEB local repo Package") and install it following section 3.2.1 of the guide.
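Whichever way TensorRT was installed, the Python bindings can be smoke-tested with a one-liner (assumes python3-libnvinfer or the pip tensorrt package is present):
# Should print the TensorRT version, e.g. 8.6.x or 10.0.x
python3 -c "import tensorrt as trt; print(trt.__version__)"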
Run the following code in a Python environment with TensorFlow 2.x and TensorRT support:
If the output is normal, TensorRT is working. If ERROR:tensorflow:Tensorflow needs to be built with TensorRT support enabled to allow TF-TRT to operate. appears partway through, TensorRT was not installed correctly.
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt
import time
from tensorflow import keras
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import RMSprop
import shutil

batch_size = 128
num_classes = 10
epochs = 5

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))
model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

model.export('/tmp/tf_savedmodel')

params = trt.DEFAULT_TRT_CONVERSION_PARAMS
# _replace returns a new params object, so assign it back
params = params._replace(precision_mode=trt.TrtPrecisionMode.FP32)
converter = trt.TrtGraphConverterV2(input_saved_model_dir='/tmp/tf_savedmodel',
                                    conversion_params=params)
converter.convert()  # conversion happens here; optimization is deferred until inference time
converter.save('/tmp/trt_savedmodel')

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_test = x_test.astype('float32')
x_test = x_test.reshape(10000, 784)
x_test /= 255

# load the converted model
saved_model_loaded = tf.saved_model.load(
    "/tmp/trt_savedmodel", tags=[trt.tag_constants.SERVING])
# get the inference function; saved_model_loaded.signatures['serving_default'] also works
graph_func = saved_model_loaded.signatures[
    trt.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
# freeze the model's variables into constants; this step is optional,
# calling graph_func directly also works
frozen_func = trt.convert_to_constants.convert_variables_to_constants_v2(
    graph_func)
shutil.rmtree('/tmp/tf_savedmodel')
shutil.rmtree('/tmp/trt_savedmodel')

t = time.time()
output = frozen_func(tf.constant(x_test))[0].numpy()
print(time.time() - t)
print((output.argmax(-1) == y_test).mean())
Intel: Build TensorFlow from Source with Intel oneAPI oneDNN library
TensorFlow official guide (refer to the English version): Build from source
(From CUDA 12.2 onward, the open-kernel driver is required to use NVIDIA GPUDirect Storage.)
1. GitHub: NVIDIA/DALI/releases
2. DALI data-loading notes: the numpy_reader method (official NVIDIA DALI documentation)
1. Follow section 2.1 of the guide to disable the IOMMU, then reboot.
2. For an NVMe SSD, run:
cat /sys/block/<nvme>/integrity/device_is_integrity_capable
If the output is 0, the drive supports GDS. Follow section 14.1 of the guide to install MLNX_OFED: download the tgz from the MLNX_OFED site, then extract, build, and install it (install script: mlnxofedinstall; uninstall script: uninstall.sh).
3. Run the following command:
sudo apt-get install nvidia-gds
Reboot the machine.
4. Test whether GDS was installed correctly.
[Method 1]
/usr/local/cuda/gds/tools/gdscheck.py -p
If NVMe is reported as supported, GDS is working.
[Method 2] In a notebook, run the last code snippet from the DALI numpy_reader documentation referenced above; if it successfully returns an image, GDS is working.