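I have been asked several times why an installed tensorflow cannot use the GPU. I had sorted it out a few times before, but when someone asked again a couple of days ago it still took a long time to get working, so here is a brief record of the problems I ran into and how I solved them.
Installing conda is important: installing tensorflow-gpu with pip causes too many problems (this post assumes conda is already installed). Start by updating conda: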
conda update -n base -c defaults conda --repodata-fn=repodata.json
Previously, following what I found on Baidu, I always ran:
conda update -n base -c defaults conda
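That command has two problems, though: first, it fails to update conda to the latest version; second, conda -V then reports the wrong version.
The fix comes from GitHub: "I got update warning message but unable to update", Issue #12519, conda/conda: https://github.com/conda/conda/issues/12519
I think updating conda's base to the latest version matters because it syncs the latest package dependency metadata; an outdated version may lead to dependency problems.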
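To confirm that the update took effect, the version reported by conda -V can be read from a script. This is a minimal sketch: the function names are my own, and it assumes the output has the usual "conda X.Y.Z" shape.

```python
import re
import shutil
import subprocess


def parse_conda_version(output):
    """Turn text such as 'conda 23.5.0' into a tuple like (23, 5, 0)."""
    match = re.search(r"conda\s+(\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_conda_version():
    """Run `conda -V` if conda is on PATH; return None otherwise."""
    if shutil.which("conda") is None:
        return None
    result = subprocess.run(["conda", "-V"], capture_output=True, text=True)
    # Check both streams in case the version is printed on stderr.
    return parse_conda_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print(installed_conda_version())
```

Running it before and after the update shows whether the reported version actually changed.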
conda create -n TensorFlow2.4 python=3.9
Of course, you can pick the tensorflow version that matches your own CUDA version here. My CUDA version is 11.3:
- (Tensorflow2.4) name@eclab:~$ nvcc -V
- nvcc: NVIDIA (R) Cuda compiler driver
- Copyright (c) 2005-2021 NVIDIA Corporation
- Built on Sun_Mar_21_19:15:46_PDT_2021
- Cuda compilation tools, release 11.3, V11.3.58
- Build cuda_11.3.r11.3/compiler.29745058_0
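The release number can also be pulled out of this output programmatically, which is handy when checking several machines. The sketch below uses hypothetical helper names and assumes the "release X.Y" wording shown in the output above.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract the release number (e.g. '11.3') from `nvcc -V` output."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def cuda_release():
    """Return the toolkit release reported by nvcc, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print(cuda_release())
```

A result of None here means nvcc is not on PATH.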
If nvcc -V is not found, search further for why it produces no output, for example:
"Ubuntu 20.04 LTS: CUDA is installed but nvcc -V shows command not found" (AISecurity盐究员's blog, CSDN): https://blog.csdn.net/m0_38068876/article/details/127836484
Note: when adding the environment variables to .bashrc, adjust the paths to match the actual CUDA directory under /usr/local/. Here is my setup:
- (Tensorflow2.4) name@eclab:~$ cd /usr/local/
- (Tensorflow2.4) name@eclab:/usr/local$ ls
- bin cuda-11.3 games lib sbin src
- cuda etc include man share sunlogin
There is a cuda symlink here that points to cuda-11.3, so I recommend using the following lines:
- # cuda-11.3
- export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}
- export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
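The `${PATH:+:${PATH}}` part of these lines appends ":$PATH" only when PATH is already set and non-empty, which avoids a stray trailing colon. The sketch below mirrors that logic in Python to show what values the two exports produce. The function names are illustrative, and the default directory is the one from my machine.

```python
import os


def prepend(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':current' only when current is non-empty."""
    return f"{new_dir}:{current}" if current else new_dir


def cuda_env(cuda_home="/usr/local/cuda-11.3", env=None):
    """Return the PATH and LD_LIBRARY_PATH values the two export lines would produce."""
    env = os.environ if env is None else env
    return {
        "PATH": prepend(f"{cuda_home}/bin", env.get("PATH", "")),
        "LD_LIBRARY_PATH": prepend(f"{cuda_home}/lib64", env.get("LD_LIBRARY_PATH", "")),
    }


if __name__ == "__main__":
    for name, value in cuda_env().items():
        print(f"{name}={value}")
```

With an empty LD_LIBRARY_PATH the result is just the lib64 directory, with no trailing colon.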
Activate the environment:
conda activate TensorFlow2.4
Install tensorflow-gpu:
conda install tensorflow-gpu
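Note: do not install with pip! Do not install with pip! Do not install with pip!
I did not pin a tensorflow-gpu version here; conda automatically downloaded tensorflow-gpu==2.4.1 (for the version correspondence, see "Build from source | TensorFlow").
To test the installation, just run the following two commands: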
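To see which CUDA and cuDNN versions the installed package was built against, TensorFlow can report its own build details. The sketch below relies on `tf.sysconfig.get_build_info()`, which I believe is available in recent 2.x releases. The key names are an assumption and may differ between builds, so the code reads them defensively. In the import log further down, the conda install loads libcudart.so.10.1 and libcudnn.so.7 rather than anything from the system CUDA 11.3, which suggests the conda package uses its own CUDA runtime.

```python
def tf_build_info():
    """Return selected build details of the installed TensorFlow, or None if it is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    info = dict(tf.sysconfig.get_build_info())
    # .get() is used because the available keys can differ between builds.
    return {
        "tensorflow": tf.__version__,
        "cuda": info.get("cuda_version"),
        "cudnn": info.get("cudnn_version"),
    }


if __name__ == "__main__":
    print(tf_build_info())
```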
- (Tensorflow2.4) name@eclab:/usr/local$ python
- Python 3.9.17 (main, Jul 5 2023, 20:41:20)
- [GCC 11.2.0] :: Anaconda, Inc. on linux
- Type "help", "copyright", "credits" or "license" for more information.
- >>> import tensorflow as tf
- 2023-07-10 10:22:16.571135: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
- >>> tf.config.list_physical_devices('GPU')
- 2023-07-10 10:22:27.565493: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
- 2023-07-10 10:22:27.567453: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
- 2023-07-10 10:22:27.611185: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
- pciBusID: 0000:02:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
- coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
- 2023-07-10 10:22:27.612680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
- pciBusID: 0000:03:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
- coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
- 2023-07-10 10:22:27.613857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 2 with properties:
- pciBusID: 0000:82:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
- coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
- 2023-07-10 10:22:27.614783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 3 with properties:
- pciBusID: 0000:83:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
- coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
- 2023-07-10 10:22:27.614821: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
- 2023-07-10 10:22:27.617316: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
- 2023-07-10 10:22:27.617370: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
- 2023-07-10 10:22:27.619509: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
- 2023-07-10 10:22:27.619882: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
- 2023-07-10 10:22:27.622449: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
- 2023-07-10 10:22:27.623913: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
- 2023-07-10 10:22:27.629319: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
- 2023-07-10 10:22:27.644606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0, 1, 2, 3
- [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
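The same check can be wrapped in a small script so it does not have to be typed into the interpreter each time. This is only a sketch around the `tf.config.list_physical_devices('GPU')` call used above; the function name is my own.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print(f"{len(gpus)} GPU(s) visible: {gpus}")
    else:
        print("No GPU visible to TensorFlow")
```

Save it under any name (for example a hypothetical check_gpu.py) and run it with python inside the activated environment.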
The following demonstrates the problems that arise when installing with pip.
- (base) name@eclab:~$ conda create -n Tensorflow-err python=3.9
-
- (Tensorflow-err) name@eclab:~$ pip install tensorflow-gpu==2.4.1
- ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==2.4.1 (from versions: 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.6.4, 2.6.5, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.7.2, 2.7.3, 2.7.4, 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.8.4, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.9.3, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.10.1, 2.11.0rc0, 2.11.0rc1, 2.11.0rc2, 2.11.0, 2.12.0)
- ERROR: No matching distribution found for tensorflow-gpu==2.4.1
First, 2.4.1 cannot be installed; based on the versions listed in the error message, I chose to install 2.5.0:
pip install tensorflow-gpu==2.5
Using the same test method as in part 2 of step (3):
- (Tensorflow-err) name@eclab:~$ python
- Python 3.9.17 (main, Jul 5 2023, 20:41:20)
- [GCC 11.2.0] :: Anaconda, Inc. on linux
- Type "help", "copyright", "credits" or "license" for more information.
- >>> import tensorflow as tf
- 2023-07-10 10:38:37.238756: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
- >>> tf.config.list_physical_devices('GPU')
- 2023-07-10 10:38:40.413250: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
- 2023-07-10 10:38:40.456066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
- pciBusID: 0000:02:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
- coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
- 2023-07-10 10:38:40.457549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 1 with properties:
- pciBusID: 0000:03:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
- coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
- 2023-07-10 10:38:40.458707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 2 with properties:
- pciBusID: 0000:82:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
- coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
- 2023-07-10 10:38:40.459651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 3 with properties:
- pciBusID: 0000:83:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
- coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
- 2023-07-10 10:38:40.459700: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
- 2023-07-10 10:38:40.464266: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
- 2023-07-10 10:38:40.464333: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
- 2023-07-10 10:38:40.465775: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
- 2023-07-10 10:38:40.466117: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
- 2023-07-10 10:38:40.467045: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
- 2023-07-10 10:38:40.468303: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
- 2023-07-10 10:38:40.468555: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64
- 2023-07-10 10:38:40.468578: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1766] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
- Skipping registering GPU devices...
- []
Error message:
tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.3/lib64
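The warning means the dynamic loader could not open libcudnn.so.8. A similar lookup can be tried directly from Python with ctypes, without importing TensorFlow. This is a rough sketch: the helper names are hypothetical, and the default library names are taken from the pip-installed 2.5 log above.

```python
import ctypes


def can_load(library_name):
    """Try to dlopen a shared library by name; True on success, False otherwise."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


def missing_libraries(names=("libcudart.so.11.0", "libcublas.so.11", "libcudnn.so.8")):
    """Return the libraries from `names` that the dynamic loader cannot open."""
    return [name for name in names if not can_load(name)]


if __name__ == "__main__":
    print("Missing:", missing_libraries())
```

On a machine in the state shown above, I would expect libcudnn.so.8 to appear in the list while the other libraries load.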