
Installing the CUDA Toolkit to fix: OSError: CUDA_HOME environment variable is not set.


Installing the CUDA Toolkit

Error message

Running pip install flash_attn to install flash_attn, an inference acceleration library, failed with the following error:

Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting flash_attn
  Downloading https://mirrors.aliyun.com/pypi/packages/72/94/06f618bb338ec7203b48ac542e73087362b7750f9c568b13d213a3f181bb/flash_attn-2.5.8.tar.gz (2.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 1.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      fatal: not a git repository (or any of the parent directories): .git
      /tmp/pip-install-fg7pt8f4/flash-attn_1e4c76d3ba9f4a5d968930613e3c4bd7/setup.py:78: UserWarning: flash_attn was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
        warnings.warn(
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-fg7pt8f4/flash-attn_1e4c76d3ba9f4a5d968930613e3c4bd7/setup.py", line 134, in <module>
          CUDAExtension(
        File "/usr/local/program/miniconda3/envs/llama3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1077, in CUDAExtension
          library_dirs += library_paths(cuda=True)
        File "/usr/local/program/miniconda3/envs/llama3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1204, in library_paths
          if (not os.path.exists(_join_cuda_home(lib_dir)) and
        File "/usr/local/program/miniconda3/envs/llama3/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2419, in _join_cuda_home
          raise OSError('CUDA_HOME environment variable is not set. '
      OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
      
      
      torch.__version__  = 2.3.0+cu121
      
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
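The traceback ends inside torch/utils/cpp_extension.py, where PyTorch looks for the CUDA installation. As far as we can tell from PyTorch's source, the lookup order is:

1. the CUDA_HOME (or CUDA_PATH) environment variable;
2. the directory containing nvcc on PATH;
3. /usr/local/cuda on Linux.

The function below is a simplified re-creation of that order for illustration, not PyTorch's actual code. The name find_cuda_home and its injectable parameters are hypothetical.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_cuda_home(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Hypothetical helper approximating PyTorch's CUDA home lookup order."""
    # 1. An explicit environment variable wins.
    home = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if home:
        return home
    # 2. Otherwise derive it from nvcc on PATH: <home>/bin/nvcc -> <home>.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    # 3. Fall back to the conventional Linux install location.
    if exists("/usr/local/cuda"):
        return "/usr/local/cuda"
    # Nothing found: the state that leads to
    # "OSError: CUDA_HOME environment variable is not set."
    return None


if __name__ == "__main__":
    print(find_cuda_home())
```

On the machine in question, all three steps come up empty: no variable, no nvcc, and no /usr/local/cuda. The function therefore returns None, which corresponds to the OSError above.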

Analysis

The NVIDIA driver was already installed on this system, and it comes with CUDA, as the nvidia-smi command shows.
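The CUDA version printed in the nvidia-smi header is the highest CUDA version the installed driver supports. The sketch below extracts it from nvidia-smi's text output. It assumes the header contains a field of the form "CUDA Version: 12.2"; the function name driver_cuda_version is hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches the "CUDA Version: 12.2" field of the nvidia-smi header (assumed format).
_CUDA_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def driver_cuda_version(smi_text: str) -> Optional[Tuple[int, int]]:
    """Return the (major, minor) CUDA version reported by nvidia-smi, or None."""
    m = _CUDA_RE.search(smi_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; the NVIDIA driver may not be installed")
    else:
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        print("Highest CUDA version supported by the driver:", driver_cuda_version(out))
```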
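Note:

This was puzzling at first: the GPU already has CUDA, so why does the error say CUDA is needed?

The reason is as follows:

CUDA has two main APIs, the driver API and the runtime API. The CUDA that comes with the GPU driver corresponds to the driver API. The failing build needs the other half. Compiling flash_attn from source requires the nvcc compiler together with the runtime API headers and libraries, as the earlier warning "nvcc was not found" already hints. These ship with the CUDA Toolkit, not with the driver, so the CUDA Toolkit has to be installed separately.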
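A quick way to see which half is missing: the driver provides nvidia-smi, while the Toolkit provides nvcc. The sketch below reports both. It only looks at PATH, so a Toolkit that is installed but not on PATH is reported as missing. The function name cuda_components is hypothetical.

```python
import shutil
from typing import Callable, Dict, Optional


def cuda_components(which: Callable[[str], Optional[str]] = shutil.which) -> Dict[str, bool]:
    """Report which CUDA pieces are reachable on PATH (hypothetical helper)."""
    return {
        # Installed together with the NVIDIA driver (driver API side).
        "driver (nvidia-smi)": which("nvidia-smi") is not None,
        # Installed with the CUDA Toolkit; needed to compile flash_attn.
        "toolkit (nvcc)": which("nvcc") is not None,
    }


if __name__ == "__main__":
    for name, present in cuda_components().items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```

In the situation described here, the driver would show as found and the toolkit as missing.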

Fixing the error:

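The CUDA Toolkit is normally installed under /usr/local/ (for example /usr/local/cuda-12.2). Checking that directory confirmed that no CUDA Toolkit had been installed.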
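The check can be scripted. The sketch below lists directories matching /usr/local/cuda* that actually contain bin/nvcc. It assumes the default /usr/local prefix; a custom install path would need a different prefix. The function name installed_toolkits is hypothetical.

```python
import glob
import os
from typing import List


def installed_toolkits(prefix: str = "/usr/local") -> List[str]:
    """List CUDA Toolkit directories such as /usr/local/cuda-12.2 (hypothetical helper)."""
    candidates = glob.glob(os.path.join(prefix, "cuda*"))
    # Keep only directories that contain the nvcc compiler.
    return sorted(p for p in candidates if os.path.isfile(os.path.join(p, "bin", "nvcc")))


if __name__ == "__main__":
    found = installed_toolkits()
    print(found if found else "No CUDA Toolkit found under /usr/local")
```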

Since the CUDA Toolkit is missing, the straightforward fix is to install it.

Downloading CUDA

The CUDA Toolkit is CUDA's development toolkit; in practice, "installing CUDA" means installing the CUDA Toolkit.

Go to https://developer.nvidia.com/cuda-toolkit-archive and pick the CUDA version you need.

To stay compatible with the driver, run nvidia-smi to check the driver version and the CUDA version it reports.
The driver reports CUDA 12.2, the highest CUDA version it supports, so CUDA Toolkit 12.2 is downloaded here.
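The rule of thumb used here fits in a one-line comparison: pick a Toolkit whose major.minor version is not newer than the CUDA version the driver reports. This is a conservative simplification, since NVIDIA's minor-version compatibility allows some exceptions within a major release. The helper names parse_version and toolkit_fits_driver are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep only major.minor: '12.2.0' -> (12, 2)."""
    return tuple(int(part) for part in text.split(".")[:2])


def toolkit_fits_driver(toolkit: str, driver_cuda: str) -> bool:
    """Conservative check: Toolkit version <= CUDA version supported by the driver."""
    return parse_version(toolkit) <= parse_version(driver_cuda)


if __name__ == "__main__":
    print(toolkit_fits_driver("12.2", "12.2"))  # True: the choice made in this article
    print(toolkit_fits_driver("12.4", "12.2"))  # False: newer than the driver reports
```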

Selected options: Linux, x86_64, Ubuntu, version 22.04, installer type runfile (local).

The installation instructions are also shown at the bottom of the page.

Running the installer

wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run

sudo sh cuda_12.2.0_535.54.03_linux.run

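Select Continue and press Enter.
Type accept to accept the license agreement.
Since the driver is already installed, deselect the Driver component (it is checked with an X by default), then select Install to start the installation.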
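For scripted setups, the same choice (Toolkit only, no driver) can be made non-interactively. The installer's own summary mentions --silent --driver. The CUDA Installation Guide for Linux also documents a --toolkit option that installs only the Toolkit; note that --silent implies accepting the EULA. The small Python wrapper below builds that command and, unless dry_run is disabled, only prints it. The function names and the dry_run flag are hypothetical.

```python
import subprocess
from typing import List


def toolkit_only_install_cmd(runfile: str) -> List[str]:
    """Build the command for a silent, Toolkit-only install (no driver)."""
    return ["sudo", "sh", runfile, "--silent", "--toolkit"]


def install_toolkit(runfile: str, dry_run: bool = True) -> List[str]:
    cmd = toolkit_only_install_cmd(runfile)
    if dry_run:
        # Only show what would be executed.
        print(" ".join(cmd))
    else:
        # check=True raises CalledProcessError if the installer fails.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    install_toolkit("cuda_12.2.0_535.54.03_linux.run")
```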
The following summary indicates that the installation succeeded:

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-12.2/

Please make sure that
 -   PATH includes /usr/local/cuda-12.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.2/lib64, or, add /usr/local/cuda-12.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.2/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 535.00 is required for CUDA 12.2 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
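The WARNING only means that the runfile did not install a driver. It is harmless as long as the driver already present is at least 535.00. The sketch below reads the minimum version from the summary text and compares it with the installed driver version, queried with nvidia-smi --query-gpu=driver_version --format=csv,noheader. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def _ver(text: str, width: int = 3) -> Tuple[int, ...]:
    """'535.00' -> (535, 0, 0); pads with zeros so versions compare cleanly."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def required_driver(summary: str) -> Optional[str]:
    """Extract the minimum driver version from the runfile installer summary."""
    m = re.search(r"driver of version at least (\d+(?:\.\d+)*)", summary)
    return m.group(1) if m else None


def driver_ok(installed: str, minimum: str) -> bool:
    """Numeric comparison of dotted versions, e.g. '535.54.03' >= '535.00'."""
    return _ver(installed) >= _ver(minimum)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
    else:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
        installed = out.splitlines()[0].strip()  # first GPU only
        print(f"driver {installed} satisfies 535.00:", driver_ok(installed, "535.00"))
```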

Configuring environment variables

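Edit ~/.bashrc (for example with vim ~/.bashrc) and add the following environment variables, based on the official documentation (Environment Setup in the CUDA Installation Guide for Linux):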

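# CUDA_HOME must be a single directory, not a colon-separated list
export CUDA_HOME=/usr/local/cuda
# Prepend the Toolkit's bin and lib64 without leaving a stray ':' when the variable is empty
export PATH=$CUDA_HOME/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

Then run source ~/.bashrc (or open a new terminal) so the changes take effect.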
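After reloading the shell, the variables can be sanity-checked from Python. The checks below encode what the exports are meant to achieve:

- CUDA_HOME is a single directory that contains bin/nvcc.
- Its bin subdirectory is listed in PATH.
- Its lib64 subdirectory is listed in LD_LIBRARY_PATH.

The function name check_cuda_env is hypothetical.

```python
import os
from typing import List, Mapping


def check_cuda_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of problems with the CUDA environment variables (empty if OK)."""
    home = env.get("CUDA_HOME", "")
    if not home:
        return ["CUDA_HOME is not set"]
    problems = []
    if os.pathsep in home:
        # e.g. ':/usr/local/cuda', produced by CUDA_HOME=$CUDA_HOME:/usr/local/cuda
        problems.append(f"CUDA_HOME must be one directory, got {home!r}")
    elif not os.path.isfile(os.path.join(home, "bin", "nvcc")):
        problems.append(f"{home}/bin/nvcc not found")
    if os.path.join(home, "bin") not in env.get("PATH", "").split(os.pathsep):
        problems.append("PATH does not include $CUDA_HOME/bin")
    if os.path.join(home, "lib64") not in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        problems.append("LD_LIBRARY_PATH does not include $CUDA_HOME/lib64")
    return problems


if __name__ == "__main__":
    issues = check_cuda_env()
    print("\n".join(issues) if issues else "CUDA environment looks OK")
```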

Verification

Run nvcc -V to check whether CUDA was installed successfully.

nvcc is the CUDA compiler and lives in the bin/ directory of the CUDA Toolkit; it plays the same role for CUDA code that gcc plays for C.

root@master:~# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
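Before re-running pip install flash_attn, it is worth confirming that the newly installed nvcc matches the CUDA version PyTorch was built with: here torch 2.3.0+cu121 versus nvcc 12.2. To our understanding, PyTorch's extension builder rejects a major-version mismatch and only warns about a minor one, so 12.2 with cu121 should be acceptable. The sketch below parses both version strings and also prints torch.utils.cpp_extension.CUDA_HOME. The helper names are hypothetical, and the __main__ part assumes PyTorch is installed.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

Version = Tuple[int, int]


def nvcc_release(nvcc_output: str) -> Optional[Version]:
    """Parse 'release 12.2, V12.2.91' from `nvcc -V` output into (12, 2)."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def torch_cuda_release(torch_version: str) -> Optional[Version]:
    """Parse the '+cuXYZ' suffix of a PyTorch version: '2.3.0+cu121' -> (12, 1)."""
    m = re.search(r"\+cu(\d+)(\d)$", torch_version)
    return (int(m.group(1)), int(m.group(2))) if m else None


def majors_match(nvcc: Optional[Version], torch_cuda: Optional[Version]) -> bool:
    """True when both versions are known and share the same major release."""
    return nvcc is not None and torch_cuda is not None and nvcc[0] == torch_cuda[0]


if __name__ == "__main__":
    nvcc_ver = torch_ver = None
    if shutil.which("nvcc"):
        out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=True).stdout
        nvcc_ver = nvcc_release(out)
    try:
        import torch  # assumes PyTorch is installed in the active environment
        from torch.utils.cpp_extension import CUDA_HOME
        torch_ver = torch_cuda_release(torch.__version__)
        print("CUDA_HOME as seen by PyTorch:", CUDA_HOME)
    except ImportError:
        print("PyTorch is not installed in this environment")
    print("nvcc:", nvcc_ver, "| torch built with:", torch_ver,
          "| major versions match:", majors_match(nvcc_ver, torch_ver))
```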
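Disclaimer: This article was contributed by a user and does not represent the views of the wpsshop blog; copyright belongs to the original author. If you find infringing content, please contact the site. When reposting, please credit the source: https://www.wpsshop.cn/w/小小林熬夜学编程/article/detail/613904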