
Linux (Ubuntu) Python 3.10 environment errors and their fixes, including "CUDA Setup failed despite GPU being available"


Environment setup reference article: "Fine-tuning LLaMA (7B) with Alpaca-Lora in twenty minutes"

1. Error: nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11

Solution:

pip uninstall nvidia_cublas_cu11
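Before uninstalling, it can help to confirm that the pip-provided cuBLAS wheel is actually present in the active environment. The following is a minimal sketch using only the standard library. The helper name installed_version is hypothetical, and nvidia-cublas-cu11 is assumed to be the normalized distribution name of the package mentioned above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name; adjust if pip lists it differently.
    version = installed_version("nvidia-cublas-cu11")
    if version is None:
        print("nvidia-cublas-cu11 is not installed in this environment")
    else:
        print(f"nvidia-cublas-cu11 {version} is installed")
```

Run it with the same interpreter that raises the error, otherwise it inspects a different environment.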

2. CUDA version mismatch


===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/xxx/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda100.so
/home/xxx/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/xxx/anaconda3/envs/lora did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/dell/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/conda/lib')}
  warn(msg)
CUDA SETUP: CUDA version lower than 11 are currently not supported for LLM.int8(). You will be only to use 8-bit optimizers and quantization routines!!
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 100
CUDA SETUP: Required library version not found: libbitsandbytes_cuda100.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
1. CUDA driver not installed
2. CUDA not installed
3. You have multiple conflicting CUDA libraries
4. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=113`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================

CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version.
CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables.
CUDA SETUP: Setup Failed!
CUDA SETUP: CUDA 10.0 not supported. Please use a different CUDA version.
CUDA SETUP: Before you try again running bitsandbytes, make sure old CUDA 10.0 versions are uninstalled and removed from $LD_LIBRARY_PATH variables.
Traceback (most recent call last):
  File "/home/dell/anaconda3/envs/lora/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/dell/anaconda3/envs/lora/lib/python3.10/runpy.py", line 146, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/home/dell/anaconda3/envs/lora/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/home/dell/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "/home/dell/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "/home/dell/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "/home/dell/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "/home/dell/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "/home/dell/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

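Solution:

1. Install a matching CUDA version (11.7 as an example). Note that the cuda meta-package installs the newest release available in the repository; to pin 11.7, install cuda-11-7 instead.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring*.deb
sudo apt-get update
sudo apt-get -y install cuda
2. Switch the system to the newly installed CUDA version:
sudo update-alternatives --display cuda
sudo update-alternatives --config cuda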
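After switching, it is worth checking which CUDA release the toolkit reports. The log above shows bitsandbytes looking for a binary named after the detected version (libbitsandbytes_cuda100.so for CUDA 10.0). The sketch below parses the output of nvcc --version and derives the binary name by the same pattern. It assumes nvcc is on PATH, the helper names are hypothetical, and the naming pattern is inferred only from the log, not from the bitsandbytes source.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def expected_bnb_binary(major, minor):
    """Binary name following the pattern seen in the log (an assumption)."""
    return f"libbitsandbytes_cuda{major}{minor}.so"


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH")
    else:
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
        release = parse_nvcc_release(result.stdout)
        if release is None:
            print("could not parse the nvcc output")
        else:
            print("CUDA release:", release)
            print("expected binary:", expected_bnb_binary(*release))
```

If the reported release is still below 11, the update-alternatives selection or the PATH entry probably still points at the old toolkit.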

3. Library not found (libcusparse.so.11 as an example)

OSError: libcusparse.so.11: cannot open shared object file: No such file or directory

Solution:

Put the missing libcusparse.so.11 file into /home/cenghaolong/anaconda3/envs/BIONIC/lib (the lib folder of the virtual environment you created).
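To check whether the copy had any effect, one option is to ask the dynamic loader directly. This is a minimal sketch with a hypothetical helper can_load. It only tells you whether the current Python process can open the library by name, which may differ from how the failing program resolves it if the two use different search paths.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    name = "libcusparse.so.11"
    status = "loadable" if can_load(name) else "NOT found by the loader"
    print(f"{name}: {status}")
```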


4. A library is missing the symbol cget_col_row_stats

AttributeError: /home/dell/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats

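Solution:

In the same directory, make a copy of a CUDA build of the library (for example libbitsandbytes_cuda117.so), rename the copy to libbitsandbytes_cpu.so, and put it back so that it overwrites the original file: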
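The overwrite can be scripted so that the original file is kept as a backup. The following is a sketch under stated assumptions: the function name replace_cpu_binary and the .bak suffix are hypothetical choices, and the CUDA build must match the CUDA version actually installed.

```python
import shutil
from pathlib import Path


def replace_cpu_binary(package_dir, cuda_binary="libbitsandbytes_cuda117.so"):
    """Copy the CUDA build over libbitsandbytes_cpu.so, keeping a backup."""
    package_dir = Path(package_dir)
    source = package_dir / cuda_binary
    target = package_dir / "libbitsandbytes_cpu.so"
    if not source.is_file():
        raise FileNotFoundError(source)
    if target.exists():
        # Keep the original so the change can be undone later.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(source, target)
    return target
```

Call it with the bitsandbytes package directory shown in the error message, for example replace_cpu_binary("/home/dell/anaconda3/envs/lora/lib/python3.10/site-packages/bitsandbytes").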


5. Library not found at runtime (libcublas.so.11):

Traceback (most recent call last):
  File "/home/xxx/anaconda3/envs/lora/lib/python3.10/site-packages/torch/__init__.py", line 172, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/xxx/anaconda3/envs/lora/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcublas.so.11: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxx/qsj/alpaca-lora-main/generate.py", line 6, in <module>
    import torch
  File "/home/xxx/anaconda3/envs/lora/lib/python3.10/site-packages/torch/__init__.py", line 217, in <module>
    _load_global_deps()
  File "/home/xxx/anaconda3/envs/lora/lib/python3.10/site-packages/torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File "/home/xxx/anaconda3/envs/lora/lib/python3.10/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/home/xxx/anaconda3/envs/lora/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/xxx/anaconda3/envs/lora/lib/python3.10/site-packages/nvidia/cublas/lib/libcublas.so.11: cannot open shared object file: No such file or directory

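Solution:

Find where libcublas.so.11 already exists on the server by running:
cd ~
find . -name libcublas.so.11

Then copy the library, together with all of its symlinks, to the location the current environment expects (the site-packages/nvidia/cublas/lib directory named in the error above).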
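Copying "the library and its symlinks" is easy to get wrong, because a plain copy follows each link and produces several full copies of the same file. The sketch below recreates links as links. The helper name copy_with_symlinks and the glob pattern are hypothetical, and existing files in the destination are deliberately left untouched.

```python
import shutil
from pathlib import Path


def copy_with_symlinks(source_dir, dest_dir, pattern="libcublas.so*"):
    """Copy matching files into dest_dir, recreating symlinks as symlinks."""
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(source_dir.glob(pattern)):
        target = dest_dir / entry.name
        if target.exists() or target.is_symlink():
            continue  # never overwrite what is already there
        # follow_symlinks=False copies a link as a link instead of its target.
        shutil.copy2(entry, target, follow_symlinks=False)
        copied.append(target.name)
    return copied
```

Relative symlinks keep working after the copy only if the file they point to is copied into the same directory, which the default pattern is meant to cover.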


Training command for the 7B model
python finetune.py \
    --base_model 'decapoda-research/llama-7b-hf' \
    --data_path './ml_100k_instruct_data.json' \
    --output_dir './lora-alpaca-ml100k' \
    --batch_size 128 \
    --micro_batch_size 4 \
    --num_epochs 3 \
    --learning_rate 1e-4 \
    --cutoff_len 512 \
    --val_set_size 2000 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,v_proj]' \
    --train_on_inputs \
    --group_by_length
Training command for the 13B model
python finetune.py \
    --base_model 'decapoda-research/llama-13b-hf' \
    --data_path './ml_100k_instruct_data.json' \
    --output_dir './lora-alpaca-13bml100k' \
    --batch_size 128 \
    --micro_batch_size 4 \
    --num_epochs 3 \
    --learning_rate 1e-4 \
    --cutoff_len 512 \
    --val_set_size 2000 \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,v_proj]' \
    --train_on_inputs \
    --group_by_length
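Both commands pass --batch_size 128 together with --micro_batch_size 4. Assuming finetune.py derives the number of gradient accumulation steps as the integer quotient of the two (my understanding of the alpaca-lora script, not something stated in this article), the arithmetic looks like this. The function name and the divisibility check are illustrative choices.

```python
def gradient_accumulation_steps(batch_size, micro_batch_size):
    """Number of micro-batches accumulated before one optimizer step."""
    if micro_batch_size <= 0 or batch_size % micro_batch_size != 0:
        raise ValueError("batch_size should be a positive multiple of micro_batch_size")
    return batch_size // micro_batch_size


steps = gradient_accumulation_steps(batch_size=128, micro_batch_size=4)
print(steps)  # 32
```

Under that assumption, lowering --micro_batch_size to fit a smaller GPU leaves the effective batch size at 128 and only raises the number of accumulation steps.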

6. TensorBoard fails with ValueError: Duplicate plugins for name projector

Running the following command in a terminal to start TensorBoard raises the error:
tensorboard --logdir=E:\pythonFiles\files\LLMRank-master\log_tensorboard
Solution:

The error means TensorBoard is installed more than once; delete the extra copies so that only a single installation remains.
Then open the local page to test:

http://localhost:6006/
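To see which TensorBoard installations are visible to the interpreter, the installed distributions can be listed with the standard library. This is a hedged sketch: the set of distribution names treated as "TensorBoard" (tensorboard and tb-nightly) is an assumption, and the helper names are hypothetical.

```python
from importlib import metadata

# Distribution names that ship a TensorBoard build (assumed list, extend as needed).
TENSORBOARD_DISTS = {"tensorboard", "tb-nightly"}


def normalize(name):
    """Lower-case a distribution name and unify underscores and hyphens."""
    return name.lower().replace("_", "-")


def tensorboard_installs(dist_names):
    """Return the TensorBoard-providing names found, with repeats kept."""
    return sorted(n for n in map(normalize, dist_names) if n in TENSORBOARD_DISTS)


def has_conflict(dist_names):
    """True when more than one TensorBoard installation is visible."""
    return len(tensorboard_installs(dist_names)) > 1


if __name__ == "__main__":
    names = [d.metadata["Name"] for d in metadata.distributions()]
    names = [n for n in names if n]  # skip entries with broken metadata
    print("found:", tensorboard_installs(names))
    print("conflict:", has_conflict(names))
```

If a conflict is reported, uninstalling every listed distribution with pip and reinstalling a single one is one way to get back to a clean state.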

6.

Solution: downgrade cffi to version 1.14.0:
pip install cffi==1.14.0

7. Copy a folder and all of the files in it into /home/dell on Linux:

cp -r <folder_name> /home/dell

8. Error: TypeError: dispatch_model() got an unexpected keyword argument 'offload_index'

Try updating the accelerate package:

pip install accelerate==0.18.0
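Whether the update is needed can be checked from Python first. The sketch below compares the installed accelerate version with the pinned 0.18.0. It assumes that any release at or above that version accepts offload_index, which this article does not confirm, and the helper names are hypothetical.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.18.0' into (0, 18, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed, required):
    """Compare two dotted versions after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        current = metadata.version("accelerate")
    except metadata.PackageNotFoundError:
        print("accelerate is not installed")
    else:
        verdict = "ok" if is_at_least(current, "0.18.0") else "older than 0.18.0"
        print(f"accelerate {current}: {verdict}")
```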

9. Error: TypeError: Descriptors cannot not be created directly. If this call came from a _pb2.py file...

Problem description

TypeError: Descriptors cannot not be created directly. If this call
came from a _pb2.py file, your generated code is out of date and must
be regenerated with protoc >= 3.19.0. If you cannot immediately
regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

Downgrade protobuf:

pip uninstall protobuf
pip install protobuf==3.20.1
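The error message itself lists a second workaround: setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python. If downgrading is not possible, this can be applied from Python at the top of the entry script. The following is a minimal sketch; the helper name is hypothetical, and as the message warns, pure-Python parsing is much slower.

```python
import os

ENV_VAR = "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"


def force_pure_python_protobuf():
    """Select the pure-Python protobuf backend unless the variable is already set."""
    return os.environ.setdefault(ENV_VAR, "python")


# Call this before the first import that pulls in google.protobuf,
# otherwise the setting is read too late to have any effect.
backend = force_pure_python_protobuf()
print(f"{ENV_VAR}={backend}")
```

Using setdefault leaves an explicit choice made in the shell untouched, which keeps the script from silently overriding the user's environment.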