Problems and solutions when deploying the Docker version of Stable Diffusion on Windows 11

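Background

To run Stable Diffusion smoothly on my own machine without installing a pile of complicated dependencies directly in Windows, and to try out how convenient container deployment and publishing can be along the way, I chose the Docker version of Stable Diffusion (the AbdBarho one).

There are already plenty of Stable Diffusion deployment guides online, many of them billed as zero-experience or one-click installs, but when I tried them I ran into a lot of pitfalls, and the process was not nearly as easy as advertised. The pitfalls come from two main sources: network download failures, and getting the GPU to work inside Docker.

This article is not a step-by-step guide to building the environment. It records the pitfalls I hit during deployment and how I got around them. If you have already tried the Docker deployment and run into problems, it may serve as a reference.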

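System requirements

Many online tutorials deploy on Linux, but one thing needs to be clear: Linux in an ordinary virtual machine will not work.
Using a GPU inside a virtual machine means the GPU has to be virtualized or passed through to the guest, the way CPU and memory are. On a typical desktop setup this is not available out of the box (GPU passthrough does exist, but it generally needs hypervisor support and BIOS changes, which is a lot to ask just to deploy one application), so installing a GPU driver inside the virtual machine will not let you use the GPU.

Next, because Docker's Linux containers need a Linux kernel, on Windows you have to use WSL and install Docker inside it.
(Docker Desktop is another option, but it seems to have more pitfalls; installing docker-ce directly inside WSL is simpler.)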

Software to install

On Windows

NVIDIA graphics driver

https://www.nvidia.com/en-us/geforce/drivers/

wsl2

wsl --install

Ubuntu 22.04

wsl --install -d Ubuntu-22.04

git

(URL omitted)

On Ubuntu

CUDA Toolkit for WSL2 (the WSL-Ubuntu package)

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local

docker

curl https://get.docker.com | sh

stable diffusion

git clone https://github.com/AbdBarho/stable-diffusion-webui-docker.git

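As far as I understand, the nvidia-docker and docker-compose commands do not need to be installed.
The former is a Docker wrapper that automatically adds GPU arguments to docker run; recent Docker versions accept GPU settings directly in the compose yml, and the nvidia-docker command was never used during the actual deployment. (Docker's documentation does list the NVIDIA Container Toolkit, a separate package, as a prerequisite for GPU access, so install that if containers cannot see the GPU.) As for the latter, Docker ships its own docker compose subcommand, so the standalone docker-compose command is unnecessary.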

Fixing download problems

The official install instructions on GitHub are very simple: two compose commands download the packages, build the image and run it:

sudo service docker start
cd /mnt/d/yourpath/stable-diffusion-webui-docker
sudo docker compose --profile download up --build
sudo docker compose --profile auto up

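When run from mainland China, though, the most common problem is interrupted downloads. If retrying still does not work, you have to edit the Dockerfile by hand.

Download errors in the docker compose download profile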

webui-docker-download-1  | edc1d3|OK  |   1.7MiB/s|/data/VAE/vae-ft-mse-840000-ema-pruned.ckpt
webui-docker-download-1  | e85b41|OK  |    99KiB/s|/data/RealESRGAN/RealESRGAN_x4plus_anime_6B.pth
webui-docker-download-1  | 011c41|OK  |   2.4KiB/s|/data/LDSR/project.yaml
webui-docker-download-1  | d56dbc|OK  |   1.6MiB/s|/data/LDSR/model.ckpt
webui-docker-download-1  | 53c224|OK  |   2.0MiB/s|/data/StableDiffusion/v1-5-pruned-emaonly.ckpt
webui-docker-download-1  | a573f8|OK  |   1.9MiB/s|/data/StableDiffusion/sd-v1-5-inpainting.ckpt
webui-docker-download-1  | 9521b5|ERR |    28KiB/s|/data/RealESRGAN/RealESRGAN_x4plus.pth
webui-docker-download-1  | 3a3491|ERR |    87KiB/s|/data/GFPGAN/GFPGANv1.4.pth

Running the command again usually fixes it (downloads are resumed).
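
Because a re-run resumes where the previous one stopped, the retry can be automated. The following is a minimal shell sketch and not part of the project: retry is a hypothetical helper, and it assumes the compose command exits with a non-zero status when a download fails, which the source does not confirm.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_attempt=1
  # "until" re-runs the command while it keeps returning a non-zero status.
  until "$@"; do
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      echo "giving up after $retry_attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $retry_attempt failed, retrying in ${retry_delay}s" >&2
    sleep "$retry_delay"
    retry_attempt=$((retry_attempt + 1))
  done
}

# Example (commented out so that sourcing this file has no side effects):
# retry 5 10 sudo docker compose --profile download up --build
```

If the command turns out to return 0 even when some files show ERR, the loop will stop after the first run, and the log still has to be checked by eye.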

Failed downloads of GitHub dependencies

If you hit a GitHub download timeout like the one below:

#0 120.9 fatal: unable to access 'https://github.com/crowsonkb/k-diffusion.git/': HTTP/2 stream 1 was not closed cleanly before end of the underlying stream

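Some GitHub dependencies fail no matter how many times you retry. In that case, edit the Dockerfile (stable-diffusion-webui-docker\services\AUTOMATIC1111\Dockerfile) and change github to kgithub by hand, after which the download succeeds.
For example: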

RUN . /clone.sh k-diffusion https://kgithub.com/crowsonkb/k-diffusion.git 5b3af030dd83e0297272d861c19477735d0317ec
RUN . /clone.sh clip-interrogator https://kgithub.com/pharmapsychotic/clip-interrogator 2486589f24165c8e3b303f84e9dbbea318df83e8
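
The manual edit can also be scripted. This is a sketch under assumptions: rewrite_github_host is a hypothetical helper, it rewrites every https://github.com/ URL in the file (which may be more than you want), and the mirror host is passed in because the availability of kgithub.com or any other mirror is not guaranteed.

```shell
# rewrite_github_host FILE MIRROR_HOST
# Hypothetical helper: replaces https://github.com/ with https://MIRROR_HOST/
# in FILE and keeps the original as FILE.bak.
rewrite_github_host() {
  rewrite_file=$1
  rewrite_mirror=$2
  cp "$rewrite_file" "$rewrite_file.bak"
  # Read from the backup and write the result over the original file.
  sed "s#https://github\.com/#https://$rewrite_mirror/#g" \
    "$rewrite_file.bak" > "$rewrite_file"
}

# Example (commented out; path as given in the article, run from the repo root):
# rewrite_github_host services/AUTOMATIC1111/Dockerfile kgithub.com
```

Running it twice is harmless as far as the URLs go, since the rewritten URLs no longer match the pattern, but the second run overwrites the .bak file with the already modified Dockerfile.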

After switching to kgithub, the xformers wheel download may fail with a TLS handshake error such as:

#0 0.910 04/13 13:43:47 [ERROR] CUID#7 - Download aborted. URI=https://kgithub.com/AbdBarho/stable-diffusion-webui-docker/releases/download/5.0.0/xformers-0.0.17.dev449-cp310-cp310-manylinux2014_x86_64.whl
#0 0.910 Exception: [AbstractCommand.cc:351] errorCode=1 URI=https://kgithub.com/AbdBarho/stable-diffusion-webui-docker/releases/download/5.0.0/xformers-0.0.17.dev449-cp310-cp310-manylinux2014_x86_64.whl
#0 0.910   -> [SocketCore.cc:1018] errorCode=1 SSL/TLS handshake failure:  `not signed by known authorities or invalid' `expired'

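Download libgoogle-perftools-dev_2.7-1_amd64.deb and xformers-0.0.17.dev449-cp310-cp310-manylinux2014_x86_64.whl manually (rename the latter to wheel.whl), comment out the aria2c download command, and use COPY to bring the files into the image instead: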

#RUN aria2c -x 5 --dir / --out wheel.whl 'https://github.com/AbdBarho/stable-diffusion-webui-docker/releases/download/5.0.0/xformers-0.0.17.dev449-cp310-cp310-manylinux2014_x86_64.whl'
COPY wheel.whl /
COPY libgoogle-perftools-dev_2.7-1_amd64.deb /
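
COPY only sees files inside the Docker build context, so the two files have to be placed there before building. The sketch below checks for them up front; check_build_context is a hypothetical helper, and the context directory in the example is an assumption based on where the Dockerfile lives, so confirm it against the project's docker-compose.yml.

```shell
# check_build_context DIR FILE...
# Hypothetical helper: reports every FILE missing from DIR and
# returns 1 if at least one is missing, 0 otherwise.
check_build_context() {
  ctx_dir=$1
  shift
  ctx_missing=0
  for ctx_file in "$@"; do
    if [ ! -f "$ctx_dir/$ctx_file" ]; then
      echo "missing in build context: $ctx_dir/$ctx_file" >&2
      ctx_missing=1
    fi
  done
  return "$ctx_missing"
}

# Example (commented out; the directory is an assumption):
# check_build_context services/AUTOMATIC1111 \
#   wheel.whl libgoogle-perftools-dev_2.7-1_amd64.deb
```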

If pip times out while downloading from files.pythonhosted.org:

#0 141.8 pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out.
#0 63.88   Downloading gradio-3.15.0-py3-none-any.whl (13.8 MB)
#0 141.8      ╸                                        0.2/13.8 MB 3.2 kB/s eta 1:10:40

Setting a global pip mirror resolves it:

pip config --global set global.index-url https://mirrors.aliyun.com/pypi/simple/
pip config --global set install.trusted-host mirrors.aliyun.com
pip install -r requirements_versions.txt

If git rejects the kgithub server certificate when cloning:

#0 0.451 fatal: unable to access 'https://kgithub.com/AUTOMATIC1111/stable-diffusion-webui.git/': server certificate verification failed. CAfile: none CRLfile: none

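Disabling certificate verification resolves it (this turns off TLS certificate checks for every git operation that uses this global config):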

git config --global http.version HTTP/1.1
git config --global http.sslverify false
git clone https://kgithub.com/AUTOMATIC1111/stable-diffusion-webui.git

If the apt-get step fails during the build:

failed to solve: process "/bin/bash -ceuxo pipefail apt-get -y install libgoogle-perftools-dev && apt-get clean" did not complete successfully: exit code: 100

Adding a mirror to /etc/apt/sources.list resolves it:

RUN echo 'deb http://ftp.cn.debian.org/debian buster main' | tee -a /etc/apt/sources.list
RUN apt-get -y install libgoogle-perftools-dev && apt-get clean

GPU not usable inside the container

The container fails to start, reporting that no GPU driver was found:

webui-docker-auto-1  | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Running nvidia-smi inside the container prints the following:

Failed to initialize NVML: GPU access blocked by the operating system
Failed to properly shut down NVML: GPU access blocked by the operating system
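
To narrow down where GPU access breaks, it can help to run nvidia-smi first in the WSL shell and then in a throwaway container. The sketch below does both; gpu_layer_check is a hypothetical helper, and it assumes that nvidia-smi is on the PATH in WSL, that the current user may run docker without sudo, that the ubuntu image can be pulled, and that the NVIDIA Container Toolkit is set up so that --gpus all works.

```shell
# gpu_layer_check
# Hypothetical helper: prints which layer fails to see the GPU.
# Returns 1 if WSL itself fails, 2 if only the container fails, 0 if both work.
gpu_layer_check() {
  if ! nvidia-smi >/dev/null 2>&1; then
    echo "wsl: nvidia-smi failed"
    return 1
  fi
  if ! docker run --rm --gpus all ubuntu nvidia-smi >/dev/null 2>&1; then
    echo "container: nvidia-smi failed"
    return 2
  fi
  echo "ok"
}

# Example (commented out so that sourcing this file has no side effects):
# gpu_layer_check
```

A failure at the first step points at the Windows driver or WSL rather than at Docker, which matches the kind of problem described next.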

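A search shows this is a problem in Microsoft's WSL2 itself:
https://github.com/microsoft/WSL/issues/9962

The fix is to download and install WSL release 1.2.3 (the latest at the time of writing):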
https://github.com/microsoft/WSL/releases/tag/1.2.3

wsl --shutdown
$Package = Get-AppxPackage MicrosoftCorporationII.WindowsSubsystemforLinux -AllUsers
Remove-AppxPackage $Package -AllUsers
Add-AppxPackage Microsoft.WSL_1.2.3.0_x64_ARM64.msixbundle

Other problems

Docker prints the following errors on startup:

mount: /sys/fs/cgroup/cpuset: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpu: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/cpuacct: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/blkio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/memory: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/devices: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/freezer: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_cls: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/perf_event: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/net_prio: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/hugetlb: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/pids: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/rdma: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.
mount: /sys/fs/cgroup/misc: wrong fs type, bad option, bad superblock on cgroup, missing codepage or helper program, or other error.

This error is a bug in WSL 1.1.6.0 (though it does not seem to stop Docker from starting). Upgrading to 1.2.3 resolves it.
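
Whether an installed WSL release is at least 1.2.3 can be checked with a small version comparison. This is a sketch: version_at_least is a hypothetical helper, it relies on the -V (version sort) option of GNU sort as shipped with Ubuntu, and the installed version string has to be supplied by hand, for example as read from wsl --version on the Windows side.

```shell
# version_at_least INSTALLED REQUIRED
# Hypothetical helper: succeeds if INSTALLED >= REQUIRED in version order.
version_at_least() {
  # Sort both versions; if REQUIRED comes out first, INSTALLED is not older.
  ver_lowest=$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)
  [ "$ver_lowest" = "$2" ]
}

# Example (commented out):
# version_at_least 1.1.6.0 1.2.3 || echo "WSL is older than 1.2.3, upgrade it"
```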
