A while ago I manually upgraded my Lenovo ThinkPad X13s to Windows 11 22H2, and to get a real feel for what the machine can do, I decided to start with Stable Diffusion.
In theory, running simple inference with Windows' built-in ML stack, via ONNX Runtime or DirectML, should not have been much of a problem. But when I tried Microsoft's official Stable Diffusion C# sample, pairing several versions of olive-ai[directml] (0.2.1 through 0.3.1) with several versions of ONNX Runtime, every attempt ended with the same error: no kernel implementation found for GroupNorm.
OnnxRuntimeException: [ErrorCode:NotImplemented] Failed to find kernel for GroupNorm(1) (node GroupNorm_0). Kernel not found
So I fell back to plan B: see how other deep learning frameworks are doing on the WOA (Windows on Arm) platform. That detour turned into this note.
The machine shipped with Windows 10 and never received the Windows 11 push, so I downloaded the Windows 11 installer with Microsoft's official tool and upgraded to 22H2 manually.
After the upgrade, I pulled all remaining update packages and drivers through Windows Update.
Finally, I installed the latest graphics driver from Lenovo's official site.
To run Stable Diffusion at a usable speed, GPU support is all but required. The X13s's Adreno GPU has no CUDA, but Qualcomm already ships DirectX 12 support, so DirectML looked like a workable detour. Even better, a search turned up a community-maintained DirectML fork of stable-diffusion-webui.
Python has shipped official Windows on Arm builds since 3.11, so the natural first step was to grab the latest Python for Windows Arm64. But it turned out that, to this day, PyTorch has no WOA release, let alone a DirectML build of PyTorch.
Then I remembered that Windows 11 supports x64 applications through Arm64EC. Given how patchy the native Arm64 ecosystem still is, why not try running the entire software stack as x64?
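Before committing to an all-x64 stack, it is worth confirming which interpreter build is actually running. A small standard-library check (nothing assumed beyond CPython); on Windows on Arm, an x64 build reports `AMD64` here even though the hardware is Arm64, because the process runs under x64 emulation:

```python
import platform
import struct

# Pointer size gives the bitness of this interpreter (32 or 64 bit).
bits = struct.calcsize("P") * 8

# An emulated x64 process sees the emulated architecture, so an x64
# Python on the X13s reports "AMD64" rather than "ARM64".
machine = platform.machine()

print(f"{bits}-bit interpreter, platform.machine() = {machine!r}")
```

The x64 Python 3.10.11 used below reports a 64-bit interpreter on `AMD64`; a native Arm64 build would report `ARM64` instead.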
First, download the following from their official sites and install each:
| Application | Version (x64) |
|---|---|
| git | 2.43.0 64-bit |
| python | 3.10.11-amd64 |
Then clone the stable-diffusion-webui-directml repository:
git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml.git
The version checked out at the time was 1.6.1:
C:\xeng\stable-diffusion-webui-directml>git show
commit 03eec1791be011e087985ae93c1f66315d5a250e (HEAD -> master, origin/master, origin/HEAD)
Merge: 64e6b068 4afaaf8a
Author: Seunghoon Lee <lshqqytiger@naver.com>
Date: Wed Nov 8 13:09:37 2023 +0900
Merge remote-tracking branch 'upstream/master'
From there, setup follows the same flow as the upstream stable-diffusion-webui:
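For reference, the launch goes through `webui-user.bat` just as upstream does; a minimal configuration might look like the sketch below. The flags are illustrative, not required: this fork installs torch-directml on its own, and `--medvram` is an optional upstream flag that trades speed for lower GPU memory use.

```bat
@echo off
rem webui-user.bat -- illustrative sketch; empty values fall back to defaults
set PYTHON=
set GIT=
set VENV_DIR=
rem Optional: reduce GPU memory pressure at some cost in speed
set COMMANDLINE_ARGS=--medvram

call webui.bat
```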
It went better than expected and started up cleanly:
Creating venv in directory C:\xeng\stable-diffusion-webui-directml\venv using python "C:\Users\xeng\AppData\Local\Programs\Python\Python310\python.exe"
venv "C:\xeng\stable-diffusion-webui-directml\venv\Scripts\Python.exe"
fatal: No names found, cannot describe anything.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: 1.6.1
Commit hash: 03eec1791be011e087985ae93c1f66315d5a250e
Installing torch and torchvision
Collecting torch==2.0.0
  Using cached torch-2.0.0-cp310-cp310-win_amd64.whl (172.3 MB)
Collecting torchvision==0.15.1
  Using cached torchvision-0.15.1-cp310-cp310-win_amd64.whl (1.2 MB)
Collecting torch-directml
  Using cached torch_directml-0.2.0.dev230426-cp310-cp310-win_amd64.whl (8.2 MB)
Collecting networkx
  Using cached networkx-3.2.1-py3-none-any.whl (1.6 MB)
Collecting jinja2
  Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting typing-extensions
  Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Collecting filelock
  Using cached filelock-3.13.1-py3-none-any.whl (11 kB)
Collecting sympy
  Using cached sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting numpy
  Using cached numpy-1.26.2-cp310-cp310-win_amd64.whl (15.8 MB)
Collecting pillow!=8.3.*,>=5.3.0
  Using cached Pillow-10.1.0-cp310-cp310-win_amd64.whl (2.6 MB)
Collecting requests
  Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting MarkupSafe>=2.0
  Using cached MarkupSafe-2.1.3-cp310-cp310-win_amd64.whl (17 kB)
Collecting urllib3<3,>=1.21.1
  Using cached urllib3-2.1.0-py3-none-any.whl (104 kB)
Collecting charset-normalizer<4,>=2
  Using cached charset_normalizer-3.3.2-cp310-cp310-win_amd64.whl (100 kB)
Collecting certifi>=2017.4.17
  Using cached certifi-2023.11.17-py3-none-any.whl (162 kB)
Collecting idna<4,>=2.5
  Using cached idna-3.6-py3-none-any.whl (61 kB)
Collecting mpmath>=0.19
  Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Installing collected packages: mpmath, urllib3, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, idna, filelock, charset-normalizer, certifi, requests, jinja2, torch, torchvision, torch-directml
Successfully installed MarkupSafe-2.1.3 certifi-2023.11.17 charset-normalizer-3.3.2 filelock-3.13.1 idna-3.6 jinja2-3.1.2 mpmath-1.3.0 networkx-3.2.1 numpy-1.26.2 pillow-10.1.0 requests-2.31.0 sympy-1.12 torch-2.0.0 torch-directml-0.2.0.dev230426 torchvision-0.15.1 typing-extensions-4.9.0 urllib3-2.1.0
[notice] A new release of pip is available: 23.0.1 -> 23.3.2
[notice] To update, run: C:\xeng\stable-diffusion-webui-directml\venv\Scripts\python.exe -m pip install --upgrade pip
Installing clip
Installing open_clip
Installing requirements for CodeFormer
Installing requirements
Launching Web UI with arguments:
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Loading weights [7c819b6d13] from C:\xeng\stable-diffusion-webui-directml\models\Stable-diffusion\majicmixRealistic_v7.safetensors
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Creating model from config: C:\xeng\stable-diffusion-webui-directml\configs\v1-inference.yaml
Startup time: 608.9s (prepare environment: 579.0s, import torch: 13.7s, import gradio: 3.6s, setup paths: 3.9s, initialize shared: 3.4s, other imports: 1.2s, setup codeformer: 0.3s, load scripts: 1.9s, create ui: 1.0s, gradio launch: 0.7s).
Applying attention optimization: InvokeAI... done.
Model loaded in 15.0s (load weights from disk: 1.7s, create model: 7.5s, apply weights to model: 4.5s, move model to device: 0.3s, calculate empty prompt: 0.8s).
With the default parameters, I tried generating a first image with majicMix, and it hit out-of-memory almost immediately. Still, an OOM rather than some other failure: there was hope!
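The OOM at the default 512x512 is plausible from a back-of-envelope estimate: SD 1.5's VAE downsamples by a factor of 8 into a 4-channel latent, so UNet activation memory grows roughly with latent area, i.e. quadratically with the edge length. A rough sketch (the latent shape is real SD 1.5; treating latent size as a proxy for activation memory is a simplification):

```python
def latent_numel(width, height, channels=4, downsample=8):
    """Number of elements in the SD 1.5 latent for a given output size."""
    return (width // downsample) * (height // downsample) * channels

base = latent_numel(512, 512)   # 64 * 64 * 4 = 16384
small = latent_numel(256, 256)  # 32 * 32 * 4 = 4096

# Halving the edge length cuts the latent (and, roughly, the UNet's
# working memory) to a quarter.
print(base // small)  # 4
```

Which is why dropping the resolution was the obvious next move.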
Dropping the resolution to 256x256 to see whether it could cope:
The GPU ran flat out while CPU usage stayed low.
She's here, she's here, the young lady has arrived:
With the Restart sampler it took almost six minutes. Same resolution, switching to Euler a:
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [02:48<00:00, 8.43s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [02:57<00:00, 8.87s/it]
About three minutes, though 20 steps is not quite enough at this size:
320x320 also comes out, in ten minutes. Slow going, but better late than never.
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [09:44<00:00, 29.22s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [09:55<00:00, 29.78s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [09:55<00:00, 28.29s/it]
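The per-step numbers in the progress bars line up with the wall-clock times: total sampling time is simply steps times seconds per iteration (ignoring model load and VAE decode). A quick check against the logs above:

```python
def sampling_time(steps, sec_per_it):
    """Rough wall-clock sampling time in seconds."""
    return steps * sec_per_it

# 256x256, Euler a: 20 steps at ~8.43 s/it, about 169 s (the bar shows 02:48)
print(round(sampling_time(20, 8.43)))   # 169
# 320x320: 20 steps at ~29.22 s/it, about 584 s (the bar shows 09:44)
print(round(sampling_time(20, 29.22)))  # 584
```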
Compared with the CUDA ecosystem, both the performance and the achievable image size are frankly modest.
Still, getting Stable Diffusion to run end-to-end with PyTorch on WOA through an all-x64 stack was a genuine surprise. The ONNX Runtime team has some catching up to do, but WOA has a promising future.
python:
C:\xeng\>python -V
Python 3.10.11
wheels:
absl-py==2.0.0 accelerate==0.21.0 addict==2.4.0 aenum==3.1.15 aiofiles==23.2.1 aiohttp==3.9.1 aiosignal==1.3.1 altair==5.2.0 antlr4-python3-runtime==4.9.3 anyio==3.7.1 async-timeout==4.0.3 attrs==23.1.0 basicsr==1.4.2 beautifulsoup4==4.12.2 blendmodes==2022 boltons==23.1.1 cachetools==5.3.2 certifi==2023.11.17 charset-normalizer==3.3.2 clean-fid==0.1.35 click==8.1.7 clip==1.0 colorama==0.4.6 contourpy==1.2.0 cycler==0.12.1 deprecation==2.1.0 diffusers==0.24.0 einops==0.4.1 exceptiongroup==1.2.0 facexlib==0.3.0 fastapi==0.94.0 ffmpy==0.3.1 filelock==3.13.1 filterpy==1.4.5 fonttools==4.47.0 frozenlist==1.4.1 fsspec==2023.12.2 ftfy==6.1.3 future==0.18.3 gdown==4.7.1 gfpgan==1.3.8 gitdb==4.0.11 GitPython==3.1.32 google-auth==2.25.2 google-auth-oauthlib==1.2.0 gradio==3.41.2 gradio_client==0.5.0 grpcio==1.60.0 h11==0.12.0 httpcore==0.15.0 httpx==0.24.1 huggingface-hub==0.19.4 idna==3.6 imageio==2.33.1 importlib-metadata==7.0.0 importlib-resources==6.1.1 inflection==0.5.1 Jinja2==3.1.2 jsonmerge==1.8.0 jsonschema==4.20.0 jsonschema-specifications==2023.11.2 kiwisolver==1.4.5 kornia==0.6.7 lark==1.1.2 lazy_loader==0.3 lightning-utilities==0.10.0 llvmlite==0.41.1 lmdb==1.4.1 lpips==0.1.4 Markdown==3.5.1 MarkupSafe==2.1.3 matplotlib==3.8.2 mpmath==1.3.0 multidict==6.0.4 networkx==3.2.1 numba==0.58.1 numpy==1.23.5 oauthlib==3.2.2 omegaconf==2.2.3 open-clip-torch==2.20.0 opencv-python==4.8.1.78 orjson==3.9.10 packaging==23.2 pandas==2.1.4 piexif==1.1.3 Pillow==9.5.0 platformdirs==4.1.0 protobuf==3.20.0 psutil==5.9.5 pyasn1==0.5.1 pyasn1-modules==0.3.0 pydantic==1.10.13 pydub==0.25.1 pyparsing==3.1.1 PySocks==1.7.1 python-dateutil==2.8.2 python-multipart==0.0.6 pytorch-lightning==1.9.4 pytz==2023.3.post1 PyWavelets==1.5.0 PyYAML==6.0.1 realesrgan==0.3.0 referencing==0.32.0 regex==2023.10.3 requests==2.31.0 requests-oauthlib==1.3.1 resize-right==0.0.2 rpds-py==0.15.2 rsa==4.9 safetensors==0.3.1 scikit-image==0.21.0 scipy==1.11.4 semantic-version==2.10.0 sentencepiece==0.1.99 
six==1.16.0 smmap==5.0.1 sniffio==1.3.0 soupsieve==2.5 starlette==0.26.1 sympy==1.12 tb-nightly==2.16.0a20231219 tensorboard-data-server==0.7.2 tf-keras-nightly==2.16.0.dev2023121910 tifffile==2023.12.9 timm==0.9.2 tokenizers==0.13.3 tomesd==0.1.3 tomli==2.0.1 toolz==0.12.0 torch==2.0.0 torch-directml==0.2.0.dev230426 torchdiffeq==0.2.3 torchmetrics==1.2.1 torchsde==0.2.5 torchvision==0.15.1 tqdm==4.66.1 trampoline==0.1.2 transformers==4.30.2 typing_extensions==4.9.0 tzdata==2023.3 urllib3==2.1.0 uvicorn==0.24.0.post1 wcwidth==0.2.12 websockets==11.0.3 Werkzeug==3.0.1 yapf==0.40.2 yarl==1.9.4 zipp==3.17.0