Lightweight Fine-tuning and Inference with Stanford Alpaca


The current Alpaca model is fine-tuned from the 7B LLaMA model on 52K instruction-following examples generated with the techniques described in the Self-Instruct paper, with some modifications.

Hardware: an A10 GPU with 22 GB of VRAM, CUDA 11.7 (cu117), driver 470.103.01. The Python environment (pip list) is:

  1. absl-py 1.4.0
  2. accelerate 0.18.0
  3. addict 2.4.0
  4. aenum 3.1.12
  5. aiofiles 23.1.0
  6. aiohttp 3.8.4
  7. aiosignal 1.3.1
  8. albumentations 0.4.3
  9. altair 4.2.2
  10. antlr4-python3-runtime 4.9.3
  11. anyio 3.6.2
  12. appdirs 1.4.4
  13. asttokens 2.2.1
  14. async-timeout 4.0.2
  15. attrs 22.2.0
  16. backcall 0.2.0
  17. basicsr 1.4.2
  18. bcrypt 4.0.1
  19. beautifulsoup4 4.12.1
  20. blendmodes 2022
  21. blinker 1.6
  22. boltons 23.0.0
  23. braceexpand 0.1.7
  24. cachetools 5.3.0
  25. certifi 2022.12.7
  26. cffi 1.15.1
  27. chardet 4.0.0
  28. charset-normalizer 3.1.0
  29. clean-fid 0.1.29
  30. click 8.1.3
  31. clip-anytorch 2.5.2
  32. cmake 3.26.1
  33. comm 0.1.3
  34. contourpy 1.0.7
  35. cryptography 40.0.1
  36. cssselect2 0.7.0
  37. cycler 0.11.0
  38. datasets 2.11.0
  39. debugpy 1.6.7
  40. decorator 5.1.1
  41. deprecation 2.1.0
  42. diffusers 0.15.0.dev0
  43. dill 0.3.6
  44. docker-pycreds 0.4.0
  45. einops 0.4.1
  46. entrypoints 0.4
  47. executing 1.2.0
  48. facexlib 0.2.5
  49. fastapi 0.94.0
  50. ffmpy 0.3.0
  51. filelock 3.10.7
  52. filterpy 1.4.5
  53. fire 0.5.0
  54. font-roboto 0.0.1
  55. fonts 0.0.3
  56. fonttools 4.39.3
  57. frozenlist 1.3.3
  58. fsspec 2023.3.0
  59. ftfy 6.1.1
  60. future 0.18.3
  61. gdown 4.7.1
  62. gfpgan 1.3.8
  63. gitdb 4.0.10
  64. GitPython 3.1.30
  65. google-auth 2.17.2
  66. google-auth-oauthlib 1.0.0
  67. gradio 3.16.2
  68. grpcio 1.53.0
  69. h11 0.12.0
  70. httpcore 0.15.0
  71. httpx 0.23.3
  72. huggingface-hub 0.15.1
  73. idna 2.10
  74. imageio 2.9.0
  75. imageio-ffmpeg 0.4.2
  76. imgaug 0.2.6
  77. importlib-metadata 6.1.0
  78. inflection 0.5.1
  79. ipykernel 6.23.1
  80. ipython 8.13.2
  81. jedi 0.18.2
  82. Jinja2 3.1.2
  83. joblib 1.2.0
  84. jsonmerge 1.8.0
  85. jsonschema 4.17.3
  86. jupyter_client 8.2.0
  87. jupyter_core 5.3.0
  88. kiwisolver 1.4.4
  89. kornia 0.6.7
  90. lark 1.1.2
  91. lazy_loader 0.2
  92. linkify-it-py 2.0.0
  93. lit 16.0.0
  94. llvmlite 0.39.1
  95. lmdb 1.4.0
  96. lpips 0.1.4
  97. lxml 4.9.2
  98. Markdown 3.4.3
  99. markdown-it-py 2.2.0
  100. MarkupSafe 2.1.2
  101. matplotlib 3.7.1
  102. matplotlib-inline 0.1.6
  103. mdit-py-plugins 0.3.5
  104. mdurl 0.1.2
  105. mpmath 1.3.0
  106. multidict 6.0.4
  107. multiprocess 0.70.14
  108. mypy-extensions 1.0.0
  109. nest-asyncio 1.5.6
  110. networkx 3.1rc0
  111. nltk 3.8.1
  112. numba 0.56.4
  113. numexpr 2.8.4
  114. numpy 1.23.3
  115. nvidia-cublas-cu11 11.10.3.66
  116. nvidia-cuda-cupti-cu11 11.7.101
  117. nvidia-cuda-nvrtc-cu11 11.7.99
  118. nvidia-cuda-runtime-cu11 11.7.99
  119. nvidia-cudnn-cu11 8.5.0.96
  120. nvidia-cufft-cu11 10.9.0.58
  121. nvidia-curand-cu11 10.2.10.91
  122. nvidia-cusolver-cu11 11.4.0.1
  123. nvidia-cusparse-cu11 11.7.4.91
  124. nvidia-nccl-cu11 2.14.3
  125. nvidia-nvtx-cu11 11.7.91
  126. oauthlib 3.2.2
  127. omegaconf 2.2.3
  128. open-clip-torch 2.7.0
  129. openai 0.27.7
  130. opencv-python 4.7.0.72
  131. opencv-python-headless 4.7.0.72
  132. orjson 3.8.9
  133. packaging 23.0
  134. pandas 1.5.3
  135. paramiko 3.1.0
  136. parso 0.8.3
  137. pathtools 0.1.2
  138. pexpect 4.8.0
  139. pickleshare 0.7.5
  140. piexif 1.1.3
  141. Pillow 9.4.0
  142. pip 23.0.1
  143. platformdirs 3.5.1
  144. prompt-toolkit 3.0.38
  145. protobuf 3.20.3
  146. psutil 5.9.4
  147. ptyprocess 0.7.0
  148. pudb 2019.2
  149. pure-eval 0.2.2
  150. pyarrow 11.0.0
  151. pyasn1 0.4.8
  152. pyasn1-modules 0.2.8
  153. pycparser 2.21
  154. pycryptodome 3.17
  155. pydantic 1.10.7
  156. pydeck 0.8.0
  157. pyDeprecate 0.3.1
  158. pydub 0.25.1
  159. Pygments 2.14.0
  160. Pympler 1.0.1
  161. PyNaCl 1.5.0
  162. pyparsing 3.0.9
  163. pyre-extensions 0.0.23
  164. pyrsistent 0.19.3
  165. PySocks 1.7.1
  166. python-dateutil 2.8.2
  167. python-multipart 0.0.6
  168. pytorch-lightning 1.7.6
  169. pytz 2023.3
  170. pytz-deprecation-shim 0.1.0.post0
  171. PyWavelets 1.4.1
  172. PyYAML 6.0
  173. pyzmq 25.1.0
  174. realesrgan 0.3.0
  175. regex 2023.3.23
  176. reportlab 3.6.12
  177. requests 2.25.1
  178. requests-oauthlib 1.3.1
  179. resize-right 0.0.2
  180. responses 0.18.0
  181. rfc3986 1.5.0
  182. rich 13.3.3
  183. rouge-score 0.1.2
  184. rsa 4.9
  185. safetensors 0.2.7
  186. scikit-image 0.19.2
  187. scipy 1.10.1
  188. semver 3.0.0
  189. sentencepiece 0.1.99
  190. sentry-sdk 1.19.0
  191. setproctitle 1.3.2
  192. setuptools 59.6.0
  193. six 1.16.0
  194. smmap 5.0.0
  195. sniffio 1.3.0
  196. soupsieve 2.4
  197. stack-data 0.6.2
  198. starlette 0.26.1
  199. streamlit 1.20.0
  200. svglib 1.5.1
  201. sympy 1.12rc1
  202. tb-nightly 2.13.0a20230405
  203. tensorboard 2.12.1
  204. tensorboard-data-server 0.7.0
  205. tensorboard-plugin-wit 1.8.1
  206. termcolor 2.3.0
  207. test-tube 0.7.5
  208. tifffile 2023.3.21
  209. timm 0.6.7
  210. tinycss2 1.2.1
  211. tokenizers 0.12.1
  212. toml 0.10.2
  213. toolz 0.12.0
  214. torch 2.0.1
  215. torchdiffeq 0.2.3
  216. torchmetrics 0.11.4
  217. torchsde 0.2.5
  218. tornado 6.2
  219. tqdm 4.65.0
  220. traitlets 5.9.0
  221. trampoline 0.1.2
  222. transformers 4.28.0.dev0 /mnt/workspace/demos/alpaca/transformers
  223. triton 2.0.0
  224. typing_extensions 4.5.0
  225. typing-inspect 0.8.0
  226. tzdata 2023.3
  227. tzlocal 4.3
  228. uc-micro-py 1.0.1
  229. urllib3 1.26.15
  230. urwid 2.1.2
  231. uvicorn 0.21.1
  232. validators 0.20.0
  233. wandb 0.14.0
  234. watchdog 3.0.0
  235. wcwidth 0.2.6
  236. webdataset 0.2.5
  237. webencodings 0.5.1
  238. websockets 11.0
  239. Werkzeug 2.2.3
  240. wheel 0.37.1
  241. xformers 0.0.16rc425
  242. xxhash 3.2.0
  243. yapf 0.32.0
  244. yarl 1.8.2
  245. zipp 3.15.0

Alpaca's VRAM requirement is fairly high: full fine-tuning currently needs around 32 GB. You can, however, shrink the model configuration to reduce the memory footprint.
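As a rough sanity check on that figure, a back-of-envelope sketch (assuming fp16 storage, which matches the torch_dtype used later in this tutorial): the weights alone of a 7B-parameter model already take about 13 GiB, before gradients, optimizer state, and activations are added.

```python
# Back-of-envelope estimate: VRAM needed just to hold 7B fp16 weights.
params = 7_000_000_000   # 7B parameters (llama-7b)
bytes_per_param = 2      # fp16 stores each parameter in 2 bytes

weights_gib = params * bytes_per_param / 1024**3
print(f"weights alone: {weights_gib:.1f} GiB")  # ≈ 13.0 GiB
```

With gradients and Adam optimizer state on top, the ~32 GB requirement quoted above is unsurprising.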

1. Download stanford_alpaca

  !wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/alpaca/stanford_alpaca.tgz
  !tar -xvf stanford_alpaca.tgz

2. Install dependencies

  !cd stanford_alpaca && echo y | pip uninstall torch && echo y | pip uninstall torchvision && pip install -r requirements.txt && pip install gradio
  !git clone https://github.com/huggingface/transformers.git && \
      cd transformers && \
      git checkout 165dd6dc916a43ed9b6ce8c1ed62c3fe8c28b6ef && \
      pip install -e .

3. Data preparation

  The data format is shown below; to fine-tune with your own data, convert it into this form:

  "instruction": describes the task the model should perform
  "input": optional context or input for the task; for example, when the instruction is "Summarize the following article", the input is the article
  "output": the answer the model should generate

  Format:

  [
      {
          "instruction": "Give three tips for staying healthy.",
          "input": "",
          "output": "1. Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule."
      }
  ]

  # Download the dataset; if a file with the same name already exists in the folder, rename it first.
  !wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/alpaca/alpaca_data.json
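The conversion described above can be sketched in a few lines. This is a minimal example; the my_records list and its task/context/answer field names are hypothetical stand-ins for whatever your own data looks like:

```python
import json

# Hypothetical source records -- replace with your own data loading.
my_records = [
    {"task": "Summarize the following article.",
     "context": "Some article text ...",
     "answer": "A one-sentence summary."},
]

# Map each record onto the Alpaca schema: instruction / input / output.
alpaca_data = [
    {"instruction": r["task"],
     "input": r["context"],   # use "" when the task needs no extra context
     "output": r["answer"]}
    for r in my_records
]

with open("alpaca_data.json", "w", encoding="utf-8") as f:
    json.dump(alpaca_data, f, ensure_ascii=False, indent=2)
```

The resulting alpaca_data.json can be passed directly to --data_path in the training step below.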

4. Fine-tune the model

4.1 Prepare the weights

The llama-7B weights are roughly 12 GB:

!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/alpaca/llama-7b-hf.tar.gz && tar -xvf llama-7b-hf.tar.gz

4.2 Adjust the parameters

You can tune the configuration to match your available VRAM, so that the tutorial also runs on a single small-memory GPU. Find the config.json file under the pretrained-model path (./llama-7b-hf) and modify it as shown below; reducing parameters such as max_sequence_length and num_hidden_layers lets training proceed even with limited VRAM.

  {
      "architectures": ["LLaMAForCausalLM"],
      "bos_token_id": 0,
      "eos_token_id": 1,
      "hidden_act": "silu",
      "hidden_size": 4096,
      "intermediate_size": 11008,
      "initializer_range": 0.02,
      "max_sequence_length": 4,
      "model_type": "llama",
      "num_attention_heads": 32,
      "num_hidden_layers": 4,
      "pad_token_id": -1,
      "rms_norm_eps": 1e-06,
      "torch_dtype": "float16",
      "transformers_version": "4.27.0.dev0",
      "use_cache": true,
      "vocab_size": 32000
  }
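Rather than editing the file by hand, the same shrink can be scripted. A small sketch; shrink_config is a hypothetical helper, and the demo below patches a stand-in file (in the tutorial the real path is ./llama-7b-hf/config.json):

```python
import json

def shrink_config(path, max_seq_len=4, n_layers=4):
    """Patch a LLaMA config.json in place so it fits in small VRAM."""
    with open(path) as f:
        cfg = json.load(f)
    cfg["max_sequence_length"] = max_seq_len
    cfg["num_hidden_layers"] = n_layers
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg

# Demo on a stand-in file; point the path at ./llama-7b-hf/config.json for real use.
with open("demo_config.json", "w") as f:
    json.dump({"max_sequence_length": 2048, "num_hidden_layers": 32}, f)

cfg = shrink_config("demo_config.json")
```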

4.3 Training

Add the following to stanford_alpaca/train.py:

  import os
  os.environ["WANDB_DISABLED"] = "true"
  # Run the training command
  !torchrun --nproc_per_node=1 --master_port=29588 ./stanford_alpaca/train.py \
      --model_name_or_path "./llama-7b-hf" \
      --data_path ./alpaca_data.json \
      --bf16 False \
      --output_dir ./models/alpaca-2 \
      --num_train_epochs 1 \
      --per_device_train_batch_size 1 \
      --per_device_eval_batch_size 1 \
      --gradient_accumulation_steps 8 \
      --evaluation_strategy "no" \
      --save_strategy "steps" \
      --save_steps 20 \
      --save_total_limit 1 \
      --learning_rate 2e-5 \
      --model_max_length 4 \
      --weight_decay 0. \
      --warmup_ratio 0.03 \
      --lr_scheduler_type "cosine" \
      --logging_steps 1 \
      --fsdp "full_shard auto_wrap" \
      --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
      --tf32 False
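One way to read those flags: the effective batch size seen by the optimizer is the product of the per-device batch size, the gradient-accumulation steps, and the number of processes. For the command above:

```python
# Effective batch size implied by the torchrun command above.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
nproc_per_node = 1  # one GPU

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * nproc_per_node)
print(effective_batch_size)  # 8
```

Raising gradient_accumulation_steps is the usual way to keep a larger effective batch when per-device memory only allows batch size 1.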

5. Inference

  import transformers

  tokenizers = transformers.LlamaTokenizer.from_pretrained("./models/alpaca-2")
  model = transformers.LlamaForCausalLM.from_pretrained("./models/alpaca-2").cuda()
  model.eval()

  def gen(req):
      batch = tokenizers(req, return_tensors='pt', add_special_tokens=False)
      batch = {k: v.cuda() for k, v in batch.items()}
      full_completion = model.generate(inputs=batch["input_ids"],
                                       attention_mask=batch["attention_mask"],
                                       temperature=0.7,
                                       top_p=0.9,
                                       do_sample=True,
                                       num_beams=1,
                                       max_new_tokens=600,
                                       eos_token_id=tokenizers.eos_token_id,
                                       pad_token_id=tokenizers.pad_token_id)
      print(tokenizers.decode(full_completion[0]))

  gen("List all Canadian provinces in alphabetical order.")
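Note that gen() feeds the raw request straight to the tokenizer. Checkpoints trained with the Stanford Alpaca recipe are normally queried through its prompt template instead; whether that helps here depends on how your checkpoint was trained. A sketch of the no-input variant of that template:

```python
# Alpaca's instruction-following prompt template (no-input variant).
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

def build_prompt(instruction: str) -> str:
    """Wrap a bare instruction in the Alpaca prompt template."""
    return PROMPT_NO_INPUT.format(instruction=instruction)

prompt = build_prompt("List all Canadian provinces in alphabetical order.")
# pass `prompt` (instead of the raw request) to gen()
```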

The complete original generation script is available at this path:

!wget  https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/alpaca/gen.py

6. Demo

  import gradio as gr
  import requests
  import json
  import transformers

  tokenizers = transformers.LlamaTokenizer.from_pretrained("./models/alpaca-2")
  model = transformers.LlamaForCausalLM.from_pretrained("./models/alpaca-2").cuda()
  model.eval()

  def inference(text):
      batch = tokenizers(text, return_tensors="pt", add_special_tokens=False)
      batch = {k: v.cuda() for k, v in batch.items()}
      full_completion = model.generate(inputs=batch["input_ids"],
                                       attention_mask=batch["attention_mask"],
                                       temperature=0.7,
                                       top_p=0.9,
                                       do_sample=True,
                                       num_beams=1,
                                       max_new_tokens=600,
                                       eos_token_id=tokenizers.eos_token_id,
                                       pad_token_id=tokenizers.pad_token_id)
      print(tokenizers.decode(full_completion[0]))
      return tokenizers.decode(full_completion[0])

  demo = gr.Blocks()
  with demo:
      input_prompt = gr.Textbox(label="请输入需求",
                                value="帮我写一篇安全检查的新闻稿件。",
                                lines=6)
      generated_txt = gr.Textbox(lines=6)
      b1 = gr.Button("发送")
      b1.click(inference, inputs=[input_prompt], outputs=generated_txt)

  demo.launch(enable_queue=True, share=True)
