
Running Qwen-1.5 and Quantizing the Qwen Large Model with AWQ (Qwen1.5 AWQ Deployment)

1. Download Qwen1.5:

GitHub - QwenLM/Qwen1.5: Qwen1.5 is the improved version of Qwen, the large language model series developed by Qwen team, Alibaba Cloud.
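
For example, cloning the repository linked above (a minimal sketch; adjust the URL if the repo has since been renamed):

git clone https://github.com/QwenLM/Qwen1.5.git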

2. Download the model:

https://huggingface.co/Qwen
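
One way to fetch the weights, assuming the huggingface_hub CLI is installed (the repo id below matches the 7B chat variant used in the scripts that follow):

huggingface-cli download Qwen/Qwen1.5-7B-Chat --local-dir Qwen1.5-7B-Chat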

3. Install the required libraries (transformers, etc.), paying close attention to version compatibility, for example:
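
A minimal sketch (Qwen1.5 support landed in transformers 4.37.0, so pin at least that; accelerate is needed for device_map="auto"):

pip install "transformers>=4.37.0" accelerate torch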

4. Write your own run script, for example the one below (remember to give the model path as a full, absolute path):

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

# Load the model and tokenizer (use the full path to the downloaded weights)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen1.5-7B-Chat",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen1.5-7B-Chat")

# Build a chat-formatted prompt
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Generate, then strip the prompt tokens from each output sequence
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

5. Download AutoAWQ:

https://github.com/casper-hansen/AutoAWQ

6. Install it from source:

git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip install -e .

7. Run the script below, again using full paths:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'Qwen1.5-7B-Chat'
quant_path = 'Qwen1.5-7B-Chat-AWQ-MALI'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, **{"low_cpu_mem_usage": True, "use_cache": False}
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')
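
A note on the design choices in quant_config: w_bit=4 quantizes the weights to 4 bits, q_group_size=128 sets the per-group quantization granularity, zero_point enables asymmetric (zero-point) quantization, and "GEMM" selects AutoAWQ's GEMM kernel (a GEMV variant also exists and tends to be faster at batch size 1). Quantization runs a calibration pass over a default dataset, so expect it to take some time and to require enough memory to hold the unquantized model.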

8. Point the run script from step 4 at the newly quantized model and execute it, as sketched below.
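
A minimal sketch of that swap, assuming the quant_path written in step 7 and that autoawq is installed (recent transformers versions can load AWQ checkpoints directly):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Full path to the AWQ checkpoint saved in step 7
quant_path = "Qwen1.5-7B-Chat-AWQ-MALI"

model = AutoModelForCausalLM.from_pretrained(
    quant_path,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(quant_path)
# ...the rest of the step-4 script (chat template, generate, decode) is unchanged.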
