1. Download Qwen1.5:
2. Download the model weights:
3. Install the required libraries (transformers, etc.), and pay attention to versions;
4. Write your own run script, for example the one below (remember to use the full path for the model):
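Qwen1.5's chat template needs a fairly recent transformers (the Qwen1.5 release notes state >= 4.37.0). A minimal version check before running the scripts below; the helper function here is illustrative, not part of any library:

```python
from importlib.metadata import PackageNotFoundError, version

def parse_version(v):
    # keep only the leading numeric components, e.g. "4.38.0.dev0" -> (4, 38, 0)
    parts = []
    for p in v.split("."):
        if not p.isdigit():
            break
        parts.append(int(p))
    return tuple(parts)

MIN_TRANSFORMERS = (4, 37, 0)  # minimum assumed from the Qwen1.5 release notes

try:
    installed = parse_version(version("transformers"))
    if installed < MIN_TRANSFORMERS:
        print(f"transformers {installed} is too old; need >= {MIN_TRANSFORMERS}")
except PackageNotFoundError:
    print("transformers is not installed")
```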
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained(
    "Qwen1.5-7B-Chat",  # use the full path to the downloaded model
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen1.5-7B-Chat")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# strip the prompt tokens so only the newly generated text is decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
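For reference, apply_chat_template with add_generation_prompt=True renders the messages in ChatML style for Qwen1.5. A rough sketch of the string it produces (format assumed from Qwen's published chat template; in practice always use the tokenizer call, never hand-built strings):

```python
# Illustrative sketch of the ChatML-style prompt string (assumed format);
# the real template lives in the tokenizer and should be used in practice.
def render_chatml(messages):
    text = ""
    for m in messages:
        text += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # add_generation_prompt=True appends an open assistant turn
    return text + "<|im_start|>assistant\n"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language model."},
]
print(render_chatml(messages))
```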
5. Download AutoAWQ:
https://github.com/casper-hansen/AutoAWQ
6. Install it from source:
git clone https://github.com/casper-hansen/AutoAWQ
cd AutoAWQ
pip install -e .
7. Run the script below (again, remember to use full paths):
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = 'Qwen1.5-7B-Chat'
quant_path = 'Qwen1.5-7B-Chat-AWQ-MALI'
quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }

# Load model
model = AutoAWQForCausalLM.from_pretrained(
    model_path, **{"low_cpu_mem_usage": True, "use_cache": False}
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')
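The quant_config above packs the weights to 4 bits in groups of 128. A back-of-the-envelope estimate of the per-weight storage cost, assuming one fp16 scale and one fp16 zero-point per group (a simplification; the real AWQ GEMM kernels pack metadata differently):

```python
def bits_per_weight(w_bit, group_size, meta_bits=16):
    # each group of `group_size` weights stores one scale and one zero-point
    # (assumed fp16 here for a rough estimate)
    return w_bit + 2 * meta_bits / group_size

bpw = bits_per_weight(w_bit=4, group_size=128)   # 4.25 bits per weight
ratio = 16 / bpw                                 # vs the fp16 baseline
print(f"{bpw} bits/weight, ~{ratio:.2f}x smaller than fp16")
```

So the 4-bit checkpoint should come out a bit under 4x smaller than the original fp16 weights.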
8. Point the run script from step 4 at the newly generated quantized model, then execute it.