The way Qwen1.5 generates chat answers has changed: model.chat() is no longer supported. Overall, though, the answer quality of the 1.5 series is noticeably better.
This article is still being edited.
The original Qwen-7B large language model was loaded and queried as follows.
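A minimal sketch of that older interface, not the author's original code: it assumes the Hugging Face Qwen/Qwen-7B-Chat checkpoint, whose custom modeling code (loaded via trust_remote_code=True) exposes the model.chat() helper that Qwen1.5 dropped.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Old-style Qwen-7B-Chat: the chat logic lives in the checkpoint's custom code,
# so trust_remote_code=True is required and model.chat() handles the prompt templating.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto",
    torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

# model.chat() returns the reply together with the updated dialogue history.
response, history = model.chat(tokenizer, "帮我把空调打开。", history=None)
print(response)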
The Qwen1.5-7B large language model is loaded as follows:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer  # transformers>=4.37.2 is required for Qwen1.5

"""================ Qwen1.5-7B: ~15 GB weights, ~17 GB GPU memory for inference, ~32 GB for fine-tuning ================"""

device = "cuda"
model_id = "../model/Qwen1.5-7B-Chat"

# Set torch_dtype=torch.bfloat16 here; otherwise the model is loaded in full precision
# and GPU memory use for inference roughly doubles, from ~17 GB to ~34 GB.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=False
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False)

# Chinese prompt: turn the air conditioner on, set it to 24°C, cooling mode, low fan speed,
# then extract the switch state, temperature setting, fan speed and mode from that information.
prompt = """帮我把空调打开。
空调温度调节到24℃。
空调打开制冷模式,风速设为低档。
根据上述信息,分别提取出空调的开关状态、温度设置、风速设置、空调模式 """

print("=== * ===" * 50)

def qwen_chat(prompt):
    # Qwen1.5 no longer provides model.chat(); build the chat prompt with the
    # tokenizer's chat template and call model.generate() instead.
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]

    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    print("=== tokenizer is finished ===")
    model_inputs = tokenizer([text], return_tensors="pt").to(device)

    # Pass pad_token_id=tokenizer.eos_token_id explicitly, otherwise generate()
    # emits a warning; the example script on Hugging Face does not set it.
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512,
        pad_token_id=tokenizer.eos_token_id
    )

    # Strip the prompt tokens so that only the newly generated answer is decoded.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

    print(f'response: {response}')
    return response

if __name__ == '__main__':
    output = qwen_chat(prompt=prompt)