IPEX-LLM is a PyTorch library for running LLMs on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency.
https://github.com/intel-analytics/ipex-llm
pip install --pre --upgrade bigdl-llm[all] -i https://mirrors.aliyun.com/pypi/simple/
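Note that the BigDL-LLM project has since been renamed to IPEX-LLM. If you prefer the newer package (an assumption here; the code in this article uses the bigdl.llm import path), the equivalent install is:

pip install --pre --upgrade ipex-llm[all]

With ipex-llm installed, the imports shown below change from bigdl.llm.transformers to ipex_llm.transformers, while the API otherwise stays the same.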
Follow the setup described in this article to download large models at full speed: "Download LLM models from Hugging Face quickly without a VPN".
Download command:
huggingface-cli download --resume-download databricks/dolly-v2-3b --local-dir databricks/dolly-v2-3b
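If direct downloads from huggingface.co are slow or blocked, one common approach (an assumption here; the linked article may use a different method) is to point huggingface-cli at a mirror through the HF_ENDPOINT environment variable, which huggingface_hub honors:

export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download --resume-download databricks/dolly-v2-3b --local-dir databricks/dolly-v2-3b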
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = 'openlm-research/open_llama_3b_v2'
# load the model and quantize its weights to INT4 on the fly
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True)

# save the INT4-quantized model so it can be reloaded quickly later
save_directory = './open-llama-3b-v2-bigdl-llm-INT4'
model.save_low_bit(save_directory)
del model

# reload the saved low-bit model directly from disk
model = AutoModelForCausalLM.load_low_bit(save_directory)
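To make the saved directory usable on its own in a later session, it also helps to store the tokenizer next to the low-bit weights. A minimal sketch, assuming the standard Hugging Face LlamaTokenizer that ships with open_llama_3b_v2:

from transformers import LlamaTokenizer

# save the tokenizer files into the same INT4 checkpoint directory
tokenizer = LlamaTokenizer.from_pretrained(model_path)
tokenizer.save_pretrained(save_directory)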
# in a new, separate session, the saved INT4 model can be reloaded directly
from bigdl.llm.transformers import AutoModelForCausalLM

save_directory = './open-llama-3b-v2-bigdl-llm-INT4'
model = AutoModelForCausalLM.load_low_bit(save_directory)
import torch
from transformers import LlamaTokenizer

# load the tokenizer of the original model (needed to encode/decode text)
tokenizer = LlamaTokenizer.from_pretrained('openlm-research/open_llama_3b_v2')

with torch.inference_mode():
    prompt = 'Q: What is CPU?\nA:'
    # tokenize the input prompt from string to token ids
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    # predict the next tokens (maximum 32) based on the input token ids
    output = model.generate(input_ids, max_new_tokens=32)
    # decode the predicted token ids to output string
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)

print('-'*20, 'Output', '-'*20)
print(output_str)
Output:
-------------------- Output --------------------
Q: What is CPU?
A: CPU stands for Central Processing Unit. It is the brain of the computer.
Q: What is RAM?
A: RAM stands for Random Access Memory.
Other related APIs are documented here: https://github.com/intel-analytics/bigdl-llm-tutorial/blob/main/Chinese_Version/ch_3_AppDev_Basic/3_BasicApp.ipynb