Llama access request form - Meta AI
After submitting the access request:
Download Python 3.10.9; during installation, be sure to check "Add to PATH".
Download Git from Git - Downloading Package (git-scm.com); install with the defaults.
Download wget from GNU Wget 1.21.4 for Windows (eternallybored.org); extract it and add the directory to PATH.
Download the llama code: https://github.com/facebookresearch/llama.git
On drive D, in Git Bash: git clone https://github.com/facebookresearch/llama.git
Inside the llama folder, run ./download.sh in Git Bash, press Enter, and paste the long link received by email. (It is recommended to register a Hugging Face account bound to the same email address.)
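Put together, the clone-and-download steps in Git Bash look roughly like this (drive letter as above; the presigned URL is the one from Meta's email):
cd /d
git clone https://github.com/facebookresearch/llama.git
cd llama
./download.sh    # when prompted, paste the presigned URL from the email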
From the Llama repo's FAQ/issues:
Can you run the Llama-7B model on Windows and/or macOS?
The vanilla model shipped in the repository does not run on Windows and/or macOS out of the box. There are some community-led projects that support running Llama on Mac, Windows, iOS, Android, or anywhere (e.g. llama.cpp, MLC LLM, and Llama 2 Everywhere). You can also find a workaround in this issue, based on Llama 2 fine-tuning.
What are the hardware SKU requirements for deploying these models?
Hardware requirements vary based on latency, throughput, and cost constraints. For good latency, we split models across multiple GPUs with tensor parallelism in a machine with NVIDIA A100s or H100s. But TPUs, other types of GPUs, or even commodity hardware can also be used to deploy these models (e.g. llama.cpp, MLC LLM).
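As a concrete illustration of that tensor-parallel setup, the reference scripts in facebookresearch/llama are launched via torchrun with one process per model shard; per the repo README, 7B uses MP=1, 13B uses 2, 70B uses 8. A typical invocation (paths assumed to match the download step above):
torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model --max_seq_len 128 --max_batch_size 4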
Quoted from https://blog.csdn.net/Fatfish7/article/details/131925595:
Full-precision (FP32) Llama 2, minimum VRAM: 7B: 28 GB; 13B: 52 GB; 70B: 280 GB
16-bit (FP16), estimated minimum VRAM: 7B: 14 GB; 13B: 26 GB; 70B: 140 GB
8-bit, estimated minimum VRAM: 7B: 7 GB; 13B: 13 GB; 70B: 70 GB
4-bit, estimated minimum VRAM: 7B: 3.5 GB; 13B: 6.5 GB; 70B: 35 GB
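These figures follow the rule of thumb VRAM ≈ parameter count × bytes per weight (4 bytes at FP32, 2 at FP16, 1 at 8-bit, 0.5 at 4-bit), ignoring activation and KV-cache overhead; a quick Python check reproduces the table:
# Rough minimum VRAM = parameters (billions) x bytes per weight
for billions in (7, 13, 70):
    for bits in (32, 16, 8, 4):
        print(f"Llama 2 {billions}B @ {bits}-bit: {billions * bits / 8} GB")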
Download llama.cpp: git clone https://github.com/ggerganov/llama.cpp
llama.cpp build:
From Releases · skeeto/w64devkit (github.com), download the latest fortran package.
Extract it and run w64devkit.exe.
cd llama.cpp
make
When make finishes, llama.cpp is built.
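If the w64devkit make route gives trouble, llama.cpp's README also documents a CMake build (assumes CMake is installed; commands per the README at the time):
mkdir build
cd build
cmake ..
cmake --build . --config Release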
Enter the llama folder and create a conda environment:
conda create -n llama python=3.10.9
If this step fails with: Collecting package metadata (current_repodata.json): failed
Turning off the system proxy resolves it.
conda activate llama
cd llama.cpp
pip install -r requirements.txt
Convert the 7B model (about 14 GB) to a ggml FP16 model:
python convert.py models/llama-2-7b
Problem encountered: Found vocab files: {'tokenizer.model': None, 'vocab.json': None, 'tokenizer.json': None} FileNotFoundError: spm vocab not found.
Copying tokenizer.model from the llama download into the model folder resolves it.
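A possible command for that fix, assuming the llama checkout sits alongside llama.cpp (use copy instead of cp in cmd):
cp ../llama/tokenizer.model ./models/llama-2-7b/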
The converted model is written to models/llama-2-7b/ggml-model-f16.gguf.
Now 4-bit quantize the freshly converted FP16 model:
./quantize ./models/llama-2-7b/ggml-model-f16.gguf ./models/llama-2-7b/ggml-model-q4_0.gguf q4_0
If you hit: '.' is not recognized as an internal or external command, operable program or batch file.
Change ./quantize to .\quantize.
The quantized file is: ./models/llama-2-7b/ggml-model-q4_0.gguf
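quantize also accepts other schemes besides q4_0 (e.g. q4_1, q5_0, q5_1, q8_0); the same pattern gives, say, an 8-bit file (around 7 GB per the table above):
.\quantize ./models/llama-2-7b/ggml-model-f16.gguf ./models/llama-2-7b/ggml-model-q8_0.gguf q8_0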
Run it:
.\main -m ./models/llama-2-7b/ggml-model-q4_0.gguf --prompt "hello"
Now try the same with 7b-chat:
python convert.py .\models\llama-2-7b-chat\
.\quantize ./models/llama-2-7b-chat/ggml-model-f16.gguf ./models/llama-2-7b-chat/ggml-model-q4_0.gguf q4_0
.\main -m ./models/llama-2-7b-chat/ggml-model-q4_0.gguf --prompt "hello"
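For a chat-like session, main also supports interactive mode; a sketch with commonly used flags (-n caps the generated tokens, -i enters interactive mode, --color highlights user input, -r sets the reverse prompt that hands control back to you):
.\main -m ./models/llama-2-7b-chat/ggml-model-q4_0.gguf -n 256 --color -i -r "User:"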