Purpose: I need to inspect and move pretrained model files locally, without the garbled cache-style filenames and without re-downloading models over and over.
Setting local_dir_use_symlinks=False avoids the garbled filenames:
from huggingface_hub import snapshot_download

# repo_id = "ziqingyang/chinese-alpaca-lora-7b"
repo_id = "nghuyong/ernie-3.0-micro-zh"
local_dir = repo_id.replace("/", "_")
cache_dir = local_dir + "/cache"
snapshot_download(
    cache_dir=cache_dir,
    local_dir=local_dir,
    repo_id=repo_id,
    # Don't store files under garbled cache names. With the default "auto",
    # small files (<5MB) are duplicated in `local_dir` while a symlink is
    # created for bigger files.
    local_dir_use_symlinks=False,
    resume_download=True,
    allow_patterns=["*.model", "*.json", "*.bin", "*.py", "*.md", "*.txt"],
    ignore_patterns=["*.safetensors", "*.msgpack", "*.h5", "*.ot"],
)
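If you only need a handful of files rather than a whole snapshot, the same library's hf_hub_download fetches them one at a time. A minimal sketch, reusing the repo above (the local_dir and local_dir_use_symlinks parameters require a reasonably recent huggingface_hub):

from huggingface_hub import hf_hub_download

# Download a single file into a plain local directory; returns its local path.
path = hf_hub_download(
    repo_id="nghuyong/ernie-3.0-micro-zh",
    filename="config.json",
    local_dir="nghuyong_ernie-3.0-micro-zh",
    local_dir_use_symlinks=False,  # real files, not symlinks into the cache
)
print(path)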
However, large-model weights are now so big that they are usually sharded into many files, and downloading this way is a bit slow.
Downloading by direct URL: the model page is https://huggingface.co/{{repo_id}}
and each file can be downloaded from: https://huggingface.co/{{repo_id}}/resolve/main/{{filename}}
Take repo_id = "THUDM/chatglm-6b" as an example.
Model page: https://huggingface.co/THUDM/chatglm-6b
On Linux, for example, you can download each file directly with wget:

wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/README.md
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/config.json
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/configuration_chatglm.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/tokenizer_config.json
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/ice_text.model
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/quantization.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/tokenization_chatglm.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/modeling_chatglm.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model.bin.index.json
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00001-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00002-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00003-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00004-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00005-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00006-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00007-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00008-of-00008.bin
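Typing every shard's URL by hand is tedious and error-prone. A small sketch that generates the same wget commands from the repo's file listing, using huggingface_hub's list_repo_files (the repo_id is the one from the example above):

from huggingface_hub import list_repo_files

repo_id = "THUDM/chatglm-6b"

# Emit one wget command per file in the repo, following the
# https://huggingface.co/{repo_id}/resolve/main/{filename} pattern above.
for filename in list_repo_files(repo_id):
    print(f"wget https://huggingface.co/{repo_id}/resolve/main/{filename}")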
Install Git LFS and initialize it: git lfs install
Then download the model: git clone https://huggingface.co/THUDM/chatglm-6b
Models that are already downloaded locally can be reused, and the model directory can also be moved.
The default location on Windows is: C:\Users\{{username}}\.cache\huggingface\hub
The default location on Linux is: /home/{{username}}/.cache/huggingface/hub
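To confirm where the cache actually sits on your machine, you can print the constant huggingface_hub uses (the exact constant name varies across library versions; HUGGINGFACE_HUB_CACHE is present in most of them):

from huggingface_hub.constants import HUGGINGFACE_HUB_CACHE

# Defaults to ~/.cache/huggingface/hub unless overridden by environment
# variables such as HF_HOME.
print(HUGGINGFACE_HUB_CACHE)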
from transformers import BertTokenizer, BertModel
repo_id = "nghuyong/ernie-3.0-micro-zh"
cache_dir = {{fill in the actual path}}
tokenizer = BertTokenizer.from_pretrained(repo_id, cache_dir=cache_dir)
model = BertModel.from_pretrained(repo_id, cache_dir=cache_dir)
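Once the files live in a plain directory (for example the local_dir produced by snapshot_download earlier), from_pretrained can also point straight at that path instead of a repo id, so no network access or cache lookup is involved. A sketch, assuming the directory name from the first example:

from transformers import BertTokenizer, BertModel

# Load directly from a local directory that holds config.json, the weights,
# and the tokenizer files.
local_path = "nghuyong_ernie-3.0-micro-zh"  # the local_dir created earlier
tokenizer = BertTokenizer.from_pretrained(local_path)
model = BertModel.from_pretrained(local_path)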
Hope this helps!
Reference: https://zhuanlan.zhihu.com/p/475260268