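The server has no internet access, so the dataset has to be downloaded manually and loaded offline.
Take loading the following dataset as an example: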
dataset = load_dataset('stereoset', 'intrasentence')
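First download the loading script:
https://huggingface.co/datasets/stereoset/blob/main/stereoset.py
The script fetches its data from the URL below, so download that file (dev.json) as well:
_DOWNLOAD_URL = "https://github.com/moinnadeem/Stereoset/raw/master/data/dev.json"
Put both files in a local directory X and point load_dataset at the script:
dataset = load_dataset("X/stereoset.py", 'intrasentence')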
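(If you instead call dataset = load_dataset("X", 'intrasentence'), the call goes through def _prepare_split_single in site-packages/datasets/builder.py, most likely because the directory is then read by the generic JSON loader rather than by the script, and may fail with the following error.)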
ValueError: Not able to read records in the JSON file at /data/syxu/representation-engineering/data/fairness/dev.json. You should probably indicate the field of the JSON file containing your records. This JSON file contain the following fields: ['version', 'data']. Select the correct one and provide it as `field='XXX'` to the dataset loading method.
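The error means the records are nested under a top-level field instead of sitting at the root of the file. A minimal sketch for checking this on a local copy, using only the standard library; the helper name top_level_fields is hypothetical and not part of datasets:

```python
import json


def top_level_fields(path):
    """Return the top-level keys of a JSON file, or None if the root is not an object."""
    with open(path, encoding="utf-8") as f:
        root = json.load(f)
    # A generic JSON loader expects records at the root; a dict root means
    # they are nested under one of these keys instead.
    return list(root.keys()) if isinstance(root, dict) else None


# Usage (path follows the X/dev.json convention used in this post):
# print(top_level_fields("X/dev.json"))
```

For dev.json the error message above already reports the fields as ['version', 'data'], which is why the dataset's own loading script is needed to parse it.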
In stereoset.py, the original line is probably:
data_path = dl_manager.download_and_extract(self._DOWNLOAD_URL)
Comment out this line and set data_path directly to 'X/dev.json'.
export HF_DATASETS_OFFLINE=1
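If exporting the variable in the shell is inconvenient (for example in a notebook), the same switch can be set from Python. This is a sketch: the helper name enable_offline_mode is hypothetical, and it assumes, as far as I know, that datasets reads the variable when it is first imported, so the call has to come before the import.

```python
import os


def enable_offline_mode(env=None):
    """Set HF_DATASETS_OFFLINE=1 in the given mapping (default: os.environ)."""
    target = os.environ if env is None else env
    target["HF_DATASETS_OFFLINE"] = "1"
    return target


enable_offline_mode()

# Import datasets only after this point so the setting is picked up:
# from datasets import load_dataset
```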
A simpler approach: on a machine that does have network access, download the dataset first and save it to disk:
from datasets import load_dataset
data = load_dataset(xxx)
data.save_to_disk(path)
After copying the saved directory to the offline server, load it with:
from datasets import load_from_disk
data = load_from_disk(path)
For Parquet files:
from datasets import load_dataset
dataset = load_dataset("parquet", data_files={'train': [file paths], 'test': [likewise]})
https://huggingface.co/docs/datasets/v1.12.0/loading.html
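When there are many Parquet shards, the data_files dictionary can be built from the directory instead of typed by hand. A minimal sketch, assuming a layout of root/train/*.parquet and root/test/*.parquet; both the layout and the helper name build_data_files are assumptions for illustration:

```python
from pathlib import Path


def build_data_files(root, splits=("train", "test")):
    """Map each split name to the sorted list of .parquet files under root/<split>/."""
    data_files = {}
    for split in splits:
        files = sorted(str(p) for p in (Path(root) / split).glob("*.parquet"))
        if files:  # skip splits that have no files on disk
            data_files[split] = files
    return data_files


# Usage, with X standing for the local data directory:
# from datasets import load_dataset
# dataset = load_dataset("parquet", data_files=build_data_files("X"))
```

Sorting keeps the shard order stable between runs, and splits with no files are left out rather than passed as empty lists.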