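pytorch-textregression is a lightweight natural language processing tool built on pytorch and transformers. It focuses on Chinese text regression and supports multi-value regression, among other features.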
1. Text regression (txt format, one JSON object per line):

1.1 Single-score format:
{"text": "你安静!", "label": [1]}
{"text": "斗牛场是多么欢乐阿!", "label": [1]}
{"text": "今天你不必做作业。", "label": [0]}
{"text": "他醒来时,几乎无法说话。", "label": [0]}
{"text": "在那天边隐约闪亮的不就是黄河?", "label": [1]}

1.2 Multi-score format:
{"text": "你安静!", "label": [1,0]}
{"text": "斗牛场是多么欢乐阿!", "label": [1,0]}
{"text": "今天你不必做作业。", "label": [0,0]}
{"text": "他醒来时,几乎无法说话。", "label": [0,0]}
{"text": "在那天边隐约闪亮的不就是黄河?", "label": [1,0]}

For more samples, see the test/tr directory.
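The corpus format described above (one JSON object per line, with a `text` string and a `label` list of one or more scores) can be checked before training with a few lines of standard-library Python. This is only a sketch: the helper name `parse_regression_lines` is hypothetical and not part of pytorch-textregression, and the rule that every line must carry the same number of scores is an assumption drawn from the samples rather than a documented requirement.

```python
import json


def parse_regression_lines(lines):
    """Parse JSON-lines samples into (text, scores) pairs.

    Hypothetical helper, not part of the library. It assumes every
    sample must carry the same number of scores.
    """
    samples = []
    width = None  # number of scores per sample, fixed by the first record
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        text, label = record["text"], record["label"]
        if not isinstance(label, list) or not all(
            isinstance(v, (int, float)) for v in label
        ):
            raise ValueError(f"line {line_no}: label must be a list of numbers")
        if width is None:
            width = len(label)
        elif len(label) != width:
            raise ValueError(
                f"line {line_no}: expected {width} scores, got {len(label)}"
            )
        samples.append((text, [float(v) for v in label]))
    return samples


demo = [
    '{"text": "你安静!", "label": [1, 0]}',
    '{"text": "今天你不必做作业。", "label": [0, 0]}',
]
samples = parse_regression_lines(demo)
print(samples)
```

To check a real corpus, pass an open file instead of the `demo` list, for example `parse_regression_lines(open(path, encoding="utf-8"))`.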
Training: python tet_tr_base_train.py
Prediction: python tet_tr_base_predict.py
# Adapt to Linux
import platform
import json
import sys
import os
path_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "../.."))
path_sys = os.path.join(path_root, "pytorch_nlu", "pytorch_textregression")
sys.path.append(path_sys)
print(path_root)
# Imports from pytorch_textregression
from trConfig import model_config
from trTools import get_current_time
# The original snippet omits this import; the module name trRun is an assumption
from trRun import TextRegression

# Paths to the training and validation corpora; the training path alone is enough
path_corpus = path_root + "/corpus/text_regression/negative_sentence"
path_train = os.path.join(path_corpus, "train.json")
path_dev = os.path.join(path_corpus, "dev.json")
# The original snippet leaves these two undefined; 320 is a placeholder value
evaluate_steps = 320  # evaluate every N steps
save_steps = 320  # save a checkpoint every N steps
model_config["evaluate_steps"] = evaluate_steps
model_config["save_steps"] = save_steps
model_config["path_train"] = path_train
model_config["path_dev"] = path_dev
# Model classes that the pretrained checkpoints map to
model_type = ["BERT", "ERNIE", "BERT_WWM", "ALBERT", "ROBERTA", "XLNET", "ELECTRA"]
pretrained_model_name_or_path = {
    "BERT_WWM": "hfl/chinese-bert-wwm-ext",
    "ROBERTA": "hfl/chinese-roberta-wwm-ext",
    "ALBERT": "uer/albert-base-chinese-cluecorpussmall",
    "XLNET": "hfl/chinese-xlnet-mid",
    "ERNIE": "nghuyong/ernie-1.0-base-zh",
    # "ERNIE": "nghuyong/ernie-3.0-base-zh",
    "BERT": "bert-base-chinese",
    # "BERT": "hfl/chinese-macbert-base",
}
idx = 1  # index into model_type that selects the pretrained model (1 = "ERNIE")
model_config["pretrained_model_name_or_path"] = pretrained_model_name_or_path[model_type[idx]]
model_config["model_save_path"] = "../output/text_regression/model_{}".format(model_type[idx])
model_config["model_type"] = model_type[idx]
# os.environ["CUDA_VISIBLE_DEVICES"] = str(model_config["CUDA_VISIBLE_DEVICES"])

# main
lc = TextRegression(model_config)
lc.process()
lc.train()
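Note that the `model_type` list in the training script contains "ELECTRA", while the `pretrained_model_name_or_path` dictionary has no entry for it, so `idx = 6` would fail with a bare `KeyError`. The sketch below shows one way to select a checkpoint by name with clearer errors. The function `resolve_checkpoint` and the constant names are hypothetical and do not exist in the library; the list and mapping are copied from the script.

```python
# Copied from the training script above.
MODEL_TYPES = ["BERT", "ERNIE", "BERT_WWM", "ALBERT", "ROBERTA", "XLNET", "ELECTRA"]
PRETRAINED = {
    "BERT_WWM": "hfl/chinese-bert-wwm-ext",
    "ROBERTA": "hfl/chinese-roberta-wwm-ext",
    "ALBERT": "uer/albert-base-chinese-cluecorpussmall",
    "XLNET": "hfl/chinese-xlnet-mid",
    "ERNIE": "nghuyong/ernie-1.0-base-zh",
    "BERT": "bert-base-chinese",
}


def resolve_checkpoint(name):
    """Return the checkpoint name for a model type (hypothetical helper)."""
    if name not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {name!r}")
    if name not in PRETRAINED:
        # The type is listed but no checkpoint is mapped to it (e.g. ELECTRA).
        raise KeyError(f"no pretrained checkpoint configured for {name!r}")
    return PRETRAINED[name]


print(resolve_checkpoint("ERNIE"))
```

Selecting by name instead of by list index also keeps the script readable when the list order changes.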
For citing this work, you can refer to the present GitHub project. For example, with BibTeX:
@software{Pytorch-NLU,
url = {https://github.com/yongzhuo/Pytorch-NLU},
author = {Yongzhuo Mo},
title = {Pytorch-NLU},
year = {2021}
}
*Hope this helps!