Related notes in this series:
Paper reading: DuEE: A Large-Scale Dataset for Chinese Event Extraction in Real-World Scenarios (with dataset link)
PaddleNLP in practice: LIC2021 event extraction task baseline (with code)
PaddleNLP in practice: LIC2021 relation extraction task baseline (with code)
Information extraction aims to extract structured knowledge, such as entities, relations and events, from unstructured natural-language text. In event extraction, given a natural-language sentence and a pre-specified set of event types and argument roles, the goal is to identify every event of a target type in the sentence and, for each one, extract the arguments that fill the corresponding roles. The target event type (event_type) and argument roles (role) bound what gets extracted, e.g. (event_type: 胜负, role: 时间, 胜者, 败者, 赛事名称) or (event_type: 夺冠, role: 夺冠事件, 夺冠赛事, 冠军).
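To make this concrete, here is a sketch of the kind of structured record an event extraction system produces. The field names follow the DuEE-style submission format as best we can tell, so treat them as illustrative rather than authoritative.

# Illustrative only: a plausible structured result for the sentence
# "雷霆加时赛险胜勇士" under the schema (event_type: 胜负; role: 时间, 胜者, 败者, 赛事名称).
event = {
    "event_type": "胜负",
    "arguments": [
        {"role": "胜者", "argument": "雷霆"},
        {"role": "败者", "argument": "勇士"},
    ],
}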
This example shows how to use PaddleNLP to quickly reproduce the LIC2021 event extraction competition baseline and then improve on it.
# Install the latest version of paddlenlp
!pip install --upgrade paddlenlp
%cd event_extraction/
Looking in indexes: https://mirror.baidu.com/pypi/simple/
Collecting paddlenlp
  Downloading https://mirror.baidu.com/pypi/packages/e9/89/812c1f3683f8296114ca91d591601515352741d37d9847114836a9dfa188/paddlenlp-2.0.0rc16-py3-none-any.whl (295kB)
     |████████████████████████████████| 296kB 20.7MB/s eta 0:00:01
Requirement already satisfied, skipping upgrade: h5py in /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages (from paddlenlp) (2.9.0)
……
Installing collected packages: paddlenlp
  Found existing installation: paddlenlp 2.0.0rc7
    Uninstalling paddlenlp-2.0.0rc7:
      Successfully uninstalled paddlenlp-2.0.0rc7
Successfully installed paddlenlp-2.0.0rc16
/home/aistudio/event_extraction
The competition has two subtasks: document-level event extraction and sentence-level event extraction.
The document-level dataset, DuEE-Fin, is a document-level event extraction dataset for the financial domain. It contains 13 predefined event types and about 11,500 Chinese documents (including some non-target documents as negative samples), split into 6,900 for training, 1,150 for validation and 3,450 for testing. On this dataset the baseline is an ERNIE-based sequence labeling pipeline with three parts: a sequence-labeling trigger extraction model, a sequence-labeling argument extraction model, and an enum-attribute classification model. The trigger model uses BIO tagging to locate trigger words and their event types; the argument model uses BIO tagging to identify the arguments of an event and their argument roles; the enum-attribute model is an ERNIE classifier.
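Putting the three models together, inference follows the rough sketch below. The .tag()/.classify() methods are hypothetical stand-ins for the baseline's actual scripts; the point is only the pipeline order.

# A minimal sketch of the pipeline at inference time; method names are
# placeholders, not the baseline's real API.
def extract_events(sentence, trigger_model, role_model, enum_model):
    event_types = trigger_model.tag(sentence)   # BIO tagging -> event types, e.g. ["质押"]
    arguments = role_model.tag(sentence)        # BIO tagging -> (role, argument) pairs
    enum_value = enum_model.classify(sentence)  # text classification -> enum attribute (环节)
    # Assemble one event record per detected event type.
    return [{"event_type": t, "arguments": arguments, "环节": enum_value}
            for t in event_types]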
The task is evaluated with the F1 score over predicted arguments. For each document, every gold event is matched, without replacement, to the most similar predicted event (event-level matching): among predictions with the same event type, the one with the largest number of correct roles and arguments is chosen.
f1_score = (2 * P * R) / (P + R), where
• a predicted argument is correct if its event type and role match and the argument itself is correct;
• P = number of correctly predicted arguments / number of all predicted arguments;
• R = number of correctly predicted arguments / number of all human-annotated arguments.
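As a sanity check on the metric, here is a minimal sketch, assuming the event-level matching above has already produced the three counts:

def argument_f1(n_correct, n_pred, n_gold):
    """n_correct: correctly predicted arguments (event type, role and argument all match);
    n_pred: all predicted arguments; n_gold: all human-annotated arguments."""
    p = n_correct / n_pred if n_pred else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    return (2 * p * r) / (p + r) if (p + r) else 0.0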
Download the dataset from the competition site, unpack it under data/DuEE-Fin, and preprocess the raw data into sequence-labeling format. The processed data also lives under data/DuEE-Fin: trigger extraction data in data/DuEE-Fin/trigger, argument-role extraction data in data/DuEE-Fin/role, and enum classification data in data/DuEE-Fin/enum.
!bash ./run_duee_fin.sh data_prepare
check and create directory
create dir * ./ckpt *
create dir * ./ckpt/DuEE-Fin *
create dir * ./submit *
start DuEE-Fin data prepare
=================DUEE FINANCE DATASET==============
=================start schema process==============
input path ./conf/DuEE-Fin/event_schema.json
save trigger tag 27 at ./conf/DuEE-Fin/trigger_tag.dict
save trigger tag 121 at ./conf/DuEE-Fin/role_tag.dict
save enum tag 4 at ./conf/DuEE-Fin/enum_tag.dict
=================end schema process===============
=================start data process==============
********** start document process **********
train 32795 dev 5302 test 140867
********** end document process **********
********** start sentence process **********
----trigger------for dir ./data/DuEE-Fin/sentence to ./data/DuEE-Fin/trigger
train 7251 dev 1180
----role------for dir ./data/DuEE-Fin/sentence to ./data/DuEE-Fin/role
train 9441 dev 1524
----enum------for dir ./data/DuEE-Fin/sentence to ./data/DuEE-Fin/enum
train 429 dev 69
********** end sentence process **********
=================end data process==============
end DuEE-Fin data prepare
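For reference, this is what one preprocessed line looks like, inferred from the dataset loader below (which skips a header line and then splits each line on '\t' and '\002'); the example is abridged and illustrative.

# One line of data/DuEE-Fin/trigger/train.tsv (abridged; \x02 stands for the '\002' separator):
#
#   text<TAB>label                                      <- header line, skipped when loading
#   原\x02标\x02题\x02:\x02…<TAB>O\x02O\x02O\x02O\x02…
#
# i.e. the characters of a sentence joined by '\002', a tab, then the aligned BIO tags.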
We can load a custom dataset by subclassing paddle.io.Dataset and implementing its __getitem__ and __len__ methods.
For trigger extraction, load the dataset under event_extraction/data/DuEE-Fin/trigger.
import paddle
from utils import load_dict


class DuEventExtraction(paddle.io.Dataset):
    """DuEventExtraction"""
    def __init__(self, data_path, tag_path):
        self.label_vocab = load_dict(tag_path)
        self.word_ids = []
        self.label_ids = []
        with open(data_path, 'r', encoding='utf-8') as fp:
            # skip the head line
            next(fp)
            for line in fp.readlines():
                words, labels = line.strip('\n').split('\t')
                words = words.split('\002')
                labels = labels.split('\002')
                self.word_ids.append(words)
                self.label_ids.append(labels)
        self.label_num = max(self.label_vocab.values()) + 1

    def __len__(self):
        return len(self.word_ids)

    def __getitem__(self, index):
        return self.word_ids[index], self.label_ids[index]


train_ds = DuEventExtraction('./data/DuEE-Fin/trigger/train.tsv', './conf/DuEE-Fin/trigger_tag.dict')
dev_ds = DuEventExtraction('./data/DuEE-Fin/trigger/dev.tsv', './conf/DuEE-Fin/trigger_tag.dict')

count = 0
for text, label in train_ds:
    print(f"text: {text}; label: {label}")
    count += 1
    if count >= 3:
        break
text: ['原', '标', '题', ':', '万', '讯', '自', '控', '(', '7', '.', '4', '9', '0', ',', '-', '0', '.', '1', '0', ',', '-', '1', '.', '3', '2', '%', ')', ':', '傅', '宇', '晨', '解', '除', '部', '分', '股', '份', '质', '押', '、', '累', '计', '质', '押', '比', '例', '为', '3', '9', '.', '5', '5', '%', ',', ',', ',', ',', '来', '源', ':', '每', '日', '经', '济', '新', '闻', ',', '每', '经', 'a', 'i', '快', '讯', ',', '万', '讯', '自', '控', '(', 's', 'z', ',', '3', '0', '0', '1', '1', '2', ',', '收', '盘', '价', ':', '7', '.', '4', '9', '元', ')', '6', '月', '3', '日', '下', '午', '发', '布', '公', '告', '称', ',', '公', '司', '接', '到', '股', '东', '傅', '宇', '晨', '的', '通', '知', ',', '获', '悉', '傅', '宇', '晨', '将', '其', '部', '分', '股', '份', '办', '理', '了', '质', '押', '业', '务', '。', ',', '截', '至', '本', '公', '告', '日', ',', '傅', '宇', '晨', '共', '持', '有', '公', '司', '股', '份', '5', '7', '9', '0', '.', '3', '8', '万', '股', ',', '占', '公', '司', '总', '股', '本', '的', '2', '0', '.', '2', '5', '%', ';', '累', '计', '质', '押', '股', '份', '2', '2', '9', '0', '万', '股', ',', '占', '傅', '宇', '晨', '持', '有', '公', '司', '股', '份', '总', '数', '的', '3', '9', '.', '5', '5', '%', ',', '占', '公', '司', '总', '股', '本', '的', '8', '.', '0', '1', '%', '。']; label: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-质押', 'I-质押', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
text: ['客', '户', '端', ',', '新', '浪', '港', '股', '讯', ',', '众', '安', '集', '团', '(', '0', '.', '2', '4', '8', ',', '-', '0', '.', '0', '0', ',', '-', '0', '.', '8', '0', '%', ')', '(', '0', '0', '6', '7', '2', '.', 'h', 'k', ')', '发', '布', '公', '告', ',', '于', '2', '0', '1', '9', '年', '1', '0', '月', '1', '5', '日', ',', '公', '司', '耗', '资', '9', '4', '.', '5', '6', '万', '港', '元', '回', '购', '3', '8', '0', '.', '5', '万', '股', ',', '回', '购', '价', '格', '每', '股', '0', '.', '2', '4', '8', '-', '0', '.', '2', '4', '9', '港', '元', '。']; label: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-股份回购', 'I-股份回购', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
text: ['原', '标', '题', ':', '金', '徽', '酒', '(', '6', '0', '3', '9', '1', '9', '.', 's', 'h', ')', ':', '亚', '特', '集', '团', '解', '除', '质', '押', '1', '9', '8', '0', '万', '股', ',', ',', ',', ',', '来', '源', ':', '格', '隆', '汇', ',', '格', '隆', '汇', '8', '月', '5', '日', '丨', '金', '徽', '酒', '(', '6', '0', '3', '9', '1', '9', '.', 's', 'h', ')', '公', '布', ',', '公', '司', '近', '日', '收', '到', '控', '股', '股', '东', '甘', '肃', '亚', '特', '投', '资', '集', '团', '有', '限', '公', '司', '(', '“', '亚', '特', '集', '团', '”', ')', '将', '其', '持', '有', '的', '公', '司', '部', '分', '股', '份', '解', '除', '质', '押', '的', '通', '知', '。', ',', '2', '0', '1', '8', '年', '4', '月', '9', '日', ',', '亚', '特', '集', '团', '将', '其', '持', '有', '的', '公', '司', '5', '9', '8', '0', '万', '股', '有', '限', '售', '条', '件', '股', '份', '质', '押', '给', '兰', '州', '银', '行', '股', '份', '有', '限', '公', '司', '陇', '南', '分', '行', '。']; label: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-解除质押', 'I-解除质押', 'I-解除质押', 'I-解除质押', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
The sequence-labeling trigger extraction model is one part of the overall pipeline. Given the predefined event types, it identifies where event trigger words occur in a sentence and which event type each corresponds to. It is a sequence-labeling model built on ERNIE, illustrated by the model diagram below:
Likewise, the sequence-labeling argument extraction model is built on ERNIE; it identifies the arguments of an event and their argument roles, illustrated by the model diagram below:
From the example above, the model identifies:
1) the argument "新东方", assigned the labels "B-收购方", "I-收购方", "I-收购方";
2) the argument "东方优播", assigned the labels "B-被收购方", "I-被收购方", "I-被收购方", "I-被收购方".
The role-argument pairs finally extracted from the text are <收购方, 新东方> and <被收购方, 东方优播>.
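Decoding a BIO tag sequence into such (role, argument) pairs is mechanical. The sketch below mirrors the logic of the baseline's postprocessing, though it is not its exact code:

def extract_spans(tokens, tags):
    """Collect (role, argument) pairs from aligned tokens and BIO tags."""
    spans, start, role = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last open span
        if tag.startswith("B-"):
            if start is not None:
                spans.append((role, "".join(tokens[start:i])))
            start, role = i, tag[2:]
        elif not tag.startswith("I-") and start is not None:
            spans.append((role, "".join(tokens[start:i])))
            start, role = None, None
    return spans

# extract_spans(list("新东方收购东方优播"),
#               ["B-收购方", "I-收购方", "I-收购方", "O", "O",
#                "B-被收购方", "I-被收购方", "I-被收购方", "I-被收购方"])
# -> [('收购方', '新东方'), ('被收购方', '东方优播')]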
PaddleNLP provides a ready-made sequence-labeling model on top of the ERNIE pretrained model, which can be loaded in one line by name:
from paddlenlp.transformers import ErnieForTokenClassification, ErnieForSequenceClassification
label_map = load_dict('./conf/DuEE-Fin/trigger_tag.dict')
id2label = {val: key for key, val in label_map.items()}
model = ErnieForTokenClassification.from_pretrained("ernie-1.0", num_classes=len(label_map))
[2021-04-10 16:11:55,651] [ INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/ernie/ernie_v1_chn_base.pdparams and saved to /home/aistudio/.paddlenlp/models/ernie-1.0
[2021-04-10 16:11:55,654] [ INFO] - Downloading ernie_v1_chn_base.pdparams from https://paddlenlp.bj.bcebos.com/models/transformers/ernie/ernie_v1_chn_base.pdparams
100%|██████████| 390123/390123 [00:05<00:00, 72718.98it/s]
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
For the enum classification data, an ERNIE-based text classification model is used; the enumerable role here is 环节 (the listing phase). The model diagram is as follows:
Given a text, the model classifies it and outputs a probability for each class, e.g. 筹备上市 (0.8), 暂停上市 (0.02), 正式上市 (0.15), 终止上市 (0.03).
Likewise, PaddleNLP provides a ready-made text classification model on top of ERNIE, loadable in one line by name:
from paddlenlp.transformers import ErnieForSequenceClassification

# The classifier's output size should match the enum tag set
# (./conf/DuEE-Fin/enum_tag.dict, 4 classes), not the trigger label_map above.
enum_label_map = load_dict('./conf/DuEE-Fin/enum_tag.dict')
model = ErnieForSequenceClassification.from_pretrained("ernie-1.0", num_classes=len(enum_label_map))
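Once trained, classification inference is a tokenize-forward-softmax loop. A minimal sketch with a made-up input sentence follows (the baseline's classifier.py does the equivalent; the tokenizer is introduced in detail in the next section):

import paddle
import paddle.nn.functional as F
from paddlenlp.transformers import ErnieTokenizer

enum_id2label = {v: k for k, v in enum_label_map.items()}
enum_tokenizer = ErnieTokenizer.from_pretrained("ernie-1.0")

encoded = enum_tokenizer(text="公司拟于近期正式上市")  # hypothetical input sentence
input_ids = paddle.to_tensor([encoded["input_ids"]])
token_type_ids = paddle.to_tensor([encoded["token_type_ids"]])
logits = model(input_ids, token_type_ids)   # shape: [1, num_classes]
probs = F.softmax(logits, axis=-1)          # the per-class probabilities shown above
pred_id = int(paddle.argmax(probs, axis=-1).numpy()[0])
print(enum_id2label[pred_id])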
We now need to turn the raw data into inputs the model can read. To make this easy, PaddleNLP ships a built-in Tokenizer for each pretrained model that handles tokenizing text, mapping tokens to token IDs, truncating to a maximum length, and so on. Like the models, tokenizers can be loaded in one line.
Calling the tokenizer on a piece of text directly produces the inputs the model needs.
from paddlenlp.transformers import ErnieTokenizer, ErnieModel

tokenizer = ErnieTokenizer.from_pretrained("ernie-1.0")
ernie_model = ErnieModel.from_pretrained("ernie-1.0")

# One call tokenizes the text, maps tokens to token IDs and adds the special tokens
encoded_text = tokenizer(text="请输入测试样例", return_length=True, return_position_ids=True)
for key, value in encoded_text.items():
    print("{}:\n\t{}".format(key, value))

# Convert to paddle tensors
input_ids = paddle.to_tensor([encoded_text['input_ids']])
print("input_ids : \n\t{}".format(input_ids))
segment_ids = paddle.to_tensor([encoded_text['token_type_ids']])
print("token_type_ids : \n\t{}".format(segment_ids))

# These can now be fed into the ERNIE model to get its outputs
sequence_output, pooled_output = ernie_model(input_ids, segment_ids)
print("Token wise output shape: \n\t{}\nPooled output shape: \n\t{}".format(
    sequence_output.shape, pooled_output.shape))
[2021-04-10 16:12:14,372] [    INFO] - Downloading vocab.txt from https://paddlenlp.bj.bcebos.com/models/transformers/ernie/vocab.txt
100%|██████████| 89/89 [00:00<00:00, 4018.40it/s]
[2021-04-10 16:12:14,586] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
input_ids:
	[1, 647, 789, 109, 558, 525, 314, 656, 2]
token_type_ids:
	[0, 0, 0, 0, 0, 0, 0, 0, 0]
seq_len:
	9
position_ids:
	[0, 1, 2, 3, 4, 5, 6, 7, 8]
input_ids :
	Tensor(shape=[1, 9], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
	       [[1 , 647, 789, 109, 558, 525, 314, 656, 2 ]])
token_type_ids :
	Tensor(shape=[1, 9], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
	       [[0, 0, 0, 0, 0, 0, 0, 0, 0]])
Token wise output shape:
	[1, 9, 768]
Pooled output shape:
	[1, 768]
As the code above shows, the tokenizer offers a very convenient way to produce the inputs the model needs.
In the output above:
• input_ids: the token IDs of the input text.
• token_type_ids: whether each token belongs to the first or the second input sentence. (Transformer-style pretrained models accept both single sentences and sentence pairs.) See the convert_example_to_feature() function in sequence_labeling.py for details.
• seq_len: the number of tokens in the input sentence.
• input_mask: whether a token is a padding token. Sentences in a batch have different lengths, so they are padded to a common fixed length; 1 marks a real input token and 0 a padding token.
• position_ids: the position of each token within the whole input sequence.
The ERNIE model returns two tensors:
• sequence_output is the semantic representation of each input token, with shape (1, num_tokens, hidden_size). It is typically used for sequence labeling, question answering and other token-level tasks.
• pooled_output is the semantic representation of the whole sentence, with shape (1, hidden_size). It is typically used for text classification, information retrieval and other sentence-level tasks.
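As a quick illustration of that division of labor (a sketch, not baseline code; the head sizes are arbitrary):

import paddle.nn as nn

num_tags, num_classes = 10, 4                   # arbitrary example sizes
token_head = nn.Linear(768, num_tags)           # applied to every token representation
sentence_head = nn.Linear(768, num_classes)     # applied to the sentence representation
token_logits = token_head(sequence_output)      # [1, num_tokens, num_tags] -> sequence labeling
sentence_logits = sentence_head(pooled_output)  # [1, num_classes] -> text classification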
NOTE:
To use the ernie-tiny pretrained model, the matching tokenizer must be loaded with paddlenlp.transformers.ErnieTinyTokenizer.from_pretrained('ernie-tiny').
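In code, that is simply:

# Only needed when switching to the ernie-tiny model:
from paddlenlp.transformers import ErnieTinyTokenizer
tokenizer = ErnieTinyTokenizer.from_pretrained('ernie-tiny')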
The code above walks through the data processing steps a Transformer-style pretrained model needs. For convenience, PaddleNLP also provides higher-level APIs that return the model-ready format in a single call.
This baseline processes the data as follows:
• Convert the raw data into a format the model can read: tokenize with the tokenizer, map tokens to input ids, build token type ids, and so on.
• Load batches asynchronously with multiple workers through the paddle.io.DataLoader API.
from functools import partial
from paddlenlp.data import Stack, Tuple, Pad


def convert_example_to_feature(example, tokenizer, label_vocab=None, max_seq_len=512,
                               no_entity_label="O", ignore_label=-1, is_test=False):
    tokens, labels = example
    tokenized_input = tokenizer(
        tokens,
        return_length=True,
        is_split_into_words=True,
        max_seq_len=max_seq_len)

    input_ids = tokenized_input['input_ids']
    token_type_ids = tokenized_input['token_type_ids']
    seq_len = tokenized_input['seq_len']

    if is_test:
        return input_ids, token_type_ids, seq_len
    elif label_vocab is not None:
        labels = labels[:(max_seq_len - 2)]
        encoded_label = [no_entity_label] + labels + [no_entity_label]
        encoded_label = [label_vocab[x] for x in encoded_label]
        return input_ids, token_type_ids, seq_len, encoded_label


no_entity_label = "O"
# padding label value
ignore_label = -1
batch_size = 32
max_seq_len = 300

trans_func = partial(
    convert_example_to_feature,
    tokenizer=tokenizer,
    label_vocab=train_ds.label_vocab,
    max_seq_len=max_seq_len,
    no_entity_label=no_entity_label,
    ignore_label=ignore_label,
    is_test=False)
batchify_fn = lambda samples, fn=Tuple(
    Pad(axis=0, pad_val=tokenizer.vocab[tokenizer.pad_token]),  # input ids
    Pad(axis=0, pad_val=tokenizer.vocab[tokenizer.pad_token]),  # token type ids
    Stack(),  # sequence lens
    Pad(axis=0, pad_val=ignore_label)  # labels
): fn(list(map(trans_func, samples)))

train_loader = paddle.io.DataLoader(
    dataset=train_ds,
    batch_size=batch_size,
    shuffle=True,
    collate_fn=batchify_fn)
dev_loader = paddle.io.DataLoader(
    dataset=dev_ds,
    batch_size=batch_size,
    collate_fn=batchify_fn)
For this baseline we use cross-entropy as the loss function and paddle.optimizer.AdamW as the optimizer.
import numpy as np


@paddle.no_grad()
def evaluate(model, criterion, metric, num_label, data_loader):
    """evaluate"""
    model.eval()
    metric.reset()
    losses = []
    for input_ids, seg_ids, seq_lens, labels in data_loader:
        logits = model(input_ids, seg_ids)
        loss = paddle.mean(
            criterion(logits.reshape([-1, num_label]), labels.reshape([-1])))
        losses.append(loss.numpy())
        preds = paddle.argmax(logits, axis=-1)
        n_infer, n_label, n_correct = metric.compute(None, seq_lens, preds, labels)
        metric.update(n_infer.numpy(), n_label.numpy(), n_correct.numpy())
        precision, recall, f1_score = metric.accumulate()
    avg_loss = np.mean(losses)
    model.train()
    return precision, recall, f1_score, avg_loss
# Directory for saving model checkpoints
!mkdir ckpt/DuEE-Fin/trigger/
import warnings
from paddlenlp.metrics import ChunkEvaluator

warnings.filterwarnings('ignore')

# Reload the token classification model: the `model` variable was overwritten
# above by the enum classifier demo.
model = ErnieForTokenClassification.from_pretrained("ernie-1.0", num_classes=len(label_map))

learning_rate = 5e-5
weight_decay = 0.01
num_epoch = 1
checkpoints = 'ckpt/DuEE-Fin/trigger/'

num_training_steps = len(train_loader) * num_epoch
# Generate parameter names needed to perform weight decay.
# All bias and LayerNorm parameters are excluded.
decay_params = [
    p.name for n, p in model.named_parameters()
    if not any(nd in n for nd in ["bias", "norm"])
]
optimizer = paddle.optimizer.AdamW(
    learning_rate=learning_rate,
    parameters=model.parameters(),
    weight_decay=weight_decay,
    apply_decay_param_fun=lambda x: x in decay_params)

metric = ChunkEvaluator(label_list=train_ds.label_vocab.keys(), suffix=False)
criterion = paddle.nn.loss.CrossEntropyLoss(ignore_index=ignore_label)

step, best_f1 = 0, 0.0
model.train()
rank = paddle.distributed.get_rank()
for epoch in range(num_epoch):
    for idx, (input_ids, token_type_ids, seq_lens, labels) in enumerate(train_loader):
        logits = model(input_ids, token_type_ids).reshape(
            [-1, train_ds.label_num])
        loss = paddle.mean(criterion(logits, labels.reshape([-1])))
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()
        loss_item = loss.numpy().item()
        if step > 0 and step % 10 == 0 and rank == 0:
            print(f'train epoch: {epoch} - step: {step} (total: {num_training_steps}) - loss: {loss_item:.6f}')
        if step > 0 and step % 50 == 0 and rank == 0:
            p, r, f1, avg_loss = evaluate(model, criterion, metric, len(label_map), dev_loader)
            print(f'dev step: {step} - loss: {avg_loss:.5f}, precision: {p:.5f}, recall: {r:.5f}, ' \
                  f'f1: {f1:.5f} current best {best_f1:.5f}')
            if f1 > best_f1:
                best_f1 = f1
                print(f'==============================================save best model ' \
                      f'best performerence {best_f1:5f}')
                paddle.save(model.state_dict(), '{}/best.pdparams'.format(checkpoints))
        step += 1

# save the final model
if rank == 0:
    paddle.save(model.state_dict(), '{}/final.pdparams'.format(checkpoints))
train epoch: 0 - step: 10 (total: 227) - loss: 0.136036
train epoch: 0 - step: 20 (total: 227) - loss: 0.130759
train epoch: 0 - step: 30 (total: 227) - loss: 0.117360
train epoch: 0 - step: 40 (total: 227) - loss: 0.126342
train epoch: 0 - step: 50 (total: 227) - loss: 0.117132
dev step: 50 - loss: 0.11086, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 60 (total: 227) - loss: 0.127355
train epoch: 0 - step: 70 (total: 227) - loss: 0.120025
train epoch: 0 - step: 80 (total: 227) - loss: 0.112086
train epoch: 0 - step: 90 (total: 227) - loss: 0.106585
train epoch: 0 - step: 100 (total: 227) - loss: 0.109516
dev step: 100 - loss: 0.09834, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 110 (total: 227) - loss: 0.082624
train epoch: 0 - step: 120 (total: 227) - loss: 0.056104
train epoch: 0 - step: 130 (total: 227) - loss: 0.064101
train epoch: 0 - step: 140 (total: 227) - loss: 0.059635
train epoch: 0 - step: 150 (total: 227) - loss: 0.057752
dev step: 150 - loss: 0.04139, precision: 0.35824, recall: 0.38144, f1: 0.36947 current best 0.00000
==============================================save best model best performerence 0.369475
train epoch: 0 - step: 160 (total: 227) - loss: 0.045838
train epoch: 0 - step: 170 (total: 227) - loss: 0.030626
train epoch: 0 - step: 180 (total: 227) - loss: 0.029898
train epoch: 0 - step: 190 (total: 227) - loss: 0.020956
train epoch: 0 - step: 200 (total: 227) - loss: 0.032151
dev step: 200 - loss: 0.01862, precision: 0.66860, recall: 0.71763, f1: 0.69225 current best 0.36947
==============================================save best model best performerence 0.692250
train epoch: 0 - step: 210 (total: 227) - loss: 0.017710
train epoch: 0 - step: 220 (total: 227) - loss: 0.012850
Training the argument extraction model works exactly like training the trigger model; just swap in the preprocessed argument dataset. The full training runs can be launched as follows.
# Trigger extraction model training
!bash run_duee_fin.sh trigger_train
This cell's output exceeds 1,000 lines and is truncated here.
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist
start DuEE-Fin trigger train
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
----------- Configuration Arguments -----------
gpus: 0
heter_worker_num: None
heter_workers:
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers:
training_script: sequence_labeling.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE-Fin/trigger_tag.dict', '--train_data', './data/DuEE-Fin/trigger/train.tsv', '--dev_data', './data/DuEE-Fin/trigger/dev.tsv', '--test_data', './data/DuEE-Fin/trigger/test.tsv', '--predict_data', './data/DuEE-Fin/sentence/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE-Fin/trigger', '--init_ckpt', './ckpt/DuEE-Fin/trigger/best.pdparams', '--predict_save_path', './ckpt/DuEE-Fin/trigger/test_pred.json', '--device', 'gpu']
worker_num: None
workers:
------------------------------------------------
WARNING 2021-04-10 16:29:19,740 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-10 16:29:19,742 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug):
+=======================================================================================+
|                       Distributed Envs                       Value                    |
+---------------------------------------------------------------------------------------+
|                       PADDLE_TRAINER_ID                      0                        |
|                       PADDLE_CURRENT_ENDPOINT                127.0.0.1:54382          |
|                       PADDLE_TRAINERS_NUM                    1                        |
|                       PADDLE_TRAINER_ENDPOINTS               127.0.0.1:54382          |
|                       FLAGS_selected_gpus                    0                        |
+=======================================================================================+
INFO 2021-04-10 16:29:19,742 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 16:29:20,983] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 16:29:20,997] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 16:29:20.998939   762 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 16:29:21.003577   762 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start train==========
train epoch: 0 - step: 10 (total: 9080) - loss: 0.109321
train epoch: 0 - step: 20 (total: 9080) - loss: 0.129953
train epoch: 0 - step: 30 (total: 9080) - loss: 0.116185
train epoch: 0 - step: 40 (total: 9080) - loss: 0.126599
train epoch: 0 - step: 50 (total: 9080) - loss: 0.109494
dev step: 50 - loss: 0.11120, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 60 (total: 9080) - loss: 0.111870
train epoch: 0 - step: 70 (total: 9080) - loss: 0.156219
train epoch: 0 - step: 80 (total: 9080) - loss: 0.104292
train epoch: 0 - step: 90 (total: 9080) - loss: 0.129062
train epoch: 0 - step: 100 (total: 9080) - loss: 0.116484
dev step: 100 - loss: 0.10372, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 110 (total: 9080) - loss: 0.107833
train epoch: 0 - step: 120 (total: 9080) - loss: 0.097913
train epoch: 0 - step: 130 (total: 9080) - loss: 0.102398
train epoch: 0 - step: 140 (total: 9080) - loss: 0.061798
train epoch: 0 - step: 150 (total: 9080) - loss: 0.070677
dev step: 150 - loss: 0.05695, precision: 0.25240, recall: 0.12324, f1: 0.16562 current best 0.00000
==============================================save best model best performerence 0.165618
……
train epoch: 19 - step: 8660 (total: 9080) - loss: 0.000040
train epoch: 19 - step: 8670 (total: 9080) - loss: 0.000292
train epoch: 19 - step: 8680 (total: 9080) - loss: 0.000617
train epoch: 19 - step: 8690 (total: 9080) - loss: 0.000061
train epoch: 19 - step: 8700 (total: 9080) - loss: 0.000340
dev step: 8700 - loss: 0.01594, precision: 0.86531, recall: 0.89704, f1: 0.88089 current best 0.89685
train epoch: 19 - step: 8710 (total: 9080) - loss: 0.002070
train epoch: 19 - step: 8720 (total: 9080) - loss: 0.000533
train epoch: 19 - step: 8730 (total: 9080) - loss: 0.001161
train epoch: 19 - step: 8740 (total: 9080) - loss: 0.007269
train epoch: 19 - step: 8750 (total: 9080) - loss: 0.000043
dev step: 8750 - loss: 0.01295, precision: 0.86478, recall: 0.90796, f1: 0.88584 current best 0.89685
train epoch: 19 - step: 8760 (total: 9080) - loss: 0.002034
train epoch: 19 - step: 8770 (total: 9080) - loss: 0.000233
train epoch: 19 - step: 8780 (total: 9080) - loss: 0.000176
train epoch: 19 - step: 8790 (total: 9080) - loss: 0.000349
train epoch: 19 - step: 8800 (total: 9080) - loss: 0.001374
dev step: 8800 - loss: 0.01408, precision: 0.86432, recall: 0.89938, f1: 0.88150 current best 0.89685
train epoch: 19 - step: 8810 (total: 9080) - loss: 0.000389
train epoch: 19 - step: 8820 (total: 9080) - loss: 0.003733
train epoch: 19 - step: 8830 (total: 9080) - loss: 0.000166
train epoch: 19 - step: 8840 (total: 9080) - loss: 0.000097
train epoch: 19 - step: 8850 (total: 9080) - loss: 0.000143
dev step: 8850 - loss: 0.01380, precision: 0.86353, recall: 0.90328, f1: 0.88296 current best 0.89685
train epoch: 19 - step: 8860 (total: 9080) - loss: 0.000026
train epoch: 19 - step: 8870 (total: 9080) - loss: 0.000193
train epoch: 19 - step: 8880 (total: 9080) - loss: 0.001100
train epoch: 19 - step: 8890 (total: 9080) - loss: 0.000031
train epoch: 19 - step: 8900 (total: 9080) - loss: 0.000353
dev step: 8900 - loss: 0.01387, precision: 0.88104, recall: 0.89548, f1: 0.88820 current best 0.89685
train epoch: 19 - step: 8910 (total: 9080) - loss: 0.000200
train epoch: 19 - step: 8920 (total: 9080) - loss: 0.000586
train epoch: 19 - step: 8930 (total: 9080) - loss: 0.000042
train epoch: 19 - step: 8940 (total: 9080) - loss: 0.000408
train epoch: 19 - step: 8950 (total: 9080) - loss: 0.000845
dev step: 8950 - loss: 0.01537, precision: 0.86103, recall: 0.91342, f1: 0.88645 current best 0.89685
train epoch: 19 - step: 8960 (total: 9080) - loss: 0.000170
train epoch: 19 - step: 8970 (total: 9080) - loss: 0.002247
train epoch: 19 - step: 8980 (total: 9080) - loss: 0.000848
train epoch: 19 - step: 8990 (total: 9080) - loss: 0.002282
train epoch: 19 - step: 9000 (total: 9080) - loss: 0.000029
dev step: 9000 - loss: 0.01638, precision: 0.88240, recall: 0.87207, f1: 0.87721 current best 0.89685
train epoch: 19 - step: 9010 (total: 9080) - loss: 0.000446
train epoch: 19 - step: 9020 (total: 9080) - loss: 0.000021
train epoch: 19 - step: 9030 (total: 9080) - loss: 0.000486
train epoch: 19 - step: 9040 (total: 9080) - loss: 0.003263
train epoch: 19 - step: 9050 (total: 9080) - loss: 0.000346
dev step: 9050 - loss: 0.01396, precision: 0.88304, recall: 0.88924, f1: 0.88613 current best 0.89685
train epoch: 19 - step: 9060 (total: 9080) - loss: 0.000052
train epoch: 19 - step: 9070 (total: 9080) - loss: 0.000063
INFO 2021-04-10 17:34:32,659 launch.py:240] Local processes completed.
end DuEE-Fin trigger train
# Trigger extraction prediction
!bash run_duee_fin.sh trigger_predict
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist
start DuEE-Fin trigger predict
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 17:34:34,610] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 17:34:34,624] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 17:34:34.625129  3383 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 17:34:34.629817  3383 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start predict==========
Loaded parameters from ./ckpt/DuEE-Fin/trigger/best.pdparams
save data 140867 to ./ckpt/DuEE-Fin/trigger/test_pred.json
end DuEE-Fin trigger predict
# Argument extraction model training
!bash run_duee_fin.sh role_train
This cell's output exceeds 1,000 lines and is truncated here.
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist
start DuEE-Fin role train
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
----------- Configuration Arguments -----------
gpus: 0
heter_worker_num: None
heter_workers:
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers:
training_script: sequence_labeling.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE-Fin/role_tag.dict', '--train_data', './data/DuEE-Fin/role/train.tsv', '--dev_data', './data/DuEE-Fin/role/dev.tsv', '--test_data', './data/DuEE-Fin/role/test.tsv', '--predict_data', './data/DuEE-Fin/sentence/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE-Fin/role', '--init_ckpt', './ckpt/DuEE-Fin/role/best.pdparams', '--predict_save_path', './ckpt/DuEE-Fin/role/test_pred.json', '--device', 'gpu']
worker_num: None
workers:
------------------------------------------------
WARNING 2021-04-10 17:57:54,959 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-10 17:57:54,961 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug):
+=======================================================================================+
|                       Distributed Envs                       Value                    |
+---------------------------------------------------------------------------------------+
|                       PADDLE_TRAINER_ID                      0                        |
|                       PADDLE_CURRENT_ENDPOINT                127.0.0.1:44116          |
|                       PADDLE_TRAINERS_NUM                    1                        |
|                       PADDLE_TRAINER_ENDPOINTS               127.0.0.1:44116          |
|                       FLAGS_selected_gpus                    0                        |
+=======================================================================================+
INFO 2021-04-10 17:57:54,961 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 17:57:56,200] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 17:57:56,213] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 17:57:56.215006  4136 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 17:57:56.219677  4136 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start train==========
train epoch: 0 - step: 10 (total: 11800) - loss: 1.228878
train epoch: 0 - step: 20 (total: 11800) - loss: 1.163631
train epoch: 0 - step: 30 (total: 11800) - loss: 1.130505
train epoch: 0 - step: 40 (total: 11800) - loss: 1.303947
train epoch: 0 - step: 50 (total: 11800) - loss: 1.111251
dev step: 50 - loss: 1.14692, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 60 (total: 11800) - loss: 1.335606
train epoch: 0 - step: 70 (total: 11800) - loss: 0.886442
train epoch: 0 - step: 80 (total: 11800) - loss: 1.020030
train epoch: 0 - step: 90 (total: 11800) - loss: 0.871939
train epoch: 0 - step: 100 (total: 11800) - loss: 0.928532
dev step: 100 - loss: 0.98844, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 110 (total: 11800) - loss: 1.005332
train epoch: 0 - step: 120 (total: 11800) - loss: 0.769859
train epoch: 0 - step: 130 (total: 11800) - loss: 0.761578
train epoch: 0 - step: 140 (total: 11800) - loss: 0.653325
train epoch: 0 - step: 150 (total: 11800) - loss: 0.899768
dev step: 150 - loss: 0.71772, precision: 0.06080, recall: 0.00835, f1: 0.01468 current best 0.00000
==============================================save best model best performerence 0.014678
train epoch: 0 - step: 160 (total: 11800) - loss: 0.690438
train epoch: 0 - step: 170 (total: 11800) - loss: 0.774387
train epoch: 0 - step: 180 (total: 11800) - loss: 0.615638
train epoch: 0 - step: 190 (total: 11800) - loss: 0.483597
train epoch: 0 - step: 200 (total: 11800) - loss: 0.571479
dev step: 200 - loss: 0.52474, precision: 0.18197, recall: 0.12865, f1: 0.15073 current best 0.01468
==============================================save best model best performerence 0.150733
train epoch: 0 - step: 210 (total: 11800) - loss: 0.540742
train epoch: 0 - step: 220 (total: 11800) - loss: 0.524742
train epoch: 0 - step: 230 (total: 11800) - loss: 0.464600
train epoch: 0 - step: 240 (total: 11800) - loss: 0.478460
train epoch: 0 - step: 250 (total: 11800) - loss: 0.523782
dev step: 250 - loss: 0.42025, precision: 0.25433, recall: 0.23644, f1: 0.24506 current best 0.15073
==============================================save best model best performerence 0.245059
train epoch: 0 - step: 260 (total: 11800) - loss: 0.374678
train epoch: 0 - step: 270 (total: 11800) - loss: 0.530323
train epoch: 0 - step: 280 (total: 11800) - loss: 0.325683
train epoch: 0 - step: 290 (total: 11800) - loss: 0.375011
train epoch: 0 - step: 300 (total: 11800) - loss: 0.385494
dev step: 300 - loss: 0.34790, precision: 0.27753, recall: 0.26766, f1: 0.27251 current best 0.24506
==============================================save best model best performerence 0.272508
train epoch: 0 - step: 310 (total: 11800) - loss: 0.353424
train epoch: 0 - step: 320 (total: 11800) - loss: 0.410307
train epoch: 0 - step: 330 (total: 11800) - loss: 0.322043
train epoch: 0 - step: 340 (total: 11800) - loss: 0.384293
train epoch: 0 - step: 350 (total: 11800) - loss: 0.271734
dev step: 350 - loss: 0.30927, precision: 0.33494, recall: 0.44913, f1: 0.38372 current best 0.27251
==============================================save best model best performerence 0.383722
train epoch: 0 - step: 360 (total: 11800) - loss: 0.424462
train epoch: 0 - step: 370 (total: 11800) - loss: 0.398466
train epoch: 0 - step: 380 (total: 11800) - loss: 0.220276
train epoch: 0 - step: 390 (total: 11800) - loss: 0.329981
train epoch: 0 - step: 400 (total: 11800) - loss: 0.291278
dev step: 400 - loss: 0.28080, precision: 0.37307, recall: 0.44899, f1: 0.40752 current best 0.38372
==============================================save best model best performerence 0.407524
train epoch: 0 - step: 410 (total: 11800) - loss: 0.315920
train epoch: 0 - step: 420 (total: 11800) - loss: 0.335757
train epoch: 0 - step: 430 (total: 11800) - loss: 0.331377
train epoch: 0 - step: 440 (total: 11800) - loss: 0.339501
train epoch: 0 - step: 450 (total: 11800) - loss: 0.216479
dev step: 450 - loss: 0.27126, precision: 0.42649, recall: 0.48424, f1: 0.45353 current best 0.40752
==============================================save best model best performerence 0.453535
train epoch: 0 - step: 460 (total: 11800) - loss: 0.334343
train epoch: 0 - step: 470 (total: 11800) - loss: 0.246070
train epoch: 0 - step: 480 (total: 11800) - loss: 0.266857
train epoch: 0 - step: 490 (total: 11800) - loss: 0.262747
train epoch: 0 - step: 500 (total: 11800) - loss: 0.250897
dev step: 500 - loss: 0.25047, precision: 0.47231, recall: 0.60383, f1: 0.53003 current best 0.45353
==============================================save best model best performerence 0.530032
train epoch: 0 - step: 510 (total: 11800) - loss: 0.223253
train epoch: 0 - step: 520 (total: 11800) - loss: 0.228720
train epoch: 0 - step: 530 (total: 11800) - loss: 0.246290
train epoch: 0 - step: 540 (total: 11800) - loss: 0.287393
train epoch: 0 - step: 550 (total: 11800) - loss: 0.297358
dev step: 550 - loss: 0.24383, precision: 0.49097, recall: 0.55548, f1: 0.52123 current best 0.53003
train epoch: 0 - step: 560 (total: 11800) - loss: 0.266396
train epoch: 0 - step: 570 (total: 11800) - loss: 0.296538
train epoch: 0 - step: 580 (total: 11800) - loss: 0.210442
train epoch: 1 - step: 590 (total: 11800) - loss: 0.282502
train epoch: 1 - step: 600 (total: 11800) - loss: 0.239531
dev step: 600 - loss: 0.22736, precision: 0.49346, recall: 0.61347, f1: 0.54696 current best 0.53003
==============================================save best model best performerence 0.546959
train epoch: 1 - step: 610 (total: 11800) - loss: 0.281700
train epoch: 1 - step: 620 (total: 11800) - loss: 0.291554
train epoch: 1 - step: 630 (total: 11800) - loss: 0.284449
train epoch: 1 - step: 640 (total: 11800) - loss: 0.175821
train epoch: 1 - step: 650 (total: 11800) - loss: 0.234460
dev step: 650 - loss: 0.22660, precision: 0.50054, recall: 0.66628, f1: 0.57164 current best 0.54696
==============================================save best model best performerence 0.571640
train epoch: 1 - step: 660 (total: 11800) - loss: 0.253709
train epoch: 1 - step: 670 (total: 11800) - loss: 0.206524
train epoch: 1 - step: 680 (total: 11800) - loss: 0.273749
train epoch: 1 - step: 690 (total: 11800) - loss: 0.267098
train epoch: 1 - step: 700 (total: 11800) - loss: 0.221125
dev step: 700 - loss: 0.22382, precision: 0.50251, recall: 0.62052, f1: 0.55531 current best 0.57164
train epoch: 1 - step: 710 (total: 11800) - loss: 0.194055
train epoch: 1 - step: 720 (total: 11800) - loss: 0.213713
train epoch: 1 - step: 730 (total: 11800) - loss: 0.266367
train epoch: 1 - step: 740 (total: 11800) - loss: 0.265232
train epoch: 1 - step: 750 (total: 11800) - loss: 0.222215
dev step: 750 - loss: 0.23990, precision: 0.49661, recall: 0.71780, f1: 0.58707 current best 0.57164
==============================================save best model best performerence 0.587065
……
train epoch: 19 - step: 11210 (total: 11800) - loss: 0.071786
train epoch: 19 - step: 11220 (total: 11800) - loss: 0.126563
train epoch: 19 - step: 11230 (total: 11800) - loss: 0.079284
train epoch: 19 - step: 11240 (total: 11800) - loss: 0.097921
train epoch: 19 - step: 11250 (total: 11800) - loss: 0.082845
dev step: 11250 - loss: 0.26768, precision: 0.60864, recall: 0.73406, f1: 0.66549 current best 0.68086
train epoch: 19 - step: 11260 (total: 11800) - loss: 0.040633
train epoch: 19 - step: 11270 (total: 11800) - loss: 0.036113
train epoch: 19 - step: 11280 (total: 11800) - loss: 0.090494
train epoch: 19 - step: 11290 (total: 11800) - loss: 0.058005
train epoch: 19 - step: 11300 (total: 11800) - loss: 0.086870
dev step: 11300 - loss: 0.27434, precision: 0.65781, recall: 0.68772, f1: 0.67244 current best 0.68086
train epoch: 19 - step: 11310 (total: 11800) - loss: 0.092861
train epoch: 19 - step: 11320 (total: 11800) - loss: 0.081821
train epoch: 19 - step: 11330 (total: 11800) - loss: 0.093358
train epoch: 19 - step: 11340 (total: 11800) - loss: 0.041281
train epoch: 19 - step: 11350 (total: 11800) - loss: 0.072158
dev step: 11350 - loss: 0.26591, precision: 0.63945, recall: 0.72125, f1: 0.67789 current best 0.68086
train epoch: 19 - step: 11360 (total: 11800) - loss: 0.056884
train epoch: 19 - step: 11370 (total: 11800) - loss: 0.103474
train epoch: 19 - step: 11380 (total: 11800) - loss: 0.053013
train epoch: 19 - step: 11390 (total: 11800) - loss: 0.120952
train epoch: 19 - step: 11400 (total: 11800) - loss: 0.096058
dev step: 11400 - loss: 0.28324, precision: 0.59984, recall: 0.73752, f1: 0.66159 current best 0.68086
train epoch: 19 - step: 11410 (total: 11800) - loss: 0.053519
train epoch: 19 - step: 11420 (total: 11800) - loss: 0.084413
train epoch: 19 - step: 11430 (total: 11800) - loss: 0.082539
train epoch: 19 - step: 11440 (total: 11800) - loss: 0.025818
train epoch: 19 - step: 11450 (total: 11800) - loss: 0.104579
dev step: 11450 - loss: 0.27601, precision: 0.62382, recall: 0.71161, f1: 0.66483 current best 0.68086
train epoch: 19 - step: 11460 (total: 11800) - loss: 0.023326
train epoch: 19 - step: 11470 (total: 11800) - loss: 0.074468
train epoch: 19 - step: 11480 (total: 11800) - loss: 0.131153
train epoch: 19 - step: 11490 (total: 11800) - loss: 0.144081
train epoch: 19 - step: 11500 (total: 11800) - loss: 0.059301
dev step: 11500 - loss: 0.24404, precision: 0.63090, recall: 0.69881, f1: 0.66312 current best 0.68086
train epoch: 19 - step: 11510 (total: 11800) - loss: 0.087042
train epoch: 19 - step: 11520 (total: 11800) - loss: 0.103437
train epoch: 19 - step: 11530 (total: 11800) - loss: 0.141086
train epoch: 19 - step: 11540 (total: 11800) - loss: 0.073799
train epoch: 19 - step: 11550 (total: 11800) - loss: 0.080609
dev step: 11550 - loss: 0.26010, precision: 0.63815, recall: 0.71392, f1: 0.67391 current best 0.68086
train epoch: 19 - step: 11560 (total: 11800) - loss: 0.070097
train epoch: 19 - step: 11570 (total: 11800) - loss: 0.080336
train epoch: 19 - step: 11580 (total: 11800) - loss: 0.083600
train epoch: 19 - step: 11590 (total: 11800) - loss: 0.094290
train epoch: 19 - step: 11600 (total: 11800) - loss: 0.070526
dev step: 11600 - loss: 0.26730, precision: 0.63843, recall: 0.73536, f1: 0.68347 current best 0.68086
==============================================save best model best performerence 0.683475
train epoch: 19 - step: 11610 (total: 11800) - loss: 0.081728
train epoch: 19 - step: 11620 (total: 11800) - loss: 0.063919
train epoch: 19 - step: 11630 (total: 11800) - loss: 0.126019
train epoch: 19 - step: 11640 (total: 11800) - loss: 0.104756
train epoch: 19 - step: 11650 (total: 11800) - loss: 0.077707
dev step: 11650 - loss: 0.25038, precision: 0.63025, recall: 0.72140, f1: 0.67275 current best 0.68347
train epoch: 19 - step: 11660 (total: 11800) - loss: 0.092881
train epoch: 19 - step: 11670 (total: 11800) - loss: 0.068379
train epoch: 19 - step: 11680 (total: 11800) - loss: 0.046535
train epoch: 19 - step: 11690 (total: 11800) - loss: 0.078183
train epoch: 19 - step: 11700 (total: 11800) - loss: 0.104983
dev step: 11700 - loss: 0.26015, precision: 0.64215, recall: 0.70471, f1: 0.67197 current best 0.68347
train epoch: 19 - step: 11710 (total: 11800) - loss: 0.086539
train epoch: 19 - step: 11720 (total: 11800) - loss: 0.118713
train epoch: 19 - step: 11730 (total: 11800) - loss: 0.081435
train epoch: 19 - step: 11740 (total: 11800) - loss: 0.073214
train epoch: 19 - step: 11750 (total: 11800) - loss: 0.129037
dev step: 11750 - loss: 0.25711, precision: 0.62550, recall: 0.68067, f1: 0.65192 current best 0.68347
train epoch: 19 - step: 11760 (total: 11800) - loss: 0.117920
train epoch: 19 - step: 11770 (total: 11800) - loss: 0.048488
train epoch: 19 - step: 11780 (total: 11800) - loss: 0.095776
train epoch: 19 - step: 11790 (total: 11800) - loss: 0.122794
INFO 2021-04-10 19:32:21,529 launch.py:240] Local processes completed.
end DuEE-Fin role train
# Argument extraction prediction
!bash run_duee_fin.sh role_predict
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist
start DuEE-Fin role predict
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 19:32:29,053] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 19:32:29,067] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 19:32:29.068078  7827 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 19:32:29.072537  7827 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start predict==========
Loaded parameters from ./ckpt/DuEE-Fin/role/best.pdparams
save data 140867 to ./ckpt/DuEE-Fin/role/test_pred.json
end DuEE-Fin role predict
# Enum classification model training
!bash run_duee_fin.sh enum_train
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist
start DuEE-Fin enum train
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
----------- Configuration Arguments -----------
gpus: 0
heter_worker_num: None
heter_workers:
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers:
training_script: classifier.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE-Fin/enum_tag.dict', '--train_data', './data/DuEE-Fin/enum/train.tsv', '--dev_data', './data/DuEE-Fin/enum/dev.tsv', '--test_data', './data/DuEE-Fin/enum/test.tsv', '--predict_data', './data/DuEE-Fin/sentence/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '1', '--valid_step', '5', '--checkpoints', './ckpt/DuEE-Fin/enum', '--init_ckpt', './ckpt/DuEE-Fin/enum/best.pdparams', '--predict_save_path', './ckpt/DuEE-Fin/enum/test_pred.json', '--device', 'gpu']
worker_num: None
workers:
------------------------------------------------
WARNING 2021-04-10 19:52:37,709 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-10 19:52:37,711 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug):
+=======================================================================================+
|                       Distributed Envs                       Value                    |
+---------------------------------------------------------------------------------------+
|                       PADDLE_TRAINER_ID                      0                        |
|                       PADDLE_CURRENT_ENDPOINT                127.0.0.1:53319          |
|                       PADDLE_TRAINERS_NUM                    1                        |
|                       PADDLE_TRAINER_ENDPOINTS               127.0.0.1:53319          |
|                       FLAGS_selected_gpus                    0                        |
+=======================================================================================+
INFO 2021-04-10 19:52:37,711 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 19:52:38,983] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 19:52:38.984846  8459 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 19:52:38.990355  8459 device_context.cc:372] device: 0, cuDNN Version: 7.6.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/parallel.py:423: UserWarning: The program will return to single-card operation. Please check 1, whether you use spawn or fleetrun to start the program. 2, Whether it is a multi-card program. 3, Is the current environment multi-card.
  warnings.warn("The program will return to single-card operation. "
[2021-04-10 19:52:45,669] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
============start train==========
train epoch: 0 - step: 1 (total: 540) - loss: 1.816590 acc 0.00000
train epoch: 0 - step: 2 (total: 540) - loss: 1.258928 acc 0.16667
train epoch: 0 - step: 3 (total: 540) - loss: 1.420988 acc 0.21875
train epoch: 0 - step: 4 (total: 540) - loss: 1.131907 acc 0.27500
train epoch: 0 - step: 5 (total: 540) - loss: 1.223589 acc 0.29167
dev step: 5 - loss: 1.056646 accuracy: 0.57353, current best 0.00000
==============================================save best model best performerence 0.573529
train epoch: 0 - step: 6 (total: 540) - loss: 0.891011 acc 0.62500
train epoch: 0 - step: 7 (total: 540) - loss: 1.019258 acc 0.53125
train epoch: 0 - step: 8 (total: 540) - loss: 0.944579 acc 0.54167
train epoch: 0 - step: 9 (total: 540) - loss: 0.998457 acc 0.54688
train epoch: 0 - step: 10 (total: 540) - loss: 1.451570 acc 0.52500
dev step: 10 - loss: 0.973503 accuracy: 0.58824, current best 0.57353
==============================================save best model best performerence 0.588235
train epoch: 0 - step: 11 (total: 540) - loss: 1.007745 acc 0.50000
train epoch: 0 - step: 12 (total: 540) - loss: 0.987179 acc 0.56250
train epoch: 0 - step: 13 (total: 540) - loss: 1.315943 acc 0.54167
train epoch: 0 - step: 14 (total: 540) - loss: 0.999895 acc 0.53125
train epoch: 0 - step: 15 (total: 540) - loss: 1.151808 acc 0.51250
dev step: 15 - loss: 0.960856 accuracy: 0.57353, current best 0.58824
train epoch: 0 - step: 16 (total: 540) - loss: 0.993396 acc 0.50000
train epoch: 0 - step: 17 (total: 540) - loss: 0.963157 acc 0.56250
train epoch: 0 - step: 18 (total: 540) - loss: 1.068855 acc 0.58333
train epoch: 0 - step: 19 (total: 540) - loss: 0.926241 acc 0.53125
train epoch: 0 - step: 20 (total: 540) - loss: 1.040999 acc 0.55000
dev step: 20 - loss: 0.976091 accuracy: 0.57353, current best 0.58824
train epoch: 0 - step: 21 (total: 540) - loss: 0.889343 acc 0.56250
train epoch: 0 - step: 22 (total: 540) - loss: 1.093462 acc 0.53125
train epoch: 0 - step: 23 (total: 540) - loss: 0.737294 acc 0.60417
train epoch: 0 - step: 24 (total: 540) - loss: 0.808597 acc 0.64062
train epoch: 0 - step: 25 (total: 540) - loss: 1.001462 acc 0.62500
dev step: 25 - loss: 0.890632 accuracy: 0.58824, current best 0.58824
train epoch: 0 - step: 26 (total: 540) - loss: 1.133129 acc 0.58333
train epoch: 1 - step: 27 (total: 540) - loss: 0.722086 acc 0.60714
train epoch: 1 - step: 28 (total: 540) - loss: 1.116035 acc 0.59091
train epoch: 1 - step: 29 (total: 540) - loss: 0.887589 acc 0.61667
train epoch: 1 - step: 30 (total: 540) - loss: 0.892591 acc 0.63158
dev step: 30 - loss: 0.789007 accuracy: 0.66176, current best 0.58824
==============================================save best model best performerence 0.661765
train epoch: 1 - step: 31 (total: 540) - loss: 0.553415 acc 0.93750
train epoch: 1 - step: 32 (total: 540) - loss: 0.908041 acc 0.81250
train epoch: 1 - step: 33 (total: 540) - loss: 0.635944 acc 0.81250
train epoch: 1 - step: 34 (total: 540) - loss: 0.589399 acc 0.79688
train epoch: 1 - step: 35 (total: 540) - loss: 0.848807 acc 0.75000
dev step: 35 - loss: 0.724788 accuracy: 0.73529, current best 0.66176
==============================================save best model best performerence 0.735294
train epoch: 1 - step: 36 (total: 540) - loss: 0.357636 acc 0.87500
train epoch: 1 - step: 37 (total: 540) - loss: 0.589867 acc 0.87500
train epoch: 1 - step: 38 (total: 540) - loss: 0.742335 acc 0.81250
train epoch: 1 - step: 39 (total: 540) - loss: 0.882202 acc 0.76562
train epoch: 1 - step: 40 (total: 540) - loss: 0.428002 acc 0.78750
dev step: 40 - loss: 0.696543 accuracy: 0.76471, current best 0.73529
==============================================save best model best performerence 0.764706
train epoch: 1 - step: 41 (total: 540) - loss: 1.359658 acc 0.50000
train epoch: 1 - step: 42 (total: 540) - loss: 1.061078 acc 0.59375
train epoch: 1 - step: 43 (total: 540) - loss: 0.830923 acc 0.60417
train epoch: 1 - step: 44 (total: 540) - loss: 1.215348 acc 0.59375
train epoch: 1 - step: 45 (total: 540) - loss: 0.437100 acc 0.65000
dev step: 45 - loss: 0.735505 accuracy: 0.76471, current best 0.76471
train epoch: 1 - step: 46 (total: 540) - loss: 0.742862 acc 0.68750
train epoch: 1 - step: 47 (total: 540) - loss: 0.711089 acc 0.68750
train epoch: 1 - step: 48 (total: 540) - loss: 0.544343 acc 0.72917
train epoch: 1 - step: 49 (total: 540) - loss: 0.928760 acc 0.67188
train epoch: 1 - step: 50 (total: 540) - loss: 0.650753 acc 0.70000
dev step: 50 - loss: 0.666267 accuracy: 0.80882, current best 0.76471
==============================================save best model best performerence 0.808824
train epoch: 1 - step: 51 (total: 540) - loss: 0.561961 acc 0.81250
train epoch: 1 - step: 52 (total: 540) - loss: 0.444493 acc 0.84375
train epoch: 1 - step: 53 (total: 540) - loss: 0.727330 acc 0.81818
train epoch: 2 - step: 54 (total: 540) - loss: 0.535819 acc 0.85000
train epoch: 2 - step: 55 (total: 540) - loss: 0.804540 acc 0.80263
dev step: 55 - loss: 0.748626 accuracy: 0.75000, current best 0.80882
……
train epoch: 19 - step: 521 (total: 540) - loss: 0.001116 acc 1.00000
train epoch: 19 - step: 522 (total: 540) - loss: 0.001323 acc 1.00000
train epoch: 19 - step: 523 (total: 540) - loss: 0.000761 acc 1.00000
train epoch: 19 - step: 524 (total: 540) - loss: 0.000776 acc 1.00000
train epoch: 19 - step: 525 (total: 540) - loss: 0.000688 acc 1.00000
dev step: 525 - loss: 0.963112 accuracy: 0.83824, current best 0.86765
train epoch: 19 - step: 526 (total: 540) - loss: 0.001005 acc 1.00000
train epoch: 19 - step: 527 (total: 540) - loss: 0.000491 acc 1.00000
train epoch: 19 - step: 528 (total: 540) - loss: 0.000759 acc 1.00000
train epoch: 19 - step: 529 (total: 540) - loss: 0.000579 acc 1.00000
train epoch: 19 - step: 530 (total: 540) - loss: 0.000592 acc 1.00000
dev step: 530 - loss: 0.965140 accuracy: 0.83824, current best 0.86765
train epoch: 19 - step: 531 (total: 540) - loss: 0.000727 acc 1.00000
train epoch: 19 - step: 532 (total: 540) - loss: 0.000827 acc 1.00000
train epoch: 19 - step: 533 (total: 540) - loss: 0.002026 acc 1.00000
train epoch: 19 - step: 534 (total: 540) - loss: 0.001417 acc 1.00000
train epoch: 19 - step: 535 (total: 540) - loss: 0.000947 acc 1.00000
dev step: 535 - loss: 0.967908 accuracy: 0.83824, current best 0.86765
train epoch: 19 - step: 536 (total: 540) - loss: 0.000558 acc 1.00000
train epoch: 19 - step: 537 (total: 540) - loss: 0.000692 acc 1.00000
train epoch: 19 - step: 538 (total: 540) - loss: 0.001994 acc 1.00000
train epoch: 19 - step: 539 (total: 540) - loss: 0.000524 acc 1.00000
INFO 2021-04-10 19:56:40,966 launch.py:240] Local processes completed.
end DuEE-Fin enum train
# Enum classification prediction
!bash run_duee_fin.sh enum_predict
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist
start DuEE-Fin enum predict
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
[2021-04-10 19:56:50,581] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 19:56:50.583134 9015 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 19:56:50.588418 9015 device_context.cc:372] device: 0, cuDNN Version: 7.6.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/parallel.py:423: UserWarning: The program will return to single-card operation. Please check 1, whether you use spawn or fleetrun to start the program. 2, Whether it is a multi-card program. 3, Is the current environment multi-card.
[2021-04-10 19:56:57,202] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
============start predict==========
Loaded parameters from ./ckpt/DuEE-Fin/enum/best.pdparams
save data 140867 to ./ckpt/DuEE-Fin/enum/test_pred.json
end DuEE-Fin enum predict
Submit the results to the evaluation site in the format specified by the competition. The results are stored in submit/test_duee_fin.json.
!bash run_duee_fin.sh pred_2_submit
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist
start DuEE-Fin predict data merge to submit fotmat
trigger predict 140867 load from ./ckpt/DuEE-Fin/trigger/test_pred.json
role predict 140867 load from ./ckpt/DuEE-Fin/role/test_pred.json
enum predict 140867 load from ./ckpt/DuEE-Fin/enum/test_pred.json
schema 13 load from ./conf/DuEE-Fin/event_schema.json
submit data 30000 save to ./submit/test_duee_fin.json
end DuEE-Fin role predict data merge
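The merge step above combines the three per-task prediction files, keyed by example id, and fills the slots allowed by event_schema.json to produce the submission file. A rough sketch of the idea (the field access and helper name below are our assumptions for illustration, not the baseline's exact code):
import json
from collections import defaultdict

def load_predictions(path):
    # One JSON object per line, as written by the predict steps above;
    # we assume each object carries an "id" field identifying its source example.
    by_id = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)
            by_id[item["id"]].append(item)
    return by_id

triggers = load_predictions("./ckpt/DuEE-Fin/trigger/test_pred.json")
roles = load_predictions("./ckpt/DuEE-Fin/role/test_pred.json")
enums = load_predictions("./ckpt/DuEE-Fin/enum/test_pred.json")
# For each id: trigger tags determine the event types, role tags fill the
# argument slots that event_schema.json allows for those types, and the enum
# prediction fills the single enumerable role before writing the submit file.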
On the sentence-level, general-domain event extraction dataset (DuEE 1.0), the baseline model uses an ERNIE-based sequence labeling approach, split into a trigger extraction model and an argument extraction model, i.e. a pipeline model. The trigger extraction model uses BIO tagging to identify the position of each trigger and its corresponding event type; the argument extraction model likewise uses BIO tagging to identify the arguments of an event and their argument roles. The model and data processing are the same as for document-level event extraction and are not repeated here. Sentence-level general-domain event extraction has no enumerable-role classification step. A tiny illustration of the BIO scheme follows below.
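For intuition, here is a minimal sketch of the tagging the trigger model produces (the sentence and exact tag strings are illustrative, following the DuEE schema style, not copied from the dataset):
# Each character receives one tag: B-<event_type> opens a trigger span,
# I-<event_type> continues it, and O marks everything else.
text = list("雀巢裁员4000人")
labels = ["O", "O", "B-组织关系-裁员", "I-组织关系-裁员", "O", "O", "O", "O", "O"]
# The argument model works the same way, with B-/I-<role> tags instead.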
# Data preprocessing
!bash run_duee_1.sh data_prepare
# Train the trigger extraction model
!bash run_duee_1.sh trigger_train
This cell's output exceeds 1,000 lines and is truncated when saved.
check and create directory
dir ./ckpt exist
create dir * ./ckpt/DuEE1.0 *
dir ./submit exist
start DuEE1.0 data prepare
===============DUEE 1.0 DATASET==============
=================start schema process==============
input path ./conf/DuEE1.0/event_schema.json
save trigger tag 131 at ./conf/DuEE1.0/trigger_tag.dict
save trigger tag 243 at ./conf/DuEE1.0/role_tag.dict
=================end schema process===============
=================start schema process==============
----trigger------for dir ./data/DuEE1.0 to ./data/DuEE1.0/trigger
train 11959 dev 1499
----role------for dir ./data/DuEE1.0 to ./data/DuEE1.0/role
train 13916 dev 1791 test 1
=================end schema process==============
end DuEE1.0 data prepare
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE1.0 exist
dir ./submit exist
start DuEE1.0 trigger train
----------- Configuration Arguments -----------
gpus: 0
heter_worker_num: None
heter_workers:
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers:
training_script: sequence_labeling.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE1.0/trigger_tag.dict', '--train_data', './data/DuEE1.0/trigger/train.tsv', '--dev_data', './data/DuEE1.0/trigger/dev.tsv', '--test_data', './data/DuEE1.0/trigger/test.tsv', '--predict_data', './data/DuEE1.0/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE1.0/trigger', '--init_ckpt', './ckpt/DuEE1.0/trigger/best.pdparams', '--predict_save_path', './ckpt/DuEE1.0/trigger/test_pred.json', '--device', 'gpu']
worker_num: None
workers:
------------------------------------------------
WARNING 2021-04-10 20:12:04,884 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-10 20:12:04,886 launch_utils.py:471] Local start 1 processes.
[2021-04-10 20:12:06,137] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 20:12:06,151] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 20:12:06.152766 9531 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 20:12:06.157284 9531 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start train==========
train epoch: 0 - step: 10 (total: 14960) - loss: 0.399632
……
dev step: 50 - loss: 0.36327, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
……
dev step: 300 - loss: 0.19781, precision: 0.28393, recall: 0.17765, f1: 0.21856 current best 0.00000
==============================================save best model best performerence 0.218557
……
dev step: 900 - loss: 0.04721, precision: 0.74372, recall: 0.84302, f1: 0.79026 current best 0.75959
==============================================save best model best performerence 0.790259
……
dev step: 1450 - loss: 0.03987, precision: 0.78395, recall: 0.85140, f1: 0.81628 current best 0.81625
==============================================save best model best performerence 0.816283
……
dev step: 14950 - loss: 0.07010, precision: 0.83477, recall: 0.86369, f1: 0.84898 current best 0.85026
INFO 2021-04-10 21:18:14,794 launch.py:240] Local processes completed.
end DuEE1.0 trigger train
# Trigger extraction prediction
!bash run_duee_1.sh trigger_predict
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE1.0 exist
dir ./submit exist
start DuEE1.0 trigger predict
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
[2021-04-10 21:19:00,925] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 21:19:00,939] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 21:19:00.940081 12545 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 21:19:00.944607 12545 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start predict==========
Loaded parameters from ./ckpt/DuEE1.0/trigger/best.pdparams
save data 499 to ./ckpt/DuEE1.0/trigger/test_pred.json
end DuEE1.0 trigger predict
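The trigger model writes one prediction per example to test_pred.json, essentially a per-character BIO tag sequence. Decoding such a sequence back into trigger spans can be sketched as follows (a simplified decoder written for illustration, not the baseline's exact code):
def bio_to_spans(chars, tags):
    # Collect (event_type, trigger_text) spans from a BIO tag sequence.
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last open span
        if tag.startswith("B-"):
            if start is not None:
                spans.append((etype, "".join(chars[start:i])))
            start, etype = i, tag[2:]
        elif not tag.startswith("I-") and start is not None:
            spans.append((etype, "".join(chars[start:i])))
            start, etype = None, None
    return spans

print(bio_to_spans(list("雀巢裁员4000人"),
                   ["O", "O", "B-组织关系-裁员", "I-组织关系-裁员", "O", "O", "O", "O", "O"]))
# -> [('组织关系-裁员', '裁员')]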
# Train the argument (role) extraction model
!bash run_duee_1.sh role_train
This cell's output exceeds 1,000 lines and is truncated when saved.
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE1.0 exist
dir ./submit exist
start DuEE1.0 role train
----------- Configuration Arguments -----------
gpus: 0
heter_worker_num: None
heter_workers:
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers:
training_script: sequence_labeling.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE1.0/role_tag.dict', '--train_data', './data/DuEE1.0/role/train.tsv', '--dev_data', './data/DuEE1.0/role/dev.tsv', '--test_data', './data/DuEE1.0/role/test.tsv', '--predict_data', './data/DuEE1.0/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE1.0/role', '--init_ckpt', './ckpt/DuEE1.0/role/best.pdparams', '--predict_save_path', './ckpt/DuEE1.0/role/test_pred.json', '--device', 'gpu']
worker_num: None
workers:
------------------------------------------------
WARNING 2021-04-10 21:19:31,729 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-10 21:19:31,731 launch_utils.py:471] Local start 1 processes.
[2021-04-10 21:19:33,027] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 21:19:33,041] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 21:19:33.042527 12581 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 21:19:33.047051 12581 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start train==========
train epoch: 0 - step: 10 (total: 17400) - loss: 1.631316
……
dev step: 50 - loss: 1.41820, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
……
dev step: 900 - loss: 0.35676, precision: 0.36447, recall: 0.50978, f1: 0.42505 current best 0.40414
==============================================save best model best performerence 0.425045
……
dev step: 1550 - loss: 0.30130, precision: 0.45212, recall: 0.61162, f1: 0.51991 current best 0.50579
==============================================save best model best performerence 0.519912
……
dev step: 17250 - loss: 0.38374, precision: 0.52425, recall: 0.62819, f1: 0.57153 current best 0.58724
……
train epoch: 19 - step: 17390 (total: 17400) - loss: 0.066693
INFO 2021-04-10 22:43:36,736 launch.py:240] Local processes completed.
end DuEE1.0 role train
# Argument (role) extraction prediction
!bash run_duee_1.sh role_predict
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE1.0 exist
dir ./submit exist
start DuEE1.0 role predict
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
[2021-04-10 22:44:10,178] [ INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 22:44:10,192] [ INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 22:44:10.193476 16283 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 22:44:10.198055 16283 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start predict==========
Loaded parameters from ./ckpt/DuEE1.0/role/best.pdparams
save data 499 to ./ckpt/DuEE1.0/role/test_pred.json
end DuEE1.0 role predict
# Post-process the data and submit the predictions
# Results are stored in submit/test_duee_1.json
!bash run_duee_1.sh pred_2_submit
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE1.0 exist
dir ./submit exist
start DuEE1.0 predict data merge to submit fotmat
trigger predict 499 load from ./ckpt/DuEE1.0/trigger/test_pred.json
role predict 499 load from ./ckpt/DuEE1.0/role/test_pred.json
schema 65 load from ./conf/DuEE1.0/event_schema.json
submit data 499 save to ./submit/test_duee_1.json
end DuEE1.0 role predict data merge
The predicted event arguments are matched against the human-annotated arguments and scored by character-level matching F1, case-insensitively; if an argument has multiple gold surface forms, the highest matching F1 among them is taken.
f1_score = (2 * P * R) / (P + R), where
• P = sum of the scores of the predicted arguments / number of predicted arguments
• R = sum of the scores of the predicted arguments / number of human-annotated arguments
• score of a predicted argument = (event type correct) * (argument role correct) * character-level matching F1
• character-level matching F1 = 2 * char-level matching P * char-level matching R / (char-level matching P + char-level matching R)
• char-level matching P = number of characters shared by the predicted and annotated argument / number of characters in the predicted argument
• char-level matching R = number of characters shared by the predicted and annotated argument / number of characters in the annotated argument
A minimal sketch of this scoring appears below.
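A minimal sketch of the character-level scoring, treating "shared characters" as a multiset intersection; the function names are ours, not the official evaluator's:
from collections import Counter

def char_match_f1(pred_arg, gold_arg):
    # Characters shared by the predicted and annotated argument (case-insensitive).
    common = sum((Counter(pred_arg.lower()) & Counter(gold_arg.lower())).values())
    if common == 0:
        return 0.0
    p = common / len(pred_arg)  # char-level matching P
    r = common / len(gold_arg)  # char-level matching R
    return 2 * p * r / (p + r)

def argument_score(pred_arg, gold_mentions, type_correct, role_correct):
    # With multiple gold surface forms, keep the best character-level F1.
    best = max(char_match_f1(pred_arg, g) for g in gold_mentions)
    return float(type_correct) * float(role_correct) * best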
The baseline's pretrained model is ERNIE; PaddleNLP provides a rich set of pretrained models such as BERT, RoBERTa, ELECTRA, and XLNet.
For example, you can switch to the Chinese RoBERTa-large model to improve results: simply swapping the model and tokenizer plugs in seamlessly.
from paddlenlp.transformers import RobertaForTokenClassification, RobertaTokenizer
model = RobertaForTokenClassification.from_pretrained("roberta-wwm-ext-large", num_classes=len(label_map))
tokenizer = RobertaTokenizer.from_pretrained("roberta-wwm-ext-large")
[2021-04-10 22:48:18,899] [ INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/roberta_chn_large.pdparams and saved to /home/aistudio/.paddlenlp/models/roberta-wwm-ext-large
[2021-04-10 22:48:18,902] [ INFO] - Downloading roberta_chn_large.pdparams from https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/roberta_chn_large.pdparams
100%|██████████| 1271615/1271615 [00:18<00:00, 69327.15it/s]
[2021-04-10 22:48:42,145] [ INFO] - Downloading vocab.txt from https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/vocab.txt
100%|██████████| 107/107 [00:00<00:00, 2073.95it/s]
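The same swap applies to the enum classification model; a sketch assuming the sequence classification counterpart (check that the class name matches your PaddleNLP version, and that label_map is in scope as in the baseline scripts):
from paddlenlp.transformers import RobertaForSequenceClassification, RobertaTokenizer

# Same checkpoint, but with a sentence-level classification head instead of a token-level one.
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-wwm-ext-large", num_classes=len(label_map))
tokenizer = RobertaTokenizer.from_pretrained("roberta-wwm-ext-large")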
For sequence labeling tasks, GRU+CRF comes to mind as a commonly used network. How can these layers be added on top of a pretrained model? A cleaned-up sketch:
import paddle.nn as nn
from paddlenlp.layers import LinearChainCrf, LinearChainCrfLoss, ViterbiDecoder

class Model(nn.Layer):
    def __init__(self, ernie, num_classes=2, dropout=None, gru_hidden_size=128):
        super(Model, self).__init__()
        self.num_classes = num_classes
        # the loaded ErnieModel is passed in, so it remains configurable
        self.ernie = ernie
        self.dropout = nn.Dropout(dropout if dropout is not None else
                                  self.ernie.config["hidden_dropout_prob"])
        # add a bi-directional GRU on top of the ERNIE encoder
        self.gru = nn.GRU(
            input_size=self.ernie.config["hidden_size"],
            hidden_size=gru_hidden_size,
            direction='bidirect')
        self.fc = nn.Linear(
            in_features=gru_hidden_size * 2,
            out_features=num_classes)
        # add a CRF layer over the per-token emission scores
        self.crf = LinearChainCrf(num_classes, with_start_stop_tag=False)
        self.crf_loss = LinearChainCrfLoss(self.crf)
        self.viterbi_decoder = ViterbiDecoder(
            self.crf.transitions, with_start_stop_tag=False)

    def forward(self, input_ids, token_type_ids=None, position_ids=None,
                attention_mask=None, lengths=None, labels=None):
        sequence_output, _ = self.ernie(
            input_ids,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            attention_mask=attention_mask)
        sequence_output = self.dropout(sequence_output)
        bigru_output, _ = self.gru(sequence_output)
        emission = self.fc(bigru_output)
        _, prediction = self.viterbi_decoder(emission, lengths)
        if labels is not None:
            # this PaddleNLP version's LinearChainCrfLoss takes (emission, lengths, prediction, labels)
            loss = self.crf_loss(emission, lengths, prediction, labels)
            return loss, lengths, prediction, labels
        return emission, lengths, prediction
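A minimal usage sketch, assuming the baseline's label_map is in scope; ErnieModel.from_pretrained loads the same "ernie-1.0" weights used above:
from paddlenlp.transformers import ErnieModel

ernie = ErnieModel.from_pretrained("ernie-1.0")
model = Model(ernie, num_classes=len(label_map))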
Train several models for prediction, then fuse the individual models' predictions. One simple fusion strategy is sketched below.
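A minimal sketch of probability averaging across models for a sequence labeling task (names and shapes are our own assumptions); voting over decoded tag sequences is another common choice:
import numpy as np

def fuse_predictions(prob_list, id2label):
    # prob_list: one array per model, each of shape [seq_len, num_tags],
    # holding per-character tag probabilities for the same example.
    avg = np.mean(np.stack(prob_list, axis=0), axis=0)
    return [id2label[int(i)] for i in avg.argmax(axis=-1)]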