The Chinese training data ($PATH/NERdata/) comes from: https://github.com/zjy-ucas/ChineseNER
Link: https://pan.baidu.com/s/1JBnda5rgUsZjgYR5W7u-Fg
Extraction code: x16l
For background on the tagging scheme, see "[NLP] Introduction to BIO sequence labeling (also called IOB2)" on mjiansun's column, CSDN blog.
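There are three entity types: persons, locations, and organizations (four classes if the non-entity tag O is counted).
Corresponding abbreviations: PER, LOC, ORG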
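To make the BIO scheme concrete, here is a minimal sketch of how entity spans can be read off a tagged sentence. The function name `bio_to_spans` and the sample sentence are illustrative assumptions, not part of the ChineseNER repository; the tags follow the usual `B-`/`I-`/`O` convention with the PER, LOC, and ORG types listed above.

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_type, text) pairs from parallel token and tag lists.

    Hypothetical helper for illustration only.
    """
    spans = []
    current_type = None
    current_chars = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; close any open one first.
            if current_type is not None:
                spans.append((current_type, "".join(current_chars)))
            current_type = tag[2:]
            current_chars = [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag continues the open entity of the same type.
            current_chars.append(token)
        else:
            # "O" (or an I- tag with no matching open entity) ends the entity.
            if current_type is not None:
                spans.append((current_type, "".join(current_chars)))
            current_type = None
            current_chars = []
    if current_type is not None:
        spans.append((current_type, "".join(current_chars)))
    return spans


# Made-up example sentence, one tag per character.
tokens = list("李明在北京大学工作")
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "I-ORG", "O", "O"]
print(bio_to_spans(tokens, tags))
```

In this sketch an `I-` tag that does not continue an open entity of the same type is treated like `O`; stricter or more lenient handling is equally possible.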
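if __name__ == "__main__":
    # Input: one "character tag" pair per line, with blank lines between sentences.
    rootPath = "/data2/PrivateExperiment/bilstm-crf-ner/NERdata/train.txt"
    # Output: the same data rewritten as tab-separated "character<TAB>tag" lines.
    savePath = "/data2/PrivateExperiment/bilstm-crf-ner/NERdata/process/toformat.txt"
    collectData = []
    # Read as UTF-8 explicitly so the Chinese text does not depend on the platform default.
    with open(rootPath, "r", encoding="utf-8") as f:
        for line in f:
            lineStrs = line.strip().split()
            if len(lineStrs) >= 2:
                collectData.append([lineStrs[0], lineStrs[1] + "\n"])
            elif not lineStrs:
                # Keep the blank line that separates sentences.
                collectData.append(["\n"])

    with open(savePath, "w", encoding="utf-8") as f:
        for line in collectData:
            f.write("\t".join(line))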
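As a quick sanity check on the converted file, the tab-separated lines can be grouped back into sentences. The sketch below assumes the layout written by the script above (one `character<TAB>tag` pair per line, a blank line between sentences); the function name `parse_tagged_lines` and the sample lines are hypothetical.

```python
def parse_tagged_lines(lines):
    """Group "character<TAB>tag" lines into (tokens, tags) pairs, one per sentence.

    Hypothetical helper for illustration only.
    """
    sentences = []
    tokens, tags = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        parts = line.split("\t")
        if len(parts) != 2:
            # Skip lines that do not match the expected two-column layout.
            continue
        tokens.append(parts[0])
        tags.append(parts[1])
    if tokens:
        sentences.append((tokens, tags))
    return sentences


# Made-up sample in the converted format.
sample = ["中\tB-LOC\n", "国\tI-LOC\n", "\n", "他\tO\n"]
for sentence_tokens, sentence_tags in parse_tagged_lines(sample):
    print(sentence_tokens, sentence_tags)
```

To check a real file, pass an open file object instead of the sample list, for example `parse_tagged_lines(open(savePath, encoding="utf-8"))`.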
The generated result is:
admin.jsonl contains the normal data; unknown.jsonl contains the abnormal data.
admin.jsonl
unknown.jsonl