TPLinker converts each relation triple (subject, relation, object) into token-pair tags, the so-called handshaking tagging scheme. The tags fall into three types:
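(1) entity head to entity tail (EH-to-ET): links the first token of an entity to its last token; shared across all relations.
(2) subject head to object head (SH-to-OH): for each relation, links the first token of the subject to the first token of the object.
(3) subject tail to object tail (ST-to-OT): for each relation, links the last token of the subject to the last token of the object.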
A tagging example is shown in the figure below: EH-to-ET links are drawn in purple, SH-to-OH links in red, and ST-to-OT links in blue.
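A minimal Python sketch of how the three tag types could be derived from one annotated record. Assumptions: the input uses the relation_list fields of the sample record shown further down, with end-exclusive tok_span; entity spans are taken from the relations rather than from entity_list; and a pair whose subject token comes after the object token is flipped into the upper triangle (i ≤ j) and marked with tag 2, following the TPLinker paper's trick for never using the lower triangle. The helper name build_tags is hypothetical.

```python
from collections import defaultdict

def build_tags(relation_list):
    """Derive TPLinker-style tags for one record (hypothetical helper).

    Returns (eh_to_et, sh_to_oh, st_to_ot):
      eh_to_et: set of (head, tail) token pairs, shared by all relations;
      sh_to_oh, st_to_ot: dict predicate -> set of (i, j, tag) with i <= j.
    Tag 1 means the subject token comes first; tag 2 means the pair was
    flipped so that only the upper triangle (i <= j) is used.
    """
    eh_to_et = set()
    sh_to_oh = defaultdict(set)
    st_to_ot = defaultdict(set)
    for rel in relation_list:
        s_head, s_end = rel["subj_tok_span"]
        o_head, o_end = rel["obj_tok_span"]
        s_tail, o_tail = s_end - 1, o_end - 1  # tok_span is end-exclusive
        eh_to_et.add((s_head, s_tail))
        eh_to_et.add((o_head, o_tail))
        p = rel["predicate"]
        # Keep every pair in the upper triangle, flipping it with tag 2 if needed.
        if s_head <= o_head:
            sh_to_oh[p].add((s_head, o_head, 1))
        else:
            sh_to_oh[p].add((o_head, s_head, 2))
        if s_tail <= o_tail:
            st_to_ot[p].add((s_tail, o_tail, 1))
        else:
            st_to_ot[p].add((o_tail, s_tail, 2))
    return eh_to_et, dict(sh_to_oh), dict(st_to_ot)

# Token spans and predicates copied from the sample record's relation_list.
relations = [
    {"predicate": "/location/neighborhood/neighborhood_of",
     "subj_tok_span": [26, 28], "obj_tok_span": [1, 2]},
    {"predicate": "/location/location/contains",
     "subj_tok_span": [1, 2], "obj_tok_span": [26, 28]},
]
eh_to_et, sh_to_oh, st_to_ot = build_tags(relations)
print(sorted(eh_to_et))  # [(1, 1), (26, 27)]
```

In the sample, Douglaston is the subject of neighborhood_of but appears after Queens, so both of that relation's head and tail pairs end up flipped with tag 2.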
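The model itself is simple: the whole sentence is passed once through an encoder; then, for every token pair (i, j) with i ≤ j, the two hidden vectors are concatenated, fed into a fully connected layer, and passed through an activation (tanh in the TPLinker paper) to produce the token-pair representation; finally, each token pair is classified. In other words, this is sequence labeling over a much longer sequence: a sentence of n tokens yields n(n+1)/2 token pairs to label.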
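The token-pair step can be sketched in plain Python to show how the pairs are enumerated and flattened. Assumptions: hidden stands in for the encoder output (one vector per token), weight and bias are the fully connected layer's parameters, and the activation is tanh; the names shaking_pairs, pair_index and handshaking_kernel are hypothetical, and a real implementation would use batched tensor operations rather than loops.

```python
import math
import random

def shaking_pairs(n):
    """All token pairs (i, j) with i <= j, row by row: n * (n + 1) / 2 pairs."""
    return [(i, j) for i in range(n) for j in range(i, n)]

def pair_index(i, j, n):
    """Position of pair (i, j), i <= j, in the flattened upper triangle."""
    return i * n - i * (i - 1) // 2 + (j - i)

def handshaking_kernel(hidden, weight, bias):
    """Compute h_ij = tanh(W [h_i; h_j] + b) for every pair i <= j."""
    pair_vectors = []
    for i, j in shaking_pairs(len(hidden)):
        concat = hidden[i] + hidden[j]  # list concatenation gives [h_i; h_j]
        pair_vectors.append([
            math.tanh(sum(w * x for w, x in zip(row, concat)) + b)
            for row, b in zip(weight, bias)
        ])
    return pair_vectors

# Toy example: 4 tokens, hidden size 3, pair-representation size 2.
random.seed(0)
n, d, k = 4, 3, 2
hidden = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
weight = [[random.uniform(-1, 1) for _ in range(2 * d)] for _ in range(k)]
bias = [0.0] * k
pairs = handshaking_kernel(hidden, weight, bias)
print(len(pairs))  # 10, i.e. 4 * 5 / 2
```

Each of these pair vectors would then be classified: in TPLinker's design one classifier handles EH-to-ET, while SH-to-OH and ST-to-OT get one classifier per relation. pair_index lets the decoder map a flattened position back to its (i, j) pair.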
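Decoding maps the predicted tags back onto the input text to recover the triples:
(1) Decode EH-to-ET to extract the entities and build a dictionary D whose key is an entity's head token and whose value is its tail token (in practice a list of tails, since several entities can share the same head).
(2) Decode ST-to-OT to obtain, for each relation, the tail tokens of the two related entities, and collect them in a set E.
(3) Decode SH-to-OH to obtain, for each relation, the head tokens of the two related entities, then look up their tails in D. If the resulting (relation, subject tail, object tail) is in E, one triple has been successfully extracted.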
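The three decoding steps above can be sketched as follows. Assumptions: the tag sets are already discrete (in practice they would come from the classifiers' predictions), tails are inclusive token indices, tag 2 marks a pair flipped into the upper triangle, and the tag sets below are written by hand for the Douglaston/Queens sample record shown further down. The function name decode_triples is hypothetical.

```python
from collections import defaultdict

def decode_triples(eh_to_et, sh_to_oh, st_to_ot):
    """Recover ((subj_head, subj_tail), predicate, (obj_head, obj_tail)) triples.

    eh_to_et: set of (head, tail); sh_to_oh / st_to_ot: dict predicate -> set of
    (i, j, tag), where tag 2 means the pair was stored as (object, subject).
    """
    # Step 1, dictionary D: entity head -> list of tails.
    D = defaultdict(list)
    for head, tail in eh_to_et:
        D[head].append(tail)
    # Step 2, set E: (predicate, subject tail, object tail), undoing tag-2 flips.
    E = set()
    for p, spots in st_to_ot.items():
        for i, j, tag in spots:
            E.add((p, i, j) if tag == 1 else (p, j, i))
    # Step 3: pair up heads from SH-to-OH, expand them via D, and check against E.
    triples = set()
    for p, spots in sh_to_oh.items():
        for i, j, tag in spots:
            sh, oh = (i, j) if tag == 1 else (j, i)
            for st in D.get(sh, []):
                for ot in D.get(oh, []):
                    if (p, st, ot) in E:
                        triples.add(((sh, st), p, (oh, ot)))
    return triples

# Hand-written tags for the sample: Queens = token 1, Douglaston = tokens 26..27.
eh = {(1, 1), (26, 27)}
sh = {"/location/neighborhood/neighborhood_of": {(1, 26, 2)},
      "/location/location/contains": {(1, 26, 1)}}
st = {"/location/neighborhood/neighborhood_of": {(1, 27, 2)},
      "/location/location/contains": {(1, 27, 1)}}
triples = decode_triples(eh, sh, st)
for t in sorted(triples):
    print(t)
```

Requiring a match in both the head links and the tail links is what lets this scheme separate overlapping entities that share a head or a tail.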
{"text": "In Queens , North Shore Towers , near the Nassau border , supplanted a golf course , and housing replaced a gravel quarry in Douglaston .", "id": "valid_0", "relation_list": [{"subject": "Douglaston", "object": "Queens", "subj_char_span": [125, 135], "obj_char_span": [3, 9], "predicate": "/location/neighborhood/neighborhood_of", "subj_tok_span": [26, 28], "obj_tok_span": [1, 2]}, {"subject": "Queens", "object": "Douglaston", "subj_char_span": [3, 9], "obj_char_span": [125, 135], "predicate": "/location/location/contains", "subj_tok_span": [1, 2], "obj_tok_span": [26, 28]}], "entity_list": [{"text": "Douglaston", "type": "DEFAULT", "char_span": [125, 135], "tok_span": [26, 28]}, {"text": "Queens", "type": "DEFAULT", "char_span": [3, 9], "tok_span": [1, 2]}, {"text": "Queens", "type": "DEFAULT", "char_span": [3, 9], "tok_span": [1, 2]}, {"text": "Douglaston", "type": "DEFAULT", "char_span": [125, 135], "tok_span": [26, 28]}]}
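Each training record has four top-level keys: text (the raw sentence), id, relation_list (the triples, each with end-exclusive character spans subj_char_span/obj_char_span and token spans subj_tok_span/obj_tok_span) and entity_list (the entities involved; in this sample every entity is typed DEFAULT and appears twice).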
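Before training, a small check can confirm that a record has the four keys and that every character span really indexes its string (end-exclusive). This is a sketch; the helper name span_errors is hypothetical, and tok_span cannot be checked this way because it depends on the tokenizer used during preprocessing.

```python
import json

# The sample record from above, verbatim.
sample = json.loads(r'''{"text": "In Queens , North Shore Towers , near the Nassau border , supplanted a golf course , and housing replaced a gravel quarry in Douglaston .", "id": "valid_0", "relation_list": [{"subject": "Douglaston", "object": "Queens", "subj_char_span": [125, 135], "obj_char_span": [3, 9], "predicate": "/location/neighborhood/neighborhood_of", "subj_tok_span": [26, 28], "obj_tok_span": [1, 2]}, {"subject": "Queens", "object": "Douglaston", "subj_char_span": [3, 9], "obj_char_span": [125, 135], "predicate": "/location/location/contains", "subj_tok_span": [1, 2], "obj_tok_span": [26, 28]}], "entity_list": [{"text": "Douglaston", "type": "DEFAULT", "char_span": [125, 135], "tok_span": [26, 28]}, {"text": "Queens", "type": "DEFAULT", "char_span": [3, 9], "tok_span": [1, 2]}, {"text": "Queens", "type": "DEFAULT", "char_span": [3, 9], "tok_span": [1, 2]}, {"text": "Douglaston", "type": "DEFAULT", "char_span": [125, 135], "tok_span": [26, 28]}]}''')

def span_errors(record):
    """Return a list of problems: wrong top-level keys or char spans that miss."""
    expected = {"text", "id", "relation_list", "entity_list"}
    if set(record) != expected:
        return [f"unexpected keys: {sorted(set(record) ^ expected)}"]
    errors = []
    text = record["text"]
    for rel in record["relation_list"]:
        for name, span_key in (("subject", "subj_char_span"), ("object", "obj_char_span")):
            start, end = rel[span_key]  # end-exclusive character offsets
            if text[start:end] != rel[name]:
                errors.append(f"{span_key} {rel[span_key]} gives {text[start:end]!r}")
    for ent in record["entity_list"]:
        start, end = ent["char_span"]
        if text[start:end] != ent["text"]:
            errors.append(f"char_span {ent['char_span']} gives {text[start:end]!r}")
    return errors

print(span_errors(sample))  # []
```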
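To train TPLinker on your own data:
(1) preprocess the training data into the input format the model expects (the four-key record format above);
(2) adjust the training hyperparameters for your data;
(3) adapt the input and output formats to your needs.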
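A minimal sketch of step (1), converting your own data into the four-key format. Assumptions: your raw data is a sentence plus (subject, predicate, object) string triples (a hypothetical input format); each entity string is located at its first occurrence; and a whitespace tokenizer stands in for the model's real tokenizer. All function names here are hypothetical.

```python
import re

def tokenize_with_offsets(text):
    """Placeholder whitespace tokenizer returning (token, start, end) offsets.
    Hypothetical: real preprocessing should use the model's own tokenizer."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def char_to_tok_span(offsets, start, end):
    """Map an end-exclusive character span to an end-exclusive token span."""
    hits = [k for k, (_, s, e) in enumerate(offsets) if s < end and e > start]
    return [hits[0], hits[-1] + 1]

def make_entity(name, char_span, tok_span):
    return {"text": name, "type": "DEFAULT", "char_span": char_span, "tok_span": tok_span}

def to_tplinker_record(record_id, text, triples):
    """Convert (subject, predicate, object) string triples into the four-key format.
    Simplification: each entity string is located at its first occurrence."""
    offsets = tokenize_with_offsets(text)
    relation_list, entity_list = [], []
    for subj, pred, obj in triples:
        s0, o0 = text.find(subj), text.find(obj)
        if not subj.strip() or not obj.strip() or s0 < 0 or o0 < 0:
            continue  # skip triples whose entities cannot be located in the text
        subj_char, obj_char = [s0, s0 + len(subj)], [o0, o0 + len(obj)]
        subj_tok = char_to_tok_span(offsets, *subj_char)
        obj_tok = char_to_tok_span(offsets, *obj_char)
        relation_list.append({
            "subject": subj, "object": obj,
            "subj_char_span": subj_char, "obj_char_span": obj_char,
            "predicate": pred,
            "subj_tok_span": subj_tok, "obj_tok_span": obj_tok,
        })
        entity_list.append(make_entity(subj, subj_char, subj_tok))
        entity_list.append(make_entity(obj, obj_char, obj_tok))
    return {"text": text, "id": record_id,
            "relation_list": relation_list, "entity_list": entity_list}

text = ("In Queens , North Shore Towers , near the Nassau border , supplanted a golf "
        "course , and housing replaced a gravel quarry in Douglaston .")
record = to_tplinker_record(
    "valid_0", text, [("Douglaston", "/location/neighborhood/neighborhood_of", "Queens")])
print(record["relation_list"][0]["subj_tok_span"])  # [24, 25]
```

With the whitespace tokenizer, Douglaston gets the token span [24, 25], whereas the sample record has [26, 28]; the difference presumably comes from the subword tokenizer used to build the sample, which is why tok_span must be computed with the same tokenizer the model uses.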