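I won't cover how the carbonate platform knowledge graph itself is built here. If you want to see which methods were used, the earlier blog posts describe them in detail. This post picks up from the completed named entity recognition (NER) step and moves on to relation extraction, whose results are used to back-fill the original knowledge graph. After experts review the results, the dictionary is extended, NER is rerun, relation extraction is repeated, and the graph is back-filled again, until it is eventually complete. This is what academia calls human in the loop, and it is the second step of the overall pipeline.
The problem is still the same: I lack a dataset. What is better than before is that I now have a lot of literature with NER already applied, plus intermediate text (sentences that contain named entities). That leaves only some engineering work, which makes things much simpler.
Relation extraction here needs a shift in thinking: we do not need relations between specific words, only relations between keywords (entity categories). In other words, I do not need to know whether it is A Formation, B Formation, or AB Formation that corresponds to a given substance. Why? Because we already have a hierarchy of the specific substances. As the hierarchy figure shows, I only need to know that Formation and a given substance stand in a parent-child relation; the specific nouns are not needed. Once the relation is determined, we run entity extraction on that sentence. Why do it this way, rather than extracting entities first and relations second? This applies only to this project: it reduces data redundancy and computation time. The next step is to build the data.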
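The category-level idea above can be sketched roughly as follows. This is only an illustration under assumptions: the `CATEGORY_TERMS` dictionary, the function names, and the plain substring matching are all hypothetical and are not taken from the project, which loads its dictionaries from its own files.

```python
# Hypothetical category dictionary (assumption): category name -> known terms.
CATEGORY_TERMS = {
    "Formation": ["A Formation", "B Formation", "AB Formation"],
    "Time": ["Cretaceous", "Aptian"],
}


def categories_in(sentence, category_terms=CATEGORY_TERMS):
    """Return the sorted category names that have at least one term in the sentence."""
    lowered = sentence.lower()
    found = set()
    for category, terms in category_terms.items():
        # Simple case-insensitive substring match (assumption, not real NER).
        if any(term.lower() in lowered for term in terms):
            found.add(category)
    return sorted(found)


def candidate_pairs(sentence, related_pairs):
    """Keep only the category pairs whose two categories both occur in the sentence."""
    present = set(categories_in(sentence))
    return [(a, b) for a, b in related_pairs if a in present and b in present]
```

For example, `categories_in("The AB Formation was deposited during the Aptian")` reports both categories without caring which specific Formation was mentioned. The specific entities would only be extracted afterwards, for sentences where a category pair is actually found.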
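With this approach, building the dataset is straightforward: a simple hierarchical classification of the dictionary tells us which classes are parent and child. If the parent class contains M entities and the child class contains N entities, then M1 paired with N3 is still a parent-child relation, so we can drop these entities into a fixed template. Someone might ask: can the same two entities have more than one relation? No. If they could, I would not be using something as naive as a fixed template. The point is simply to save time and train a usable model; it can be extended later once there are more entity relations.
Taking the relation between time and substance as an example, the code that builds the data is as follows: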
import os


def load_terms(path):
    """Read a dictionary file and return its non-empty lines, stripped."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Load the time dictionary and the substance dictionary
tim = load_terms("./duc.txt")
sub = load_terms("./geosubstance1.txt")

# Fill the fixed template with every (substance, time) pair
Result = []
for s in sub:
    for t in tim:
        Result.append(s + ' | ' + t + ' | unknown | ' + t + ' foreland flexure been also accommodated ' + s)

for line in Result:
    print(line)

# Make sure the output directory exists, then save the generated samples
os.makedirs('./middle', exist_ok=True)
with open('./middle/general.txt', "w", encoding='utf-8') as f:
    for line in Result:
        f.write(line + '\n')
print("Saved successfully")
print("okA")
After repeatedly swapping in different parent and child data, part of the resulting relation extraction dataset looks like this:
Cement | Cretaceous | unknown | Cretaceous foreland flexure been also accommodated Cement
Cement | Berriasian | unknown | Berriasian foreland flexure been also accommodated Cement
Cement | Valanginian | unknown | Valanginian foreland flexure been also accommodated Cement
Cement | Hauterivian | unknown | Hauterivian foreland flexure been also accommodated Cement
Cement | Barremian | unknown | Barremian foreland flexure been also accommodated Cement
Cement | Aptian | unknown | Aptian foreland flexure been also accommodated Cement
Cement | Albian | unknown | Albian foreland flexure been also accommodated Cement
Cement | Cenomanian | unknown | Cenomanian foreland flexure been also accommodated Cement
Cement | Turonian | unknown | Turonian foreland flexure been also accommodated Cement
Cement | Coniacian | unknown | Coniacian foreland flexure been also accommodated Cement
Cement | Santonian | unknown | Santonian foreland flexure been also accommodated Cement
Cement | Campanian | unknown | Campanian foreland flexure been also accommodated Cement
Cement | Maastrichtian | unknown | Maastrichtian foreland flexure been also accommodated Cement
Acicular | Cretaceous | unknown | Cretaceous foreland flexure been also accommodated Acicular
Acicular | Berriasian | unknown | Berriasian foreland flexure been also accommodated Acicular
Acicular | Valanginian | unknown | Valanginian foreland flexure been also accommodated Acicular
Acicular | Hauterivian | unknown | Hauterivian foreland flexure been also accommodated Acicular
Acicular | Barremian | unknown | Barremian foreland flexure been also accommodated Acicular
Acicular | Aptian | unknown | Aptian foreland flexure been also accommodated Acicular
Acicular | Albian | unknown | Albian foreland flexure been also accommodated Acicular
Acicular | Cenomanian | unknown | Cenomanian foreland flexure been also accommodated Acicular
Acicular | Turonian | unknown | Turonian foreland flexure been also accommodated Acicular
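Since the same template is reused each time the parent and child data are swapped, the generation step can be wrapped in a small reusable function. The sketch below is an assumption-laden illustration rather than the project's actual code: the names `build_samples`, `parse_sample`, `head`, and `tail` are hypothetical labels for the fields, and only the line format and the template sentence come from the samples above.

```python
from itertools import product

# Template sentence taken from the samples above; field names are assumptions.
TEMPLATE = "{tail} foreland flexure been also accommodated {head}"


def build_samples(heads, tails, relation="unknown", template=TEMPLATE):
    """Build one 'head | tail | relation | sentence' line per (head, tail) pair."""
    lines = []
    for head, tail in product(heads, tails):
        sentence = template.format(head=head, tail=tail)
        lines.append(" | ".join([head, tail, relation, sentence]))
    return lines


def parse_sample(line):
    """Split a generated line back into its four fields."""
    head, tail, relation, sentence = [p.strip() for p in line.split(" | ", 3)]
    return {"head": head, "tail": tail, "relation": relation, "sentence": sentence}


if __name__ == "__main__":
    for sample in build_samples(["Cement"], ["Cretaceous", "Berriasian"]):
        print(sample)
```

Calling `build_samples` with a different pair of term lists, and a different `relation` label if needed, would produce the data for another parent-child pair. `parse_sample` is a convenient way to check that every generated line still has exactly four fields before training.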
That completes the first step. Next comes training the model; in the next blog post, the code and data will be uploaded to GitHub together.