1. Named entity recognition (NER): given the BIO tags predicted by the network, how do we decode them and extract the entities?
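Approach 1: When a B tag appears, store any entity accumulated so far and start a new one. A run of consecutive I tags with no leading B is also treated as an entity, which covers the error case where the model predicted I where it should have predicted B. When the type labels inside one contiguous span disagree, the entity takes the mode (the most frequent type).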

    # Decode BIO-format tags into entities (Approach 1)
    string = "我是李明,我爱中国,我来自呼和浩特"
    predict = ["o","o","i-per","i-per","o","o","o","b-loc","i-loc","o","o","o","o","b-per","i-loc","i-loc","i-loc"]
    item = {"string": string, "entities": []}
    entity_name = ""  # characters of the entity being built
    flag = []         # type label of every tag in the current entity
    for char, tag in zip(string, predict):
        if tag[0] == "b":
            # A B tag closes any entity in progress, then starts a new one.
            if entity_name != "":
                x = dict((a, flag.count(a)) for a in flag)
                y = [k for k, v in x.items() if max(x.values()) == v]
                item["entities"].append({"word": entity_name, "type": y[0]})
                flag.clear()
                entity_name = ""
            entity_name += char
            flag.append(tag[2:])
        elif tag[0] == "i":
            # An I tag extends the current entity, or opens one if no B came first.
            entity_name += char
            flag.append(tag[2:])
        else:
            # An O tag closes any entity in progress.
            if entity_name != "":
                x = dict((a, flag.count(a)) for a in flag)
                y = [k for k, v in x.items() if max(x.values()) == v]
                item["entities"].append({"word": entity_name, "type": y[0]})
            flag.clear()
            entity_name = ""

    # The string may end inside an entity.
    if entity_name != "":
        x = dict((a, flag.count(a)) for a in flag)
        y = [k for k, v in x.items() if max(x.values()) == v]
        item["entities"].append({"word": entity_name, "type": y[0]})
    print(item)

{'string': '我是李明,我爱中国,我来自呼和浩特', 'entities': [{'word': '李明', 'type': 'per'}, {'word': '中国', 'type': 'loc'}, {'word': '呼和浩特', 'type': 'loc'}]}
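
The "take the mode" step can also be written with `collections.Counter` from the standard library. This is only a small sketch of that one step; the helper name `majority_type` is made up for illustration and is not part of the original code. Like the original `y[0]`, it resolves ties in favour of the type that was seen first.

```python
from collections import Counter

def majority_type(types):
    """Return the most frequent type; ties go to the type seen first."""
    return Counter(types).most_common(1)[0][0]

# Types collected for the span "呼和浩特" in the example: one b-per, three i-loc.
print(majority_type(["per", "loc", "loc", "loc"]))  # loc
# A tie resolves to the first type encountered.
print(majority_type(["per", "loc"]))  # per
```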
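Approach 2: Keep only entities that start with a B tag and discard everything else. As before, the entity type is the mode of the types within the span.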

    # Decode BIO-format tags into entities (Approach 2)
    string = "我是李明,我爱中国,我来自呼和浩特"
    predict = ["o","o","i-per","i-per","o","o","o","b-loc","i-loc","o","o","o","o","b-per","i-loc","i-loc","i-loc"]
    item = {"string": string, "entities": []}
    entity_name = ""  # characters of the entity being built
    flag = []         # type label of every tag in the current entity
    visit = False     # True only while inside a span opened by a B tag
    for char, tag in zip(string, predict):
        if tag[0] == "b":
            # A B tag closes any entity in progress, then starts a new one.
            if entity_name != "":
                x = dict((a, flag.count(a)) for a in flag)
                y = [k for k, v in x.items() if max(x.values()) == v]
                item["entities"].append({"word": entity_name, "type": y[0]})
                flag.clear()
                entity_name = ""
            visit = True
            entity_name += char
            flag.append(tag[2:])
        elif tag[0] == "i" and visit:
            # An I tag only counts if a B tag opened the span.
            entity_name += char
            flag.append(tag[2:])
        else:
            # An O tag (or an I tag with no preceding B) closes any entity in progress.
            if entity_name != "":
                x = dict((a, flag.count(a)) for a in flag)
                y = [k for k, v in x.items() if max(x.values()) == v]
                item["entities"].append({"word": entity_name, "type": y[0]})
            flag.clear()
            visit = False
            entity_name = ""

    # The string may end inside an entity.
    if entity_name != "":
        x = dict((a, flag.count(a)) for a in flag)
        y = [k for k, v in x.items() if max(x.values()) == v]
        item["entities"].append({"word": entity_name, "type": y[0]})
    print(item)

{'string': '我是李明,我爱中国,我来自呼和浩特', 'entities': [{'word': '中国', 'type': 'loc'}, {'word': '呼和浩特', 'type': 'loc'}]}
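
The two approaches differ only in whether an I tag is allowed to open a span, so they can be folded into one function. The following is a rough sketch under that reading, not the original author's code: the function name `decode_bio` and the parameter `require_b` are assumptions introduced here for illustration.

```python
from collections import Counter

def decode_bio(text, tags, require_b=False):
    """Extract entities from per-character BIO tags.

    require_b=False: a run of I tags without a leading B still forms an entity.
    require_b=True:  only spans opened by a B tag are kept.
    """
    entities = []
    chars, types = [], []

    def flush():
        # Store the span collected so far, labelled with its most frequent type.
        if chars:
            label = Counter(types).most_common(1)[0][0]
            entities.append({"word": "".join(chars), "type": label})
            chars.clear()
            types.clear()

    for char, tag in zip(text, tags):
        prefix = tag[0].lower()
        if prefix == "b":
            flush()  # a B tag always closes the previous span
            chars.append(char)
            types.append(tag[2:])
        elif prefix == "i" and (chars or not require_b):
            chars.append(char)  # continue a span, or open one in lenient mode
            types.append(tag[2:])
        else:
            flush()  # O tag, or an orphan I tag in strict mode
    flush()  # the text may end inside an entity
    return entities

text = "我是李明,我爱中国,我来自呼和浩特"
tags = ["o","o","i-per","i-per","o","o","o","b-loc","i-loc","o",
        "o","o","o","b-per","i-loc","i-loc","i-loc"]
print(decode_bio(text, tags))                  # behaves like Approach 1
print(decode_bio(text, tags, require_b=True))  # behaves like Approach 2
```

On the example above, the lenient call keeps "李明" even though it has no B tag, while the strict call drops it; which behaviour is preferable depends on how often the model confuses B with I.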