Extracting Entities from BIO Sequences (NER, Named Entity Recognition)

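1. In NER (named entity recognition), the network predicts a sequence of BIO tags. How do we decode that sequence and extract the entities?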

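Approach 1: When a B tag is encountered, store whatever entity has been accumulated so far and start a new one. A run of consecutive I tags with no leading B is also treated as an entity, which covers the error case where the model mispredicts a B as an I. When the type labels inside one span disagree, the entity takes the most frequent type (the mode).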

    # Decode BIO tags into entities (Approach 1: a bare run of I tags also counts)
    string = "我是李明,我爱中国,我来自呼和浩特"
    predict = ["o", "o", "i-per", "i-per", "o", "o", "o", "b-loc", "i-loc", "o", "o", "o", "o", "b-per", "i-loc", "i-loc", "i-loc"]
    item = {"string": string, "entities": []}
    entity_name = ""  # characters of the entity being built
    flag = []         # type label of each character in the current entity

    for char, tag in zip(string, predict):
        if tag[0] == "b":
            # A new entity starts: store the one accumulated so far, if any.
            if entity_name != "":
                x = dict((a, flag.count(a)) for a in flag)             # label -> count
                y = [k for k, v in x.items() if max(x.values()) == v]  # most frequent label(s)
                item["entities"].append({"word": entity_name, "type": y[0]})
                flag.clear()
                entity_name = ""
            entity_name += char
            flag.append(tag[2:])
        elif tag[0] == "i":
            # Continue the current entity, or start one if no B came before.
            entity_name += char
            flag.append(tag[2:])
        else:
            # "o" tag: close the current entity, if any.
            if entity_name != "":
                x = dict((a, flag.count(a)) for a in flag)
                y = [k for k, v in x.items() if max(x.values()) == v]
                item["entities"].append({"word": entity_name, "type": y[0]})
            flag.clear()
            entity_name = ""

    # Store an entity that runs to the end of the string.
    if entity_name != "":
        x = dict((a, flag.count(a)) for a in flag)
        y = [k for k, v in x.items() if max(x.values()) == v]
        item["entities"].append({"word": entity_name, "type": y[0]})
    print(item)

Output:

{'string': '我是李明,我爱中国,我来自呼和浩特', 'entities': [{'word': '李明', 'type': 'per'}, {'word': '中国', 'type': 'loc'}, {'word': '呼和浩特', 'type': 'loc'}]}
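
The majority vote that resolves conflicting type labels can also be written with `collections.Counter` from the standard library. The following is only a minimal sketch; the helper name `majority_type` is hypothetical and does not appear in the original code. On a tie it returns the label seen first, which matches the `y[0]` behaviour of the dictionary-based version.

```python
from collections import Counter


def majority_type(labels):
    """Return the most frequent label; on a tie, the label seen first wins."""
    # most_common(1) returns a one-element list of (label, count) pairs.
    return Counter(labels).most_common(1)[0][0]


print(majority_type(["per", "loc", "loc", "loc"]))  # loc
print(majority_type(["per", "loc"]))                # per (tie, first seen wins)
```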

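Approach 2: Keep only entities that begin with a B tag and discard everything else, so a run of I tags with no leading B is ignored. As in Approach 1, the entity type is the most frequent label in the span.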

    # Decode BIO tags into entities (Approach 2: an entity must start with a B tag)
    string = "我是李明,我爱中国,我来自呼和浩特"
    predict = ["o", "o", "i-per", "i-per", "o", "o", "o", "b-loc", "i-loc", "o", "o", "o", "o", "b-per", "i-loc", "i-loc", "i-loc"]
    item = {"string": string, "entities": []}
    entity_name = ""  # characters of the entity being built
    flag = []         # type label of each character in the current entity
    visit = False     # True while inside a span that was opened by a B tag

    for char, tag in zip(string, predict):
        if tag[0] == "b":
            # A new entity starts: store the one accumulated so far, if any.
            if entity_name != "":
                x = dict((a, flag.count(a)) for a in flag)             # label -> count
                y = [k for k, v in x.items() if max(x.values()) == v]  # most frequent label(s)
                item["entities"].append({"word": entity_name, "type": y[0]})
                flag.clear()
                entity_name = ""
            visit = True
            entity_name += char
            flag.append(tag[2:])
        elif tag[0] == "i" and visit:
            # Only extend a span that a B tag has opened.
            entity_name += char
            flag.append(tag[2:])
        else:
            # "o" tag, or an I tag with no preceding B: close the current entity, if any.
            if entity_name != "":
                x = dict((a, flag.count(a)) for a in flag)
                y = [k for k, v in x.items() if max(x.values()) == v]
                item["entities"].append({"word": entity_name, "type": y[0]})
            flag.clear()
            visit = False
            entity_name = ""

    # Store an entity that runs to the end of the string.
    if entity_name != "":
        x = dict((a, flag.count(a)) for a in flag)
        y = [k for k, v in x.items() if max(x.values()) == v]
        item["entities"].append({"word": entity_name, "type": y[0]})
    print(item)

Output:

{'string': '我是李明,我爱中国,我来自呼和浩特', 'entities': [{'word': '中国', 'type': 'loc'}, {'word': '呼和浩特', 'type': 'loc'}]}
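
Since the two approaches differ only in whether a span may begin with an I tag, they can be folded into one function. The sketch below is one possible way to do that, not the original author's code: the function name `extract_entities`, the `require_b` parameter, and the extra `start`/`end` character offsets are all assumptions introduced for illustration. It also assumes lowercase tags of the form `b-xxx`, `i-xxx`, `o`, and that `text` and `tags` have the same length.

```python
from collections import Counter


def extract_entities(text, tags, require_b=False):
    """Decode lowercase BIO tags into entities with character offsets.

    require_b=False: a run of I tags with no leading B still forms an entity.
    require_b=True:  only spans opened by a B tag are kept.
    """
    entities, labels, start = [], [], None

    def flush(end):
        # Store the open span text[start:end], typed by majority vote, then reset.
        nonlocal start
        if start is not None:
            entities.append({"word": text[start:end], "start": start, "end": end,
                             "type": Counter(labels).most_common(1)[0][0]})
        labels.clear()
        start = None

    for i, tag in enumerate(tags):
        if tag[0] == "b":
            flush(i)            # close any span still open
            start = i
            labels.append(tag[2:])
        elif tag[0] == "i" and (start is not None or not require_b):
            if start is None:   # bare I run, allowed only when require_b is False
                start = i
            labels.append(tag[2:])
        else:
            flush(i)            # "o" tag, or an I tag rejected by require_b
    flush(len(tags))            # span that runs to the end of the input
    return entities


text = "我是李明,我爱中国,我来自呼和浩特"
tags = ["o", "o", "i-per", "i-per", "o", "o", "o", "b-loc", "i-loc",
        "o", "o", "o", "o", "b-per", "i-loc", "i-loc", "i-loc"]
print([e["word"] for e in extract_entities(text, tags)])
# ['李明', '中国', '呼和浩特']
print([e["word"] for e in extract_entities(text, tags, require_b=True)])
# ['中国', '呼和浩特']
```

The offsets are half-open, so `text[e["start"]:e["end"]]` reproduces `e["word"]`; this can help when the same surface string occurs more than once in the input.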