
Building a Medical Question-Answering System on a Knowledge Graph with Python and Neo4j


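Contents

Preface

I. Starting Neo4j

II. Installing py2neo

III. Connecting to Neo4j from Python

IV. Building the medical knowledge graph in PyCharm

1. Reading the data file

2. Creating nodes

3. Creating the central Disease nodes

4. Creating nodes for each entity type

5. Creating relationship edges between entities

6. Creating the association edges

7. Exporting data

8. Running the program

9. Results

V. Implementing the automatic Q&A system in PyCharm

1. Model initialization

2. Main Q&A function

3. Running the program

4. Results

VI. Other (Q&A helper modules)

1. Question type classification script

2. Question parsing script

3. Answer search script

Summary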


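Preface

This hands-on exercise uses PyCharm to write Python programs that drive Neo4j and build a medical question-answering system on top of a knowledge graph.
It is adapted from a personal project by Liu Huanyong.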


I. Starting Neo4j

For how to start Neo4j, see this tutorial.

Before running the project, clear the Neo4j database:

MATCH (n) DETACH DELETE n


II. Installing py2neo

For the API itself, refer to the official py2neo documentation.

pip install py2neo


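III. Connecting to Neo4j from Python

The Neo4j connection address is "bolt://localhost:7687".

The username is the default, neo4j; the password is the one you changed when first starting Neo4j (see Section I). In the code below, replace "tang2001" with your own password.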

import json
import os

from py2neo import Graph, Node


class MedicalGraph:
    def __init__(self):
        # Directory of this script; os.path.dirname also handles Windows backslash paths
        cur_dir = os.path.dirname(os.path.abspath(__file__))
        self.data_path = os.path.join(cur_dir, 'data', 'medical.json')
        # Replace "tang2001" with the password you set for the neo4j user
        self.g = Graph("bolt://localhost:7687", auth=("neo4j", "tang2001"))

Note: in both original files, build_medicalgraph.py and answer_search.py, change the connection call self.g = Graph() to the format used in the code above.

Reference: the py2neo documentation on connecting to Neo4j from Python.
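Both files hardcode the password in the source. As a minimal sketch, assuming you would rather keep credentials out of the code, the connection settings can come from environment variables instead. The variable names NEO4J_URI, NEO4J_USER and NEO4J_PASSWORD and the helper load_neo4j_settings are hypothetical, not part of the original project; the returned values are meant to be passed to py2neo as Graph(uri, auth=auth).

```python
import os
from typing import Mapping, Optional, Tuple


def load_neo4j_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Tuple[str, str]]:
    """Return (uri, (user, password)) for py2neo's Graph(uri, auth=...).

    Hypothetical helper: NEO4J_URI / NEO4J_USER / NEO4J_PASSWORD are assumed
    variable names; the URI and user fall back to the values used in this article.
    """
    env = os.environ if env is None else env
    uri = env.get("NEO4J_URI", "bolt://localhost:7687")
    user = env.get("NEO4J_USER", "neo4j")
    password = env.get("NEO4J_PASSWORD")
    if not password:
        # Fail early instead of letting the driver report an authentication error later
        raise RuntimeError("set NEO4J_PASSWORD before connecting to Neo4j")
    return uri, (user, password)


if __name__ == "__main__":
    uri, auth = load_neo4j_settings({"NEO4J_PASSWORD": "secret"})
    print(uri, auth)  # bolt://localhost:7687 ('neo4j', 'secret')
    # With py2neo installed and Neo4j running:
    #   from py2neo import Graph
    #   g = Graph(uri, auth=auth)
```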


IV. Building the medical knowledge graph in PyCharm

1. Reading the data file

The code is as follows:

    def read_nodes(self):
        # 7 entity types (node labels)
        drugs = []          # drugs
        foods = []          # foods
        checks = []         # diagnostic checks
        departments = []    # hospital departments
        producers = []      # drug producers
        diseases = []       # diseases
        symptoms = []       # symptoms
        disease_infos = []  # property dicts for the Disease nodes
        # Relationships between entities, stored as [start_name, end_name] pairs
        rels_department = []     # sub-department -> parent department
        rels_noteat = []         # disease -> food to avoid
        rels_doeat = []          # disease -> suitable food
        rels_recommandeat = []   # disease -> recommended recipe
        rels_commonddrug = []    # disease -> commonly used drug
        rels_recommanddrug = []  # disease -> popular (well-rated) drug
        rels_check = []          # disease -> check
        rels_drug_producer = []  # producer -> drug
        rels_symptom = []        # disease -> symptom
        rels_acompany = []       # disease -> complication
        rels_category = []       # disease -> department
        count = 0
        # medical.json holds one JSON object per line; read it explicitly as UTF-8
        for data in open(self.data_path, encoding='utf8', mode='r'):
            disease_dict = {}
            count += 1
            print(count)
            data_json = json.loads(data)
            disease = data_json['name']
            disease_dict['name'] = disease
            diseases.append(disease)
            # Default every Disease property to '' so create_diseases_nodes never hits a KeyError
            for key in ('desc', 'prevent', 'cause', 'easy_get', 'cure_department',
                        'cure_way', 'cure_lasttime', 'symptom', 'cured_prob'):
                disease_dict[key] = ''
            if 'symptom' in data_json:
                symptoms += data_json['symptom']
                for symptom in data_json['symptom']:
                    rels_symptom.append([disease, symptom])
            if 'acompany' in data_json:
                for acompany in data_json['acompany']:
                    rels_acompany.append([disease, acompany])
            # Copy the plain-text properties that are present in this record
            for key in ('desc', 'prevent', 'cause', 'get_prob', 'easy_get',
                        'cure_way', 'cure_lasttime', 'cured_prob'):
                if key in data_json:
                    disease_dict[key] = data_json[key]
            if 'cure_department' in data_json:
                cure_department = data_json['cure_department']
                if len(cure_department) == 1:
                    rels_category.append([disease, cure_department[0]])
                if len(cure_department) == 2:
                    # Two-level department: [parent, child]
                    big = cure_department[0]
                    small = cure_department[1]
                    rels_department.append([small, big])
                    rels_category.append([disease, small])
                disease_dict['cure_department'] = cure_department
                departments += cure_department
            if 'common_drug' in data_json:
                common_drug = data_json['common_drug']
                for drug in common_drug:
                    rels_commonddrug.append([disease, drug])
                drugs += common_drug
            if 'recommand_drug' in data_json:
                recommand_drug = data_json['recommand_drug']
                drugs += recommand_drug
                for drug in recommand_drug:
                    rels_recommanddrug.append([disease, drug])
            if 'not_eat' in data_json:
                not_eat = data_json['not_eat']
                for _not in not_eat:
                    rels_noteat.append([disease, _not])
                foods += not_eat
                # do_eat / recommand_eat are read inside the not_eat branch;
                # .get() avoids a KeyError if a record lacks one of them
                do_eat = data_json.get('do_eat', [])
                for _do in do_eat:
                    rels_doeat.append([disease, _do])
                foods += do_eat
                recommand_eat = data_json.get('recommand_eat', [])
                for _recommand in recommand_eat:
                    rels_recommandeat.append([disease, _recommand])
                foods += recommand_eat
            if 'check' in data_json:
                check = data_json['check']
                for _check in check:
                    rels_check.append([disease, _check])
                checks += check
            if 'drug_detail' in data_json:
                # Text before '(' becomes the Producer name, text inside the parentheses the Drug name
                drug_detail = data_json['drug_detail']
                producer = [i.split('(')[0] for i in drug_detail]
                rels_drug_producer += [[i.split('(')[0], i.split('(')[-1].replace(')', '')] for i in drug_detail]
                producers += producer
            disease_infos.append(disease_dict)
        return set(drugs), set(foods), set(checks), set(departments), set(producers), set(symptoms), set(diseases), disease_infos, \
            rels_check, rels_recommandeat, rels_noteat, rels_doeat, rels_department, rels_commonddrug, rels_drug_producer, rels_recommanddrug, \
            rels_symptom, rels_acompany, rels_category
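To see what read_nodes extracts from one line of medical.json, here is a trimmed, stand-alone sketch run on a made-up record. The field values are invented for illustration; only the key names and the splitting rules come from the code above. It covers the two-level cure_department case and the drug_detail split, and the returned dict is keyed by the names of the corresponding lists in read_nodes.

```python
import json

# Invented sample record; only the key names come from read_nodes
SAMPLE_LINE = json.dumps({
    "name": "感冒",
    "symptom": ["发烧", "咳嗽"],
    "cure_department": ["内科", "呼吸内科"],
    "drug_detail": ["某药厂感冒灵颗粒(感冒灵颗粒)"],
}, ensure_ascii=False)


def extract(line):
    """Return the relationship lists read_nodes would build from one JSON line."""
    data_json = json.loads(line)
    disease = data_json["name"]
    rels = {"rels_symptom": [], "rels_category": [],
            "rels_department": [], "rels_drug_producer": []}
    for symptom in data_json.get("symptom", []):
        rels["rels_symptom"].append([disease, symptom])
    cure_department = data_json.get("cure_department", [])
    if len(cure_department) == 1:
        rels["rels_category"].append([disease, cure_department[0]])
    if len(cure_department) == 2:
        big, small = cure_department  # [parent, child]
        rels["rels_department"].append([small, big])
        rels["rels_category"].append([disease, small])
    for item in data_json.get("drug_detail", []):
        # Text before "(" becomes the Producer name, text inside the parentheses the Drug name
        rels["rels_drug_producer"].append([item.split("(")[0], item.split("(")[-1].replace(")", "")])
    return rels


if __name__ == "__main__":
    for key, pairs in extract(SAMPLE_LINE).items():
        print(key, pairs)
```

Note that only the child department is linked to the disease; the parent department is reached through the sub-department edge.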

2. Creating nodes

The code is as follows:

    def create_node(self, label, nodes):
        # One node per entity name; each create() call is a separate round trip to Neo4j
        count = 0
        for node_name in nodes:
            node = Node(label, name=node_name)
            self.g.create(node)
            count += 1
            print(count, len(nodes))
        return
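create_node sends one create() per entity, which is a large part of why the full import takes over an hour. A common alternative, sketched here as an assumption rather than the original project's method, is to send the names in batches through a single Cypher UNWIND statement. The helpers chunked and build_unwind_create are hypothetical; only the batching and query-building are shown, so the sketch runs without a database. With py2neo, each batch would then be sent with self.g.run(query, names=batch).

```python
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_unwind_create(label: str) -> str:
    """Cypher that creates one :label node per entry of the $names parameter.

    The label is formatted into the string because Cypher cannot parameterize
    labels; the names themselves stay a parameter.
    """
    return "UNWIND $names AS name CREATE (:%s {name: name})" % label


if __name__ == "__main__":
    names = sorted({"发烧", "咳嗽", "头痛", "乏力", "咽痛"})
    query = build_unwind_create("Symptom")
    for batch in chunked(names, 2):
        # With py2neo: self.g.run(query, names=batch)
        print(query, batch)
```

Batch size is a trade-off: larger batches mean fewer round trips but bigger transactions; a few thousand names per batch is a reasonable starting point to tune from.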

3. Creating the central Disease nodes

The code is as follows:

    def create_diseases_nodes(self, disease_infos):
        # Disease nodes carry all descriptive properties collected in read_nodes
        count = 0
        for disease_dict in disease_infos:
            node = Node("Disease", name=disease_dict['name'], desc=disease_dict['desc'],
                        prevent=disease_dict['prevent'], cause=disease_dict['cause'],
                        easy_get=disease_dict['easy_get'], cure_lasttime=disease_dict['cure_lasttime'],
                        cure_department=disease_dict['cure_department'],
                        cure_way=disease_dict['cure_way'], cured_prob=disease_dict['cured_prob'])
            self.g.create(node)
            count += 1
            print(count)
        return

4. Creating nodes for each entity type

The code is as follows:

    def create_graphnodes(self):
        Drugs, Foods, Checks, Departments, Producers, Symptoms, Diseases, disease_infos, rels_check, rels_recommandeat, rels_noteat, rels_doeat, rels_department, rels_commonddrug, rels_drug_producer, rels_recommanddrug, rels_symptom, rels_acompany, rels_category = self.read_nodes()
        # Disease nodes first (with properties), then the six name-only entity types
        self.create_diseases_nodes(disease_infos)
        self.create_node('Drug', Drugs)
        print(len(Drugs))
        self.create_node('Food', Foods)
        print(len(Foods))
        self.create_node('Check', Checks)
        print(len(Checks))
        self.create_node('Department', Departments)
        print(len(Departments))
        self.create_node('Producer', Producers)
        print(len(Producers))
        self.create_node('Symptom', Symptoms)
        return

5. Creating relationship edges between entities

The code is as follows:

    def create_graphrels(self):
        Drugs, Foods, Checks, Departments, Producers, Symptoms, Diseases, disease_infos, rels_check, rels_recommandeat, rels_noteat, rels_doeat, rels_department, rels_commonddrug, rels_drug_producer, rels_recommanddrug, rels_symptom, rels_acompany, rels_category = self.read_nodes()
        self.create_relationship('Disease', 'Food', rels_recommandeat, 'recommand_eat', '推荐食谱')
        self.create_relationship('Disease', 'Food', rels_noteat, 'no_eat', '忌吃')
        self.create_relationship('Disease', 'Food', rels_doeat, 'do_eat', '宜吃')
        self.create_relationship('Department', 'Department', rels_department, 'belongs_to', '属于')
        self.create_relationship('Disease', 'Drug', rels_commonddrug, 'common_drug', '常用药品')
        self.create_relationship('Producer', 'Drug', rels_drug_producer, 'drugs_of', '生产药品')
        self.create_relationship('Disease', 'Drug', rels_recommanddrug, 'recommand_drug', '好评药品')
        self.create_relationship('Disease', 'Check', rels_check, 'need_check', '诊断检查')
        self.create_relationship('Disease', 'Symptom', rels_symptom, 'has_symptom', '症状')
        self.create_relationship('Disease', 'Disease', rels_acompany, 'acompany_with', '并发症')
        self.create_relationship('Disease', 'Department', rels_category, 'belongs_to', '所属科室')

6. Creating the association edges

The code is as follows:

    def create_relationship(self, start_node, end_node, edges, rel_type, rel_name):
        count = 0
        # Deduplicate the [start, end] pairs
        set_edges = set(tuple(edge) for edge in edges)
        total = len(set_edges)
        # Labels and relationship types cannot be Cypher parameters, so they are formatted in;
        # node names and rel_name are passed as parameters, so a quote in a name cannot break the query
        query = ("MATCH (p:%s), (q:%s) WHERE p.name = $p AND q.name = $q "
                 "CREATE (p)-[rel:%s {name: $rel_name}]->(q)") % (start_node, end_node, rel_type)
        for p, q in set_edges:
            try:
                self.g.run(query, p=p, q=q, rel_name=rel_name)
                count += 1
                print(rel_type, count, total)
            except Exception as e:
                print(e)
        return
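Each relationship is created by a MATCH on the name property, and without an index on name every MATCH scans all nodes of that label, which slows the edge import considerably. A minimal sketch, assuming a recent Neo4j (4.x or 5.x; 3.x releases use CREATE INDEX ON :Label(name) instead), of generating one name index per label to run once before create_graphrels, plus the deduplication step written with a set of tuples. Both helper names are hypothetical and not part of the original project.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def unique_edges(edges: Iterable[Sequence[str]]) -> Set[Tuple[str, str]]:
    """Drop duplicate [start, end] pairs, the same effect as the '###' join/split."""
    return {(edge[0], edge[1]) for edge in edges}


def index_queries(labels: Iterable[str]) -> List[str]:
    """One index on the name property per label (assumed Neo4j 4.x/5.x syntax)."""
    return ["CREATE INDEX IF NOT EXISTS FOR (n:%s) ON (n.name)" % label for label in labels]


if __name__ == "__main__":
    edges = [["感冒", "发烧"], ["感冒", "发烧"], ["感冒", "咳嗽"]]
    print(len(unique_edges(edges)), "unique edges")
    for q in index_queries(["Disease", "Drug", "Food", "Check", "Department", "Producer", "Symptom"]):
        # With py2neo, run each once before create_graphrels: self.g.run(q)
        print(q)
```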

7. Exporting data

The code is as follows:

    def export_data(self):
        Drugs, Foods, Checks, Departments, Producers, Symptoms, Diseases, *_ = self.read_nodes()
        # One entity name per line, written as UTF-8 so Chinese text round-trips on Windows too.
        # Note: QuestionClassifier later reads dict/symptom.txt, while this writes symptoms.txt.
        outputs = {
            'drug.txt': Drugs,
            'food.txt': Foods,
            'check.txt': Checks,
            'department.txt': Departments,
            'producer.txt': Producers,
            'symptoms.txt': Symptoms,
            'disease.txt': Diseases,
        }
        for filename, names in outputs.items():
            with open(filename, 'w', encoding='utf-8') as f:
                f.write('\n'.join(names))
        return
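export_data writes the entity names to the working directory, while QuestionClassifier later loads them from dict/*.txt (and expects symptom.txt rather than symptoms.txt). A minimal sketch, assuming you want the export to land directly where the classifier looks; write_entity_dicts is a hypothetical helper, and the names are sorted only so that repeated runs produce identical files.

```python
import os
import tempfile
from typing import Dict, Iterable


def write_entity_dicts(out_dir: str, entities: Dict[str, Iterable[str]]) -> None:
    """Write each entity collection to <out_dir>/<name>.txt, one UTF-8 name per line."""
    os.makedirs(out_dir, exist_ok=True)
    for name, words in entities.items():
        path = os.path.join(out_dir, name + ".txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(sorted(words)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dict_dir = os.path.join(tmp, "dict")
        write_entity_dicts(dict_dir, {"symptom": {"咳嗽", "发烧"}, "disease": {"感冒"}})
        with open(os.path.join(dict_dir, "symptom.txt"), encoding="utf-8") as f:
            # Same loading idiom as QuestionClassifier
            print([i.strip() for i in f if i.strip()])
```

Inside MedicalGraph this would be called as write_entity_dicts('dict', {'symptom': Symptoms, 'disease': Diseases, ...}) with the sets returned by read_nodes.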

8. Running the program

Run build_medicalgraph.py (there is a lot of data to import, so expect it to take more than an hour):

if __name__ == '__main__':
    handler = MedicalGraph()
    print("step1:导入图谱节点中")
    handler.create_graphnodes()
    print("step2:导入图谱边中")
    handler.create_graphrels()

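Running Liu Huanyong's code unmodified produces the following error on systems whose default encoding is GBK (such as Chinese-locale Windows):

UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 81: illegal multibyte sequence

Fix: press Ctrl+F, search for open, and add encoding='utf8' to the first open call (the one in read_nodes), so that it reads: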

for data in open(self.data_path, encoding='utf8', mode='r'):
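The error appears because open() without an encoding argument falls back to the platform's preferred encoding, while medical.json is UTF-8. A small sketch that always passes the encoding and uses a with block so the file is closed afterwards; read_json_lines is a hypothetical helper, and the temporary file merely stands in for data/medical.json.

```python
import json
import locale
import os
import tempfile
from typing import Dict, List


def read_json_lines(path: str) -> List[Dict]:
    """Parse a JSON-lines file (one object per line), always as UTF-8."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines instead of failing in json.loads
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # The encoding open() would fall back to if none were given
    print("platform default:", locale.getpreferredencoding(False))
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".json", delete=False) as tmp:
        tmp.write(json.dumps({"name": "感冒"}, ensure_ascii=False) + "\n")
        path = tmp.name
    try:
        print(read_json_lines(path))
    finally:
        os.remove(path)
```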

9. Results


V. Implementing the automatic Q&A system in PyCharm

1. Model initialization

The code is as follows:

from answer_search import *
from question_classifier import *
from question_parser import *


class ChatBotGraph:
    def __init__(self):
        self.classifier = QuestionClassifier()
        self.parser = QuestionPaser()  # the class is spelled "Paser" in question_parser.py
        self.searcher = AnswerSearcher()

2. Main Q&A function

The code is as follows:
    def chat_main(self, sent):
        # Fallback reply when the question cannot be classified or nothing is found
        answer = '您好,我是医药智能助理,希望可以帮到您。祝您身体棒棒!'
        res_classify = self.classifier.classify(sent)
        if not res_classify:
            return answer
        res_sql = self.parser.parser_main(res_classify)
        final_answers = self.searcher.search_main(res_sql)
        if not final_answers:
            return answer
        else:
            return '\n'.join(final_answers)
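chat_main only wires three stages together: classify returns a dict with 'args' and 'question_types', parser_main turns that into a list of {'question_type', 'sql'} dicts, and search_main returns a list of answer strings. The stub classes below are hypothetical stand-ins (no Neo4j, no dictionary files) that reproduce just those data shapes, which can be handy for testing the control flow, including both fallback paths, in isolation.

```python
from typing import Dict, List

FALLBACK = '您好,我是医药智能助理,希望可以帮到您。祝您身体棒棒!'


class StubClassifier:
    """Hypothetical: recognizes only the disease 感冒."""
    def classify(self, sent: str) -> Dict:
        if '感冒' not in sent:
            return {}
        types = ['disease_symptom'] if '症状' in sent else ['disease_desc']
        return {'args': {'感冒': ['disease']}, 'question_types': types}


class StubParser:
    """Hypothetical: one placeholder query per question type."""
    def parser_main(self, res_classify: Dict) -> List[Dict]:
        return [{'question_type': t, 'sql': ['<cypher for %s>' % t]}
                for t in res_classify['question_types']]


class StubSearcher:
    """Hypothetical: pretends only symptom queries return rows."""
    def search_main(self, sqls: List[Dict]) -> List[str]:
        return ['感冒的症状包括:发烧;咳嗽' for s in sqls if s['question_type'] == 'disease_symptom']


def chat_main(sent, classifier, parser, searcher):
    # Same control flow as ChatBotGraph.chat_main
    res_classify = classifier.classify(sent)
    if not res_classify:
        return FALLBACK
    final_answers = searcher.search_main(parser.parser_main(res_classify))
    return '\n'.join(final_answers) if final_answers else FALLBACK


if __name__ == '__main__':
    parts = (StubClassifier(), StubParser(), StubSearcher())
    for q in ['感冒有什么症状', '感冒是什么', '你好']:
        print(q, '->', chat_main(q, *parts))
```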

3. Running the program

Run chatbot_graph.py:
if __name__ == '__main__':
    handler = ChatBotGraph()
    while True:
        question = input('用户:')
        answer = handler.chat_main(question)
        print('医药智能助理:', answer)

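4. Results


VI. Other (Q&A helper modules)

1. Question type classification script

The code is as follows (question_classifier.py, which can also be run on its own to test classification):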

import os

import ahocorasick  # provided by the pyahocorasick package


class QuestionClassifier:
    def __init__(self):
        cur_dir = os.path.dirname(os.path.abspath(__file__))
        # Paths of the feature-word dictionaries
        self.disease_path = os.path.join(cur_dir, 'dict/disease.txt')
        self.department_path = os.path.join(cur_dir, 'dict/department.txt')
        self.check_path = os.path.join(cur_dir, 'dict/check.txt')
        self.drug_path = os.path.join(cur_dir, 'dict/drug.txt')
        self.food_path = os.path.join(cur_dir, 'dict/food.txt')
        self.producer_path = os.path.join(cur_dir, 'dict/producer.txt')
        self.symptom_path = os.path.join(cur_dir, 'dict/symptom.txt')
        self.deny_path = os.path.join(cur_dir, 'dict/deny.txt')
        # Load the feature words (one per line, blank lines skipped)
        self.disease_wds = [i.strip() for i in open(self.disease_path, encoding='utf8') if i.strip()]
        self.department_wds = [i.strip() for i in open(self.department_path, encoding='utf8') if i.strip()]
        self.check_wds = [i.strip() for i in open(self.check_path, encoding='utf8') if i.strip()]
        self.drug_wds = [i.strip() for i in open(self.drug_path, encoding='utf8') if i.strip()]
        self.food_wds = [i.strip() for i in open(self.food_path, encoding='utf8') if i.strip()]
        self.producer_wds = [i.strip() for i in open(self.producer_path, encoding='utf8') if i.strip()]
        self.symptom_wds = [i.strip() for i in open(self.symptom_path, encoding='utf8') if i.strip()]
        self.region_words = set(self.department_wds + self.disease_wds + self.check_wds + self.drug_wds + self.food_wds + self.producer_wds + self.symptom_wds)
        # Negation words, used to tell "should not eat" from "should eat"
        self.deny_words = [i.strip() for i in open(self.deny_path, encoding='utf8') if i.strip()]
        # Build an Aho-Corasick automaton over all domain words
        self.region_tree = self.build_actree(list(self.region_words))
        # Map each domain word to its entity types
        self.wdtype_dict = self.build_wdtype_dict()
        # Question keywords for each question type
        self.symptom_qwds = ['症状', '表征', '现象', '症候', '表现']
        self.cause_qwds = ['原因','成因', '为什么', '怎么会', '怎样才', '咋样才', '怎样会', '如何会', '为啥', '为何', '如何才会', '怎么才会', '会导致', '会造成']
        self.acompany_qwds = ['并发症', '并发', '一起发生', '一并发生', '一起出现', '一并出现', '一同发生', '一同出现', '伴随发生', '伴随', '共现']
        self.food_qwds = ['饮食', '饮用', '吃', '食', '伙食', '膳食', '喝', '菜' ,'忌口', '补品', '保健品', '食谱', '菜谱', '食用', '食物','补品']
        self.drug_qwds = ['药', '药品', '用药', '胶囊', '口服液', '炎片']
        self.prevent_qwds = ['预防', '防范', '抵制', '抵御', '防止','躲避','逃避','避开','免得','逃开','避开','避掉','躲开','躲掉','绕开',
                             '怎样才能不', '怎么才能不', '咋样才能不','咋才能不', '如何才能不',
                             '怎样才不', '怎么才不', '咋样才不','咋才不', '如何才不',
                             '怎样才可以不', '怎么才可以不', '咋样才可以不', '咋才可以不', '如何可以不',
                             '怎样才可不', '怎么才可不', '咋样才可不', '咋才可不', '如何可不']
        self.lasttime_qwds = ['周期', '多久', '多长时间', '多少时间', '几天', '几年', '多少天', '多少小时', '几个小时', '多少年']
        self.cureway_qwds = ['怎么治疗', '如何医治', '怎么医治', '怎么治', '怎么医', '如何治', '医治方式', '疗法', '咋治', '怎么办', '咋办', '咋治']
        self.cureprob_qwds = ['多大概率能治好', '多大几率能治好', '治好希望大么', '几率', '几成', '比例', '可能性', '能治', '可治', '可以治', '可以医']
        self.easyget_qwds = ['易感人群', '容易感染', '易发人群', '什么人', '哪些人', '感染', '染上', '得上']
        self.check_qwds = ['检查', '检查项目', '查出', '检查', '测出', '试出']
        self.belong_qwds = ['属于什么科', '属于', '什么科', '科室']
        self.cure_qwds = ['治疗什么', '治啥', '治疗啥', '医治啥', '治愈啥', '主治啥', '主治什么', '有什么用', '有何用', '用处', '用途',
                          '有什么好处', '有什么益处', '有何益处', '用来', '用来做啥', '用来作甚', '需要', '要']
        print('model init finished ......')
        return

    # Main classification function
    def classify(self, question):
        data = {}
        medical_dict = self.check_medical(question)
        if not medical_dict:
            return {}
        data['args'] = medical_dict
        # Collect the entity types mentioned in the question
        types = []
        for type_ in medical_dict.values():
            types += type_
        question_type = 'others'
        question_types = []
        # Symptoms
        if self.check_words(self.symptom_qwds, question) and ('disease' in types):
            question_type = 'disease_symptom'
            question_types.append(question_type)
        if self.check_words(self.symptom_qwds, question) and ('symptom' in types):
            question_type = 'symptom_disease'
            question_types.append(question_type)
        # Causes
        if self.check_words(self.cause_qwds, question) and ('disease' in types):
            question_type = 'disease_cause'
            question_types.append(question_type)
        # Complications
        if self.check_words(self.acompany_qwds, question) and ('disease' in types):
            question_type = 'disease_acompany'
            question_types.append(question_type)
        # Recommended (or forbidden) foods for a disease
        if self.check_words(self.food_qwds, question) and 'disease' in types:
            deny_status = self.check_words(self.deny_words, question)
            if deny_status:
                question_type = 'disease_not_food'
            else:
                question_type = 'disease_do_food'
            question_types.append(question_type)
        # Known food -> diseases
        if self.check_words(self.food_qwds + self.cure_qwds, question) and 'food' in types:
            deny_status = self.check_words(self.deny_words, question)
            if deny_status:
                question_type = 'food_not_disease'
            else:
                question_type = 'food_do_disease'
            question_types.append(question_type)
        # Recommended drugs
        if self.check_words(self.drug_qwds, question) and 'disease' in types:
            question_type = 'disease_drug'
            question_types.append(question_type)
        # What diseases a drug treats
        if self.check_words(self.cure_qwds, question) and 'drug' in types:
            question_type = 'drug_disease'
            question_types.append(question_type)
        # Checks needed for a disease
        if self.check_words(self.check_qwds, question) and 'disease' in types:
            question_type = 'disease_check'
            question_types.append(question_type)
        # Known check -> diseases
        if self.check_words(self.check_qwds + self.cure_qwds, question) and 'check' in types:
            question_type = 'check_disease'
            question_types.append(question_type)
        # Prevention
        if self.check_words(self.prevent_qwds, question) and 'disease' in types:
            question_type = 'disease_prevent'
            question_types.append(question_type)
        # Treatment duration
        if self.check_words(self.lasttime_qwds, question) and 'disease' in types:
            question_type = 'disease_lasttime'
            question_types.append(question_type)
        # Treatment methods
        if self.check_words(self.cureway_qwds, question) and 'disease' in types:
            question_type = 'disease_cureway'
            question_types.append(question_type)
        # Cure probability
        if self.check_words(self.cureprob_qwds, question) and 'disease' in types:
            question_type = 'disease_cureprob'
            question_types.append(question_type)
        # Susceptible population
        if self.check_words(self.easyget_qwds, question) and 'disease' in types:
            question_type = 'disease_easyget'
            question_types.append(question_type)
        # No specific question type matched but a disease was found: return its description
        if question_types == [] and 'disease' in types:
            question_types = ['disease_desc']
        # No specific question type matched but a symptom was found: return diseases with that symptom
        if question_types == [] and 'symptom' in types:
            question_types = ['symptom_disease']
        # Combine all matched question types into one dict
        data['question_types'] = question_types
        return data

    # Map each domain word to the entity types it belongs to
    def build_wdtype_dict(self):
        wd_dict = dict()
        for wd in self.region_words:
            wd_dict[wd] = []
            if wd in self.disease_wds:
                wd_dict[wd].append('disease')
            if wd in self.department_wds:
                wd_dict[wd].append('department')
            if wd in self.check_wds:
                wd_dict[wd].append('check')
            if wd in self.drug_wds:
                wd_dict[wd].append('drug')
            if wd in self.food_wds:
                wd_dict[wd].append('food')
            if wd in self.symptom_wds:
                wd_dict[wd].append('symptom')
            if wd in self.producer_wds:
                wd_dict[wd].append('producer')
        return wd_dict

    # Build the Aho-Corasick automaton for fast multi-word matching
    def build_actree(self, wordlist):
        actree = ahocorasick.Automaton()
        for index, word in enumerate(wordlist):
            actree.add_word(word, (index, word))
        actree.make_automaton()
        return actree

    # Extract domain words from the question, keeping only the longest matches
    def check_medical(self, question):
        region_wds = []
        for i in self.region_tree.iter(question):
            wd = i[1][1]
            region_wds.append(wd)
        stop_wds = []
        for wd1 in region_wds:
            for wd2 in region_wds:
                if wd1 in wd2 and wd1 != wd2:
                    stop_wds.append(wd1)
        final_wds = [i for i in region_wds if i not in stop_wds]
        final_dict = {i: self.wdtype_dict.get(i) for i in final_wds}
        return final_dict

    # True if any keyword occurs in the sentence
    def check_words(self, wds, sent):
        for wd in wds:
            if wd in sent:
                return True
        return False


if __name__ == '__main__':
    handler = QuestionClassifier()
    while True:
        question = input('input a question:')
        data = handler.classify(question)
        print(data)
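The core of check_medical is: find every dictionary word that occurs in the question, then drop any match that is contained in a longer match, so that 病毒性感冒 wins over 感冒. The sketch below reproduces that filtering with a plain substring scan in place of the Aho-Corasick automaton (fine for a handful of words, far slower for tens of thousands of dictionary entries), together with the deny-word branch used for food questions. The miniature word lists, including DENY_WORDS, are invented for illustration and are not the contents of the project's dict files.

```python
from typing import Dict, List, Tuple

# Invented miniature dictionaries: word -> entity types
WDTYPE = {'感冒': ['disease'], '病毒性感冒': ['disease'], '发烧': ['symptom']}
FOOD_QWDS = ['饮食', '吃', '食物']
DENY_WORDS = ['不', '忌', '别']


def check_words(wds: List[str], sent: str) -> bool:
    # Same rule as QuestionClassifier.check_words: any keyword occurring as a substring
    return any(wd in sent for wd in wds)


def check_medical(question: str) -> Dict[str, List[str]]:
    # Plain substring scan standing in for region_tree.iter(question)
    region_wds = [wd for wd in WDTYPE if wd in question]
    # Drop words contained in a longer matched word
    stop_wds = {w1 for w1 in region_wds for w2 in region_wds if w1 in w2 and w1 != w2}
    return {wd: WDTYPE[wd] for wd in region_wds if wd not in stop_wds}


def classify_food(question: str) -> Tuple[Dict[str, List[str]], List[str]]:
    # Mirrors only the "disease + food keyword" branch of classify()
    entities = check_medical(question)
    types = [t for ts in entities.values() for t in ts]
    if not (check_words(FOOD_QWDS, question) and 'disease' in types):
        return entities, []
    qtype = 'disease_not_food' if check_words(DENY_WORDS, question) else 'disease_do_food'
    return entities, [qtype]


if __name__ == '__main__':
    for q in ['病毒性感冒不能吃什么', '感冒吃什么好', '发烧怎么办']:
        print(q, classify_food(q))
```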

2. Question parsing script

The code is as follows (question_parser.py):

class QuestionPaser:
    # Build {entity type: [entity names]} from the classifier's {entity name: [entity types]}
    def build_entitydict(self, args):
        entity_dict = {}
        for arg, types in args.items():
            for type_ in types:
                if type_ not in entity_dict:
                    entity_dict[type_] = [arg]
                else:
                    entity_dict[type_].append(arg)
        return entity_dict

    # Main parsing function
    def parser_main(self, res_classify):
        args = res_classify['args']
        entity_dict = self.build_entitydict(args)
        question_types = res_classify['question_types']
        sqls = []
        for question_type in question_types:
            sql_ = {}
            sql_['question_type'] = question_type
            sql = []
            if question_type == 'disease_symptom':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'symptom_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('symptom'))
            elif question_type == 'disease_cause':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_acompany':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_not_food':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_do_food':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'food_not_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('food'))
            elif question_type == 'food_do_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('food'))
            elif question_type == 'disease_drug':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'drug_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('drug'))
            elif question_type == 'disease_check':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'check_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('check'))
            elif question_type == 'disease_prevent':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_lasttime':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_cureway':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_cureprob':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_easyget':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_desc':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            if sql:
                sql_['sql'] = sql
                sqls.append(sql_)
        return sqls

    # Build the Cypher queries for each question type
    def sql_transfer(self, question_type, entities):
        if not entities:
            return []
        # Query statements
        sql = []
        # Causes of a disease
        if question_type == 'disease_cause':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cause".format(i) for i in entities]
        # Prevention measures for a disease
        elif question_type == 'disease_prevent':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.prevent".format(i) for i in entities]
        # Treatment duration of a disease
        elif question_type == 'disease_lasttime':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cure_lasttime".format(i) for i in entities]
        # Cure probability of a disease
        elif question_type == 'disease_cureprob':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cured_prob".format(i) for i in entities]
        # Treatment methods for a disease
        elif question_type == 'disease_cureway':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cure_way".format(i) for i in entities]
        # Susceptible population of a disease
        elif question_type == 'disease_easyget':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.easy_get".format(i) for i in entities]
        # Description of a disease
        elif question_type == 'disease_desc':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.desc".format(i) for i in entities]
        # Symptoms of a disease
        elif question_type == 'disease_symptom':
            sql = ["MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Diseases that have a given symptom
        elif question_type == 'symptom_disease':
            sql = ["MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Complications of a disease (edges in either direction)
        elif question_type == 'disease_acompany':
            sql1 = ["MATCH (m:Disease)-[r:acompany_with]->(n:Disease) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:acompany_with]->(n:Disease) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Foods to avoid for a disease
        elif question_type == 'disease_not_food':
            sql = ["MATCH (m:Disease)-[r:no_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Foods recommended for a disease
        elif question_type == 'disease_do_food':
            sql1 = ["MATCH (m:Disease)-[r:do_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Known food to avoid -> diseases
        elif question_type == 'food_not_disease':
            sql = ["MATCH (m:Disease)-[r:no_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Known recommended food -> diseases
        elif question_type == 'food_do_disease':
            sql1 = ["MATCH (m:Disease)-[r:do_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Common and popular drugs for a disease (drug aliases could be added later)
        elif question_type == 'disease_drug':
            sql1 = ["MATCH (m:Disease)-[r:common_drug]->(n:Drug) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_drug]->(n:Drug) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Known drug -> diseases it treats
        elif question_type == 'drug_disease':
            sql1 = ["MATCH (m:Disease)-[r:common_drug]->(n:Drug) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_drug]->(n:Drug) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Checks needed for a disease
        elif question_type == 'disease_check':
            sql = ["MATCH (m:Disease)-[r:need_check]->(n:Check) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Known check -> diseases
        elif question_type == 'check_disease':
            sql = ["MATCH (m:Disease)-[r:need_check]->(n:Check) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        return sql


if __name__ == '__main__':
    handler = QuestionPaser()
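parser_main first inverts the classifier's {word: [types]} into {type: [words]}, then sql_transfer pastes each entity name into a Cypher template with str.format, so a name containing a single quote would yield invalid Cypher. A sketch of both steps follows; the parameterized variant (query text plus a $name parameter) is an alternative, not what the original code does, and the TEMPLATES table and sql_transfer_params are hypothetical. Adopting it would also mean changing search_main to call self.g.run(query, params) instead of self.g.run(query).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_entitydict(args: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """{'感冒': ['disease']} -> {'disease': ['感冒']}, the same result as QuestionPaser."""
    entity_dict = defaultdict(list)
    for arg, types in args.items():
        for type_ in types:
            entity_dict[type_].append(arg)
    return dict(entity_dict)


# Hypothetical template table: question type -> Cypher with a $name parameter
TEMPLATES = {
    'disease_cause': "MATCH (m:Disease) WHERE m.name = $name RETURN m.name, m.cause",
    'disease_symptom': ("MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) "
                        "WHERE m.name = $name RETURN m.name, r.name, n.name"),
}


def sql_transfer_params(question_type: str, entities: List[str]) -> List[Tuple[str, Dict[str, str]]]:
    """Return (query, parameters) pairs instead of formatted query strings."""
    template = TEMPLATES.get(question_type)
    if not template or not entities:
        return []
    return [(template, {'name': e}) for e in entities]


if __name__ == '__main__':
    entity_dict = build_entitydict({'感冒': ['disease'], "Crohn's disease": ['disease']})
    for query, params in sql_transfer_params('disease_cause', entity_dict['disease']):
        # With py2neo: self.g.run(query, params).data()
        print(query, params)
```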

3. Answer search script

The code is as follows (answer_search.py):

from py2neo import Graph


class AnswerSearcher:
    def __init__(self):
        # Same connection settings as build_medicalgraph.py; replace the password with your own
        self.g = Graph("bolt://localhost:7687", auth=("neo4j", "tang2001"))
        self.num_limit = 20  # at most 20 items per answer

    # Run the Cypher queries and return the formatted answers
    def search_main(self, sqls):
        final_answers = []
        for sql_ in sqls:
            question_type = sql_['question_type']
            queries = sql_['sql']
            answers = []
            for query in queries:
                ress = self.g.run(query).data()
                answers += ress
            final_answer = self.answer_prettify(question_type, answers)
            if final_answer:
                final_answers.append(final_answer)
        return final_answers

    # Apply the reply template that matches question_type
    def answer_prettify(self, question_type, answers):
        final_answer = []
        if not answers:
            return ''
        if question_type == 'disease_symptom':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}的症状包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'symptom_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '症状{0}可能染上的疾病有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_cause':
            desc = [i['m.cause'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}可能的成因有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_prevent':
            desc = [i['m.prevent'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}的预防措施包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_lasttime':
            desc = [i['m.cure_lasttime'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}治疗可能持续的周期为:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_cureway':
            # cure_way is stored as a list, so join it first
            desc = [';'.join(i['m.cure_way']) for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}可以尝试如下治疗:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_cureprob':
            desc = [i['m.cured_prob'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}治愈的概率为(仅供参考):{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_easyget':
            desc = [i['m.easy_get'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}的易感人群包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_desc':
            desc = [i['m.desc'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0},熟悉一下:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_acompany':
            desc1 = [i['n.name'] for i in answers]
            desc2 = [i['m.name'] for i in answers]
            subject = answers[0]['m.name']
            desc = [i for i in desc1 + desc2 if i != subject]
            # These are complications (并发症), not symptoms
            final_answer = '{0}的并发症包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_not_food':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}忌食的食物包括有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_do_food':
            # Split rows by relationship name: '宜吃' (suitable) vs '推荐食谱' (recommended recipe)
            do_desc = [i['n.name'] for i in answers if i['r.name'] == '宜吃']
            recommand_desc = [i['n.name'] for i in answers if i['r.name'] == '推荐食谱']
            subject = answers[0]['m.name']
            final_answer = '{0}宜食的食物包括有:{1}\n推荐食谱包括有:{2}'.format(subject, ';'.join(list(set(do_desc))[:self.num_limit]), ';'.join(list(set(recommand_desc))[:self.num_limit]))
        elif question_type == 'food_not_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '患有{0}的人最好不要吃{1}'.format(';'.join(list(set(desc))[:self.num_limit]), subject)
        elif question_type == 'food_do_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '患有{0}的人建议多试试{1}'.format(';'.join(list(set(desc))[:self.num_limit]), subject)
        elif question_type == 'disease_drug':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}通常使用的药品包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'drug_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '{0}主治的疾病有{1},可以试试'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_check':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}通常可以通过以下方式检查出来:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'check_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '通常可以通过{0}检查出来的疾病有{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        return final_answer


if __name__ == '__main__':
    searcher = AnswerSearcher()
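search_main collects the rows returned by .data(), which are plain dicts keyed by the RETURN expressions such as 'm.name' and 'n.name'. answer_prettify then deduplicates with set(), and because Python randomizes string hashing per process, the order of the items, and therefore which 20 survive num_limit, can differ between runs. A small sketch, assuming you want a stable order, that uses dict.fromkeys for order-preserving deduplication on hand-written rows of that shape; dedup_keep_order and format_symptoms are hypothetical helpers.

```python
from typing import Dict, List

NUM_LIMIT = 20  # same cap as AnswerSearcher.num_limit


def dedup_keep_order(items: List[str]) -> List[str]:
    # dict keys keep insertion order (Python 3.7+), unlike set()
    return list(dict.fromkeys(items))


def format_symptoms(answers: List[Dict[str, str]]) -> str:
    """Same reply template as the 'disease_symptom' branch, with a stable ordering."""
    if not answers:
        return ''
    subject = answers[0]['m.name']
    desc = dedup_keep_order([row['n.name'] for row in answers])
    return '{0}的症状包括:{1}'.format(subject, ';'.join(desc[:NUM_LIMIT]))


if __name__ == '__main__':
    rows = [  # shape of graph.run(query).data() for the disease_symptom query
        {'m.name': '感冒', 'r.name': '症状', 'n.name': '发烧'},
        {'m.name': '感冒', 'r.name': '症状', 'n.name': '咳嗽'},
        {'m.name': '感冒', 'r.name': '症状', 'n.name': '发烧'},
    ]
    print(format_symptoms(rows))  # 感冒的症状包括:发烧;咳嗽
```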

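Summary

Focusing on the medical domain, this project uses a vertical medical website as its data source and, with diseases at its core, builds a knowledge graph of about 44,000 entities of 7 types and about 300,000 relationships of 11 types.

The project code is available from Liu Huanyong's personal project (Baidu Netdisk download link).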

This article was contributed by a user; please credit the source when reposting: [wpsshop blog]