
Running a Question-Answering System Based on a Medical Knowledge Graph: Steps and Caveats


This project uses the source code published by Liu Huanyong of the Chinese Academy of Sciences:

https://github.com/liuhuanyong/QASystemOnMedicalKG

How to run it after downloading:

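(1) Install the Neo4j database and the required packages. Neo4j needs a JDK (Java Development Kit), so install that first. Pay attention to versions: I used Neo4j 4, Java 1.8, and py2neo==4.3.0; newer py2neo releases did not run with this project's code. (Neo4j 4.x officially requires Java 11, so check the system requirements of your exact Neo4j release if the server fails to start.)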
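Because the project is sensitive to the py2neo version, a quick check of what is actually installed can save some debugging. The following is a minimal sketch, not part of the original project; `PINNED` and `check_pins` are hypothetical names, and the only pin taken from this article is py2neo 4.3.0.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported to work in this article; extend with your own pins if needed.
PINNED = {"py2neo": "4.3.0"}


def check_pins(pins, get_version=version):
    """Return a list of human-readable problems; an empty list means every pin matches."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (wanted {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name} is {found}, wanted {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINNED) or ["all pinned packages match"]:
        print(line)
```

The `get_version` parameter exists only so the check can be exercised without touching the real environment.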

Links on installing Neo4j, plus some background reading:

https://blog.csdn.net/sinat_36226553/article/details/108541370

https://so.csdn.net/so/search?q=neo4j&spm=1001.2101.3001.7020

Running steps, versions, and related operations

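(2) In Python, install the py2neo and pyahocorasick packages. Installing pyahocorasick failed for me with an error asking for Visual Studio Build Tools.
Install Microsoft Visual C++ first: download Build Tools from https://visualstudio.microsoft.com/downloads/ and, in the installer's module selection, tick C++ Build Tools under Visual Studio Build Tools.
Some people say that installing pyahocorasick through Anaconda does not require Visual C++; I have not tried this myself.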
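(3) Then run the program:
1) Edit user and password in build_medicalgraph and answer_search so that they match your Neo4j user name and password.

2) Append these two lines at the end of build_medicalgraph:
handler.create_graphnodes()
handler.create_graphrels()
3) Run build_medicalgraph. On some machines it fails with:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 81: illegal multibyte sequence
To fix this, add encoding='utf-8' to every open() call.

4) The dataset is large, so the import runs for several hours. When it finishes, open the Neo4j Browser and the nodes and the graph will be there.

5) Finally, run chatbot_graph.py and type in your question; the answer is printed.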
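The UnicodeDecodeError in step 3) appears because open() without an encoding argument uses the platform default (gbk on a Chinese-locale Windows), while the project's files are UTF-8. A minimal sketch of the fix follows; `load_words` and the demo file name are hypothetical and only illustrate the pattern.

```python
import os
import tempfile


def load_words(path):
    """Read one entry per line, skipping blank lines, always decoding as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    # Write a small UTF-8 file to a temporary directory and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "disease_demo.txt")
        with open(demo_path, "w", encoding="utf-8") as f:
            f.write("感冒\n\n乙肝\n")
        print(load_words(demo_path))
```

Using a `with` block also closes the file, which the bare `open(...)` inside a list comprehension does not do explicitly.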

Code walkthrough:

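(1) Building the knowledge graph starts with acquiring data. The data here was mainly collected with a web crawler and is already structured, so there is no need to extract knowledge from sentences or articles, as unstructured text would require. The data is saved in JSON format and used from there. The construction code mainly builds the entity types, attributes, and relations; the source code is annotated, so I do not reproduce those explanations here. The rest of the code covers question classification, question parsing, querying with the parsed result, and returning the answer. The comments in the code below include my own understanding and represent only my personal reading; please point out any errors or different views.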
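To make the "structured JSON in, entities and relations out" idea concrete, here is a minimal sketch. It assumes one JSON object per line with a `name` field and a `symptom` list; those field names, the sample records, and the `extract` function are assumptions for illustration, not the project's actual loader.

```python
import json

# Two hypothetical records, one JSON object per line.
SAMPLE = """\
{"name": "感冒", "symptom": ["咳嗽", "发热"]}
{"name": "肺炎", "symptom": ["咳嗽"]}
"""


def extract(lines):
    """Collect entity sets and (disease, symptom) relation pairs from JSON lines."""
    diseases, symptoms, has_symptom = set(), set(), []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        disease = record["name"]
        diseases.add(disease)
        for symptom in record.get("symptom", []):
            symptoms.add(symptom)
            has_symptom.append((disease, symptom))
    return diseases, symptoms, has_symptom


if __name__ == "__main__":
    d, s, rels = extract(SAMPLE.splitlines())
    print(sorted(d), sorted(s), rels)
```

The entity sets would become graph nodes and the pairs would become relationships; writing them to Neo4j is left out so the sketch runs without a database.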

(2) Selected code snippets

Question classification

import os
import ahocorasick

# Aho-Corasick automaton: matches a whole batch of keywords against a string in one pass,
# i.e. a single scan returns every keyword that occurs in the string.
class QuestionClassifier:
    def __init__(self):
        # cur_dir is the directory that contains this file:
        # os.path.abspath(__file__) is the file's absolute path and os.path.dirname drops
        # the file name. (The original '/'.join(...split('/')[:-1]) did the same but
        # assumes '/' as the path separator, so it breaks on Windows.)
        cur_dir = os.path.dirname(os.path.abspath(__file__))
        # Paths of the feature-word dictionaries
        self.disease_path = os.path.join(cur_dir, 'dict/disease.txt')
        self.department_path = os.path.join(cur_dir, 'dict/department.txt')
        self.check_path = os.path.join(cur_dir, 'dict/check.txt')
        self.drug_path = os.path.join(cur_dir, 'dict/drug.txt')
        self.food_path = os.path.join(cur_dir, 'dict/food.txt')
        self.producer_path = os.path.join(cur_dir, 'dict/producer.txt')
        self.symptom_path = os.path.join(cur_dir, 'dict/symptom.txt')
        self.deny_path = os.path.join(cur_dir, 'dict/deny.txt')
        # Load the feature words: the seven entity word lists, the combined domain
        # vocabulary (region_words), and the negation words
        self.disease_wds = [i.strip() for i in open(self.disease_path, encoding='utf-8') if i.strip()]
        self.department_wds = [i.strip() for i in open(self.department_path, encoding='utf-8') if i.strip()]
        self.check_wds = [i.strip() for i in open(self.check_path, encoding='utf-8') if i.strip()]
        self.drug_wds = [i.strip() for i in open(self.drug_path, encoding='utf-8') if i.strip()]
        self.food_wds = [i.strip() for i in open(self.food_path, encoding='utf-8') if i.strip()]
        self.producer_wds = [i.strip() for i in open(self.producer_path, encoding='utf-8') if i.strip()]
        self.symptom_wds = [i.strip() for i in open(self.symptom_path, encoding='utf-8') if i.strip()]
        self.region_words = set(self.department_wds + self.disease_wds + self.check_wds + self.drug_wds + self.food_wds + self.producer_wds + self.symptom_wds)
        self.deny_words = [i.strip() for i in open(self.deny_path, encoding='utf-8') if i.strip()]
        # Build the domain actree
        self.region_tree = self.build_actree(list(self.region_words))
        # Build the word-to-type dictionary, e.g. {'感冒': ['disease'], ...}
        self.wdtype_dict = self.build_wdtype_dict()
        # Question cue words: they cover questions about disease attributes and about relations (edges)
        self.symptom_qwds = ['症状', '表征', '现象', '症候', '表现']
        self.cause_qwds = ['原因','成因', '为什么', '怎么会', '怎样才', '咋样才', '怎样会', '如何会', '为啥', '为何', '如何才会', '怎么才会', '会导致', '会造成']
        self.acompany_qwds = ['并发症', '并发', '一起发生', '一并发生', '一起出现', '一并出现', '一同发生', '一同出现', '伴随发生', '伴随', '共现']
        self.food_qwds = ['饮食', '饮用', '吃', '食', '伙食', '膳食', '喝', '菜' ,'忌口', '补品', '保健品', '食谱', '菜谱', '食用', '食物','补品']
        self.drug_qwds = ['药', '药品', '用药', '胶囊', '口服液', '炎片']
        self.prevent_qwds = ['预防', '防范', '抵制', '抵御', '防止','躲避','逃避','避开','免得','逃开','避开','避掉','躲开','躲掉','绕开',
                             '怎样才能不', '怎么才能不', '咋样才能不','咋才能不', '如何才能不',
                             '怎样才不', '怎么才不', '咋样才不','咋才不', '如何才不',
                             '怎样才可以不', '怎么才可以不', '咋样才可以不', '咋才可以不', '如何可以不',
                             '怎样才可不', '怎么才可不', '咋样才可不', '咋才可不', '如何可不']
        self.lasttime_qwds = ['周期', '多久', '多长时间', '多少时间', '几天', '几年', '多少天', '多少小时', '几个小时', '多少年']
        self.cureway_qwds = ['怎么治疗', '如何医治', '怎么医治', '怎么治', '怎么医', '如何治', '医治方式', '疗法', '咋治', '怎么办', '咋办', '咋治']
        self.cureprob_qwds = ['多大概率能治好', '多大几率能治好', '治好希望大么', '几率', '几成', '比例', '可能性', '能治', '可治', '可以治', '可以医']
        self.easyget_qwds = ['易感人群', '容易感染', '易发人群', '什么人', '哪些人', '感染', '染上', '得上']
        self.check_qwds = ['检查', '检查项目', '查出', '检查', '测出', '试出']
        self.belong_qwds = ['属于什么科', '属于', '什么科', '科室']
        self.cure_qwds = ['治疗什么', '治啥', '治疗啥', '医治啥', '治愈啥', '主治啥', '主治什么', '有什么用', '有何用', '用处', '用途',
                          '有什么好处', '有什么益处', '有何益处', '用来', '用来做啥', '用来作甚', '需要', '要']
        print('model init finished ......')
        return
    '''Main classification function'''
    def classify(self, question):
        data = {}
        # check_medical is defined further down.
        # It returns the entities found in the question, e.g. {'感冒': ['disease'], ...}
        medical_dict = self.check_medical(question)
        if not medical_dict:
            return {}
        data['args'] = medical_dict
        # Collect the entity types mentioned in the question
        types = []
        for type_ in medical_dict.values():
            types += type_
        question_type = 'others'
        question_types = []
        # Symptoms
        if self.check_words(self.symptom_qwds, question) and ('disease' in types):
            question_type = 'disease_symptom'
            question_types.append(question_type)
        if self.check_words(self.symptom_qwds, question) and ('symptom' in types):
            question_type = 'symptom_disease'
            question_types.append(question_type)
        # Causes
        if self.check_words(self.cause_qwds, question) and ('disease' in types):
            question_type = 'disease_cause'
            question_types.append(question_type)
        # Complications
        if self.check_words(self.acompany_qwds, question) and ('disease' in types):
            question_type = 'disease_acompany'
            question_types.append(question_type)
        # Food recommended for, or to be avoided with, a disease
        if self.check_words(self.food_qwds, question) and 'disease' in types:
            deny_status = self.check_words(self.deny_words, question)
            if deny_status:
                question_type = 'disease_not_food'
            else:
                question_type = 'disease_do_food'
            question_types.append(question_type)
        # Given a food, find the related diseases
        if self.check_words(self.food_qwds+self.cure_qwds, question) and 'food' in types:
            deny_status = self.check_words(self.deny_words, question)
            if deny_status:
                question_type = 'food_not_disease'
            else:
                question_type = 'food_do_disease'
            question_types.append(question_type)
        # Recommended drugs
        if self.check_words(self.drug_qwds, question) and 'disease' in types:
            question_type = 'disease_drug'
            question_types.append(question_type)
        # Which diseases a drug treats
        if self.check_words(self.cure_qwds, question) and 'drug' in types:
            question_type = 'drug_disease'
            question_types.append(question_type)
        # Checks (examinations) needed for a disease
        if self.check_words(self.check_qwds, question) and 'disease' in types:
            question_type = 'disease_check'
            question_types.append(question_type)
        # Given a check, find the diseases it detects
        if self.check_words(self.check_qwds+self.cure_qwds, question) and 'check' in types:
            question_type = 'check_disease'
            question_types.append(question_type)
        # Disease prevention
        if self.check_words(self.prevent_qwds, question) and 'disease' in types:
            question_type = 'disease_prevent'
            question_types.append(question_type)
        # Treatment duration
        if self.check_words(self.lasttime_qwds, question) and 'disease' in types:
            question_type = 'disease_lasttime'
            question_types.append(question_type)
        # Treatment methods
        if self.check_words(self.cureway_qwds, question) and 'disease' in types:
            question_type = 'disease_cureway'
            question_types.append(question_type)
        # Probability of cure
        if self.check_words(self.cureprob_qwds, question) and 'disease' in types:
            question_type = 'disease_cureprob'
            question_types.append(question_type)
        # Population susceptible to the disease
        if self.check_words(self.easyget_qwds, question) and 'disease' in types :
            question_type = 'disease_easyget'
            question_types.append(question_type)
        # If no cue word matched but a disease was mentioned, return the disease description
        if question_types == [] and 'disease' in types:
            question_types = ['disease_desc']
        # If no cue word matched but a symptom was mentioned, return the diseases with that symptom
        if question_types == [] and 'symptom' in types:
            question_types = ['symptom_disease']
        # Merge the classification results into one dictionary
        data['question_types'] = question_types
        return data
    '''Build the word-to-type mapping.
    From the seven entity lists, build a {feature word: [types]} dictionary
    that stores the type(s) (disease, department, ...) of each word in region_words.
    '''
    def build_wdtype_dict(self):
        wd_dict = dict()
        # region_words is the union of all seven entity word lists
        for wd in self.region_words:
            wd_dict[wd] = []
            # Append a type for every list that contains the word;
            # a word may therefore carry several types
            if wd in self.disease_wds:
                wd_dict[wd].append('disease')
            if wd in self.department_wds:
                wd_dict[wd].append('department')
            if wd in self.check_wds:
                wd_dict[wd].append('check')
            if wd in self.drug_wds:
                wd_dict[wd].append('drug')
            if wd in self.food_wds:
                wd_dict[wd].append('food')
            if wd in self.symptom_wds:
                wd_dict[wd].append('symptom')
            if wd in self.producer_wds:
                wd_dict[wd].append('producer')
        return wd_dict
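    # Build the domain actree to speed up filtering, using Python's ahocorasick library.
    # Aho-Corasick is a string-matching algorithm; the library implements it with two
    # data structures, a trie and an Aho-Corasick automaton.
    # A trie is a dictionary indexed by strings; lookup time is proportional to the key length.
    # An Aho-Corasick automaton finds every string of a given set in a single pass over the text.
    # It is essentially KMP implemented on top of a trie, which gives multi-pattern matching.
    # Details of ahocorasick usage are outside the scope of this article;
    # see for example https://blog.csdn.net/pirage/article/details/51657178
    def build_actree(self, wordlist):
        actree = ahocorasick.Automaton()  # initialise the trie
        for index, word in enumerate(wordlist):
            actree.add_word(word, (index, word))  # add the word to the trie
        actree.make_automaton()  # convert the trie into an Aho-Corasick automaton
        return actree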
    # Question filtering.
    # The automaton's iter() finds the domain words in the question; when matched words
    # overlap, the shorter one is dropped and the longest one is kept. The function returns
    # {domain word found in the question: [its entity types]}.
    # Overall approach:
    # 1. Initialisation
    #    Dictionaries: diseases, departments, checks, drugs, foods, branded drugs (producers),
    #    symptoms, negation words, plus region_words, which holds all of the entity words.
    #    Build an actree (region_tree) from every word in region_words to speed up later searches.
    #    Create wdtype_dict, which stores the type (disease, department, ...) of each word in region_words.
    #    Build synonym lists so that different phrasings of the same question are understood.
    # 2. Analysing the user's question
    #    Question filtering (which domain terms did the user mention?): use region_tree to find
    #    every keyword from region_words, drop keywords that are contained in a longer match,
    #    and use wdtype_dict to report which dictionary each keyword belongs to.
    #    Question classification (what is known and what is being asked?): decide the concrete
    #    question type from the synonym lists and wdtype_dict.
    # Source: https://blog.csdn.net/floracuu/article/details/113574130
    def check_medical(self, question):
        region_wds = []
        # region_tree is the actree built from region_words; it quickly finds the entities
        # that occur in the question.
        # The raw matches are not always what we want: '瓜烧白菜' and '白菜' are different entities.
        # iter() yields tuples such as (3, (23192, '乙肝')):
        # the end index of the match, then the value stored with add_word
        for i in self.region_tree.iter(question):
            wd = i[1][1]  # the matched word
            region_wds.append(wd)
        # Collect the words to filter out (used like a stop-word list)
        stop_wds = []
        for wd1 in region_wds:
            for wd2 in region_wds:
                # Compare every pair of matches and keep the more specific one:
                # for example '内科' is contained in '消化内科' and differs from it
                if wd1 in wd2 and wd1 != wd2:
                    stop_wds.append(wd1)  # record the shorter word
        # Keep only the words that are not contained in a longer match
        final_wds = [i for i in region_wds if i not in stop_wds]
        # Build the result, e.g. {'感冒': ['disease'], ...}
        final_dict = {i:self.wdtype_dict.get(i) for i in final_wds}
        return final_dict
    # Classification based on feature words:
    # returns True if the sentence contains any of the given cue words.
    def check_words(self, wds, sent):
        for wd in wds:
            if wd in sent:
                return True
        return False

if __name__ == '__main__':
    handler = QuestionClassifier()
    # Read questions in a loop and print the classification result
    while 1:
        question = input('input a question:')
        data = handler.classify(question)
        print(data)
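The two core ideas of the classifier, keeping only the longest dictionary match and pairing a cue word with an entity type, can be tried without the dictionary files or pyahocorasick. The sketch below swaps the Aho-Corasick automaton for plain substring tests, which is fine for a handful of words but not for the full vocabulary; `WORD_TYPES`, `SYMPTOM_QWDS`, and both function names are hypothetical stand-ins.

```python
# Hypothetical miniature vocabulary: word -> entity types.
WORD_TYPES = {
    "内科": ["department"],
    "消化内科": ["department"],
    "感冒": ["disease"],
}
SYMPTOM_QWDS = ["症状", "表现"]


def find_entities(question, word_types):
    """Return {word: types} for dictionary words in the question, longest match only."""
    hits = [w for w in word_types if w in question]
    # Drop any hit that is a proper substring of another hit.
    shorter = {a for a in hits for b in hits if a != b and a in b}
    return {w: word_types[w] for w in hits if w not in shorter}


def classify(question, word_types):
    """Tiny version of one rule: symptom cue word + disease entity -> disease_symptom."""
    entities = find_entities(question, word_types)
    if not entities:
        return {}
    types = [t for ts in entities.values() for t in ts]
    question_types = []
    if any(q in question for q in SYMPTOM_QWDS) and "disease" in types:
        question_types.append("disease_symptom")
    # Fall back to the disease description when no cue word matched.
    if not question_types and "disease" in types:
        question_types = ["disease_desc"]
    return {"args": entities, "question_types": question_types}


if __name__ == "__main__":
    print(find_entities("消化内科在几楼", WORD_TYPES))
    print(classify("感冒有什么症状", WORD_TYPES))
```

As in the original filter, a short word is dropped whenever a longer match contains it, even if the short word also appears on its own elsewhere in the question.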

Question parsing

# Convert the user's question into Neo4j query statements:
# 1. Merge the extracted keywords by entity type.
# 2. Loop over the question types and translate each one into a Neo4j query.
"""
parser_main
Main function for question parsing.
It takes the classification result and reads the domain words and their entity types from it,
then calls build_entitydict, which returns a dict of the form {'entity type': ['domain word'], ...}.
For every question_type in the classifier's ['question_types'] it calls sql_transfer
to produce statements in Neo4j's Cypher language,
and finally collects the query statements generated for each question_type.
Source: https://blog.csdn.net/vivian_ll/article/details/89840281
"""
class QuestionPaser:
    # Example: args={'青光眼': ['disease'], '肺气肿': ['disease'], '消化内科': ['department']}
    # Merged:  entity_dict= {'disease': ['青光眼', '肺气肿'], 'department': ['消化内科']}
    # Source: https://blog.csdn.net/floracuu/article/details/113828998
    '''Build the entity dictionary'''
    def build_entitydict(self, args):
        # args is the {word: [types]} dict produced by the classifier
        # (an ordinary parameter, not a *args tuple)
        entity_dict = {}
        # arg is the matched word, types is its list of entity types
        for arg, types in args.items():
            for type_ in types:
                if type_ not in entity_dict:
                    entity_dict[type_] = [arg]
                else:
                    entity_dict[type_].append(arg)
        return entity_dict
    '''Main parsing function'''
    def parser_main(self, res_classify):
        # The keywords found in the question
        args = res_classify['args']
        # Merge the keywords by entity type
        entity_dict = self.build_entitydict(args)
        question_types = res_classify['question_types']
        sqls = []
        # For every question type, build the corresponding query statements and store them
        # in a dict sql_ with two fields, question_type and sql; all dicts go into sqls
        for question_type in question_types:
            sql_ = {}  # trailing underscore avoids a clash with the list named sql below
            sql_['question_type'] = question_type
            sql = []
            if question_type == 'disease_symptom':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'symptom_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('symptom'))
            elif question_type == 'disease_cause':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_acompany':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_not_food':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_do_food':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'food_not_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('food'))
            elif question_type == 'food_do_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('food'))
            elif question_type == 'disease_drug':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'drug_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('drug'))
            elif question_type == 'disease_check':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'check_disease':
                sql = self.sql_transfer(question_type, entity_dict.get('check'))
            elif question_type == 'disease_prevent':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_lasttime':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_cureway':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_cureprob':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_easyget':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            elif question_type == 'disease_desc':
                sql = self.sql_transfer(question_type, entity_dict.get('disease'))
            if sql:
                sql_['sql'] = sql
                sqls.append(sql_)
        return sqls
    '''Translate each kind of question into Neo4j Cypher statements'''
    def sql_transfer(self, question_type, entities):
        if not entities:
            return []
        # Query statements
        sql = []
        # Causes of a disease
        if question_type == 'disease_cause':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cause".format(i) for i in entities]
        # Preventive measures for a disease
        elif question_type == 'disease_prevent':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.prevent".format(i) for i in entities]
        # Duration of a disease
        elif question_type == 'disease_lasttime':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cure_lasttime".format(i) for i in entities]
        # Probability of cure
        elif question_type == 'disease_cureprob':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cured_prob".format(i) for i in entities]
        # Treatment methods
        elif question_type == 'disease_cureway':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cure_way".format(i) for i in entities]
        # Susceptible population
        elif question_type == 'disease_easyget':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.easy_get".format(i) for i in entities]
        # Description of a disease
        elif question_type == 'disease_desc':
            sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.desc".format(i) for i in entities]
        # Symptoms of a disease
        elif question_type == 'disease_symptom':
            sql = ["MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Diseases that a symptom may indicate
        elif question_type == 'symptom_disease':
            sql = ["MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Complications of a disease (queried in both directions)
        elif question_type == 'disease_acompany':
            sql1 = ["MATCH (m:Disease)-[r:acompany_with]->(n:Disease) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:acompany_with]->(n:Disease) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Foods to avoid with a disease
        elif question_type == 'disease_not_food':
            sql = ["MATCH (m:Disease)-[r:no_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Foods recommended for a disease
        elif question_type == 'disease_do_food':
            sql1 = ["MATCH (m:Disease)-[r:do_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Given a food to avoid, find the diseases
        elif question_type == 'food_not_disease':
            sql = ["MATCH (m:Disease)-[r:no_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Given a recommended food, find the diseases
        elif question_type == 'food_do_disease':
            sql1 = ["MATCH (m:Disease)-[r:do_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Drugs commonly used for a disease (remember to extend the drug aliases)
        elif question_type == 'disease_drug':
            sql1 = ["MATCH (m:Disease)-[r:common_drug]->(n:Drug) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_drug]->(n:Drug) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Given a drug, find the diseases it can treat
        elif question_type == 'drug_disease':
            sql1 = ["MATCH (m:Disease)-[r:common_drug]->(n:Drug) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql2 = ["MATCH (m:Disease)-[r:recommand_drug]->(n:Drug) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
            sql = sql1 + sql2
        # Checks that a disease requires
        elif question_type == 'disease_check':
            sql = ["MATCH (m:Disease)-[r:need_check]->(n:Check) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        # Given a check, find the diseases
        elif question_type == 'check_disease':
            sql = ["MATCH (m:Disease)-[r:need_check]->(n:Check) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
        return sql

# The Cypher statements are executed later; the rows they return are added to the answer as Python data.
if __name__ == '__main__':
    handler = QuestionPaser()
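The parser above splices entity names into the Cypher text with str.format, which breaks if a name contains a quote character. A common alternative is to keep the query text fixed and pass the name as a Cypher parameter ($name). The sketch below only builds (query, parameters) pairs and runs without a database; `TEMPLATES`, `build_entity_dict`, and `to_queries` are hypothetical names, and the two templates are a small subset adapted from the queries above. As far as I know, py2neo's Graph.run accepts a parameters argument, but check the documentation for the version you have installed.

```python
def build_entity_dict(args):
    """Invert {word: [types]} into {type: [words]}."""
    entity_dict = {}
    for word, types in args.items():
        for type_ in types:
            entity_dict.setdefault(type_, []).append(word)
    return entity_dict


# Hypothetical subset of templates; $name is a Cypher query parameter.
TEMPLATES = {
    "disease_cause": "MATCH (m:Disease) WHERE m.name = $name RETURN m.name, m.cause",
    "disease_symptom": (
        "MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) "
        "WHERE m.name = $name RETURN m.name, r.name, n.name"
    ),
}


def to_queries(question_type, entities):
    """Return a list of (cypher, parameters) pairs, one per entity."""
    template = TEMPLATES.get(question_type)
    if template is None or not entities:
        return []
    return [(template, {"name": entity}) for entity in entities]


if __name__ == "__main__":
    merged = build_entity_dict({"青光眼": ["disease"], "消化内科": ["department"]})
    for cypher, params in to_queries("disease_cause", merged.get("disease")):
        print(cypher, params)
```

A table of templates also replaces the long if/elif chain with a dictionary lookup, at the cost of handling the two-query cases (such as disease_do_food) separately.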

Querying with the parsed result

"""
After a question has been parsed, the parsed result has to be used to query the database.
This script defines the AnswerSearcher class. Like build_medicalgraph.py,
the class holds a Graph instance g and num_limit, the maximum number of answers to list.
The class has two methods: the main query function and the reply module.
search_main
Takes sqls, the result of question parsing, and reads ['question_type'] and ['sql'] from each entry.
It first calls self.g.run(query).data() to execute the statements in ['sql'] and fetch the rows,
then calls answer_prettify, which combines the rows with a reply template chosen by ['question_type'],
and finally returns the finished answers.
answer_prettify
Picks the reply template that matches the question_type.
Source: https://blog.csdn.net/vivian_ll/article/details/89840281
"""
"""
Run the Neo4j queries and assemble the results into natural language
"""
from py2neo import Graph

class AnswerSearcher:
    # Connect to the database
    def __init__(self):
        self.g = Graph(
            host="127.0.0.1",
            http_port=7474,
            user="neo4j",  # replace with your own Neo4j user name
            password="your_password")  # replace with your own Neo4j password
        self.num_limit = 20

    '''Run the Cypher queries and return the results'''
    def search_main(self, sqls):
        final_answers = []
        for sql_ in sqls:
            question_type = sql_['question_type']
            queries = sql_['sql']
            answers = []
            for query in queries:
                # Execute the query statement
                ress = self.g.run(query).data()
                answers += ress
            # Pass on the question type together with all rows found for it
            final_answer = self.answer_prettify(question_type, answers)
            if final_answer:
                final_answers.append(final_answer)
        return final_answers
    '''Pick the reply template that matches the question_type'''
    def answer_prettify(self, question_type, answers):
        final_answer = []
        if not answers:
            return ''
        if question_type == 'disease_symptom':
            # As in the queries above, m is the disease and n is the node at the other end
            # of the relation, here the symptom
            desc = [i['n.name'] for i in answers]
            # {0} and {1} are the positions of the arguments passed to format().
            # set() removes duplicates (the result is a set), and list() turns it back into
            # a list; the items are then joined with semicolons into one string
            subject = answers[0]['m.name']
            final_answer = '{0}的症状包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'symptom_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '症状{0}可能染上的疾病有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_cause':
            desc = [i['m.cause'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}可能的成因有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_prevent':
            desc = [i['m.prevent'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}的预防措施包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_lasttime':
            desc = [i['m.cure_lasttime'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}治疗可能持续的周期为:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_cureway':
            desc = [';'.join(i['m.cure_way']) for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}可以尝试如下治疗:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_cureprob':
            desc = [i['m.cured_prob'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}治愈的概率为(仅供参考):{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_easyget':
            desc = [i['m.easy_get'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}的易感人群包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_desc':
            desc = [i['m.desc'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0},熟悉一下:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_acompany':
            desc1 = [i['n.name'] for i in answers]
            desc2 = [i['m.name'] for i in answers]
            subject = answers[0]['m.name']
            desc = [i for i in desc1 + desc2 if i != subject]
            # These rows are complications (并发症); the template as posted said 症状 (symptoms),
            # which looks like a copy-paste slip
            final_answer = '{0}的并发症包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_not_food':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}忌食的食物包括有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_do_food':
            do_desc = [i['n.name'] for i in answers if i['r.name'] == '宜吃']
            recommand_desc = [i['n.name'] for i in answers if i['r.name'] == '推荐食谱']
            subject = answers[0]['m.name']
            final_answer = '{0}宜食的食物包括有:{1}\n推荐食谱包括有:{2}'.format(subject, ';'.join(list(set(do_desc))[:self.num_limit]), ';'.join(list(set(recommand_desc))[:self.num_limit]))
        elif question_type == 'food_not_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '患有{0}的人最好不要吃{1}'.format(';'.join(list(set(desc))[:self.num_limit]), subject)
        elif question_type == 'food_do_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '患有{0}的人建议多试试{1}'.format(';'.join(list(set(desc))[:self.num_limit]), subject)
        elif question_type == 'disease_drug':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}通常的使用的药品包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'drug_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '{0}主治的疾病有{1},可以试试'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'disease_check':
            desc = [i['n.name'] for i in answers]
            subject = answers[0]['m.name']
            final_answer = '{0}通常可以通过以下方式检查出来:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        elif question_type == 'check_disease':
            desc = [i['m.name'] for i in answers]
            subject = answers[0]['n.name']
            final_answer = '通常可以通过{0}检查出来的疾病有{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
        return final_answer

if __name__ == '__main__':
    searcher = AnswerSearcher()
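One detail of the reply templates is worth a small experiment: list(set(desc)) removes duplicates but does not keep the order of the rows, so which items survive the num_limit cut can vary. The sketch below shows an order-preserving variant on hand-written rows shaped like the result of run(query).data(); `unique` and `prettify_symptoms` are hypothetical helpers, not the project's code.

```python
def unique(items):
    """Remove duplicates while keeping first-seen order (dict keys keep insertion order)."""
    return list(dict.fromkeys(items))


def prettify_symptoms(rows, limit=20):
    """Turn disease_symptom rows into one sentence, or '' when there are no rows."""
    if not rows:
        return ""
    subject = rows[0]["m.name"]
    symptoms = unique(row["n.name"] for row in rows)[:limit]
    return "{0}的症状包括:{1}".format(subject, ";".join(symptoms))


if __name__ == "__main__":
    # Hand-written rows standing in for a query result.
    rows = [
        {"m.name": "感冒", "r.name": "症状", "n.name": "咳嗽"},
        {"m.name": "感冒", "r.name": "症状", "n.name": "发热"},
        {"m.name": "感冒", "r.name": "症状", "n.name": "咳嗽"},
    ]
    print(prettify_symptoms(rows))
```

With this variant the same query always produces the same sentence, which makes the output easier to test.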

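(3) The question-answering system in this project is implemented entirely with rule matching. Keyword matching classifies the question (medical questions are a closed-domain scenario, so the question types of the domain can be enumerated and classified); Cypher MATCH statements then query Neo4j, and the answer is assembled from the returned data and sent back. The question-answering framework is built from the scripts chatbot_graph.py, answer_search.py, question_classifier.py, and question_parser.py.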
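The flow described above (classify, parse, search, reply) can be sketched as a small wrapper class. This is an illustration under assumptions rather than the contents of chatbot_graph.py: the method names classify, parser_main, and search_main come from the code in this article, while `ChatBot`, `chat`, `FALLBACK`, and the three stub classes are hypothetical and exist only so the sketch runs without Neo4j.

```python
class ChatBot:
    """Wire classifier -> parser -> searcher; any stage may come back empty."""

    FALLBACK = "Sorry, I could not find an answer to that question."

    def __init__(self, classifier, parser, searcher):
        self.classifier = classifier
        self.parser = parser
        self.searcher = searcher

    def chat(self, question):
        res_classify = self.classifier.classify(question)
        if not res_classify:
            return self.FALLBACK
        sqls = self.parser.parser_main(res_classify)
        answers = self.searcher.search_main(sqls)
        if not answers:
            return self.FALLBACK
        return "\n".join(answers)


# Stand-in stages so the sketch runs without Neo4j (all hypothetical).
class StubClassifier:
    def classify(self, question):
        if "感冒" not in question:
            return {}
        return {"args": {"感冒": ["disease"]}, "question_types": ["disease_symptom"]}


class StubParser:
    def parser_main(self, res_classify):
        return [{"question_type": t, "sql": ["<cypher>"]} for t in res_classify["question_types"]]


class StubSearcher:
    def search_main(self, sqls):
        return ["感冒的症状包括:咳嗽;发热" for _ in sqls]


if __name__ == "__main__":
    bot = ChatBot(StubClassifier(), StubParser(), StubSearcher())
    print(bot.chat("感冒有什么症状"))
```

Passing the three stages into the constructor keeps the wrapper testable; with the real classes, the same wiring would need the dictionary files and a running Neo4j instance.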

Resource links:

Bilibili walkthrough video

Question-answering system based on a medical knowledge graph

Original project link
