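This project uses the source code published by Liu Huanyong of the Chinese Academy of Sciences:
https://github.com/liuhuanyong/QASystemOnMedicalKG
Steps to run it after downloading:
(1) Install the Neo4j database and the required packages. Neo4j needs a JDK (Java Development Kit) installed first. Pay attention to versions: this setup used Neo4j 4, Java 1.8, and py2neo 4.3.0 (py2neo==4.3.0); newer py2neo releases will not run this code. Note that Neo4j 4.x officially requires Java 11 (Java 8 is what Neo4j 3.x uses), so if Neo4j refuses to start, check the JDK version first.
For installing Neo4j and some background, see:
https://so.csdn.net/so/search?q=neo4j&spm=1001.2101.3001.7020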
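(2) Install the py2neo and pyahocorasick packages for Python. Installing pyahocorasick failed with an error asking for Visual Studio Build Tools:
Install Microsoft Visual C++ first: download Build Tools from https://visualstudio.microsoft.com/downloads/ and, in the installer's workload selection, tick C++ Build Tools under Visual Studio Build Tools.
Some people say that installing pyahocorasick through Anaconda does not require Visual C++; I have not tried this myself.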
(3) Then run the program:
1) First edit user and password in build_medicalgraph and answer_search to your own Neo4j username and password.
2) Then add these two lines at the end of build_medicalgraph:
handler.create_graphnodes()
handler.create_graphrels()
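3) Run build_medicalgraph. Some setups raise:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xaf in position 81: illegal multibyte sequence
To fix it, add encoding='utf-8' to every open() call.
4) The data set is large, so the import takes several hours. When it finishes, open the Neo4j Browser and the nodes and relationships will be there.
5) Then run chatbot_graph.py and type a question to get an answer.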
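The UnicodeDecodeError in step 3) most likely appears because, on a Chinese-locale Windows machine, open() falls back to the system code page (GBK) while the project's data files are UTF-8. Below is a minimal sketch of the fix. The helper load_words and the file name demo_dict.txt are hypothetical and used only for illustration; they are not part of the project.

```python
import os
import tempfile


def load_words(path):
    """Read one term per line, decoding explicitly as UTF-8 and skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


# Write a small UTF-8 file into a temporary directory, then read it back.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo_dict.txt')  # hypothetical file name
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('感冒\n\n乙肝\n')

words = load_words(demo_path)
print(words)
```

Passing the encoding explicitly makes the script behave the same on Windows, Linux, and macOS, whatever the system locale is.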
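Walkthrough of the code:
(1) Building the knowledge graph starts with data acquisition. The data were collected mainly by a web crawler and are already structured or semi-structured, so no knowledge extraction from sentences or articles is needed; the data are saved as JSON and used in that form. The build step mainly creates the entity types, attributes, and relationships. The source code is commented, so I won't paste an explanation of that part here. The rest of the code covers question classification, question parsing, running the parsed queries, and returning the answer. The comments below include my own understanding and represent only my personal reading; if you see it differently or spot a mistake, please point it out.
(2) Selected code
Question classification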
- import os
- import ahocorasick
- # Aho-Corasick automaton
- # Matches many keywords against a string in a single pass, returning every keyword that occurs in it
-
- class QuestionClassifier:
- def __init__(self):
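- # cur_dir is the directory that contains this file
- # os.path.abspath(__file__) gives this file's absolute path; os.path.dirname strips the file name and, unlike splitting on '/', also works with Windows path separators
- cur_dir = os.path.dirname(os.path.abspath(__file__))
- # Paths to the feature-word dictionaries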
- self.disease_path = os.path.join(cur_dir, 'dict/disease.txt')
- self.department_path = os.path.join(cur_dir, 'dict/department.txt')
- self.check_path = os.path.join(cur_dir, 'dict/check.txt')
- self.drug_path = os.path.join(cur_dir, 'dict/drug.txt')
- self.food_path = os.path.join(cur_dir, 'dict/food.txt')
- self.producer_path = os.path.join(cur_dir, 'dict/producer.txt')
- self.symptom_path = os.path.join(cur_dir, 'dict/symptom.txt')
- self.deny_path = os.path.join(cur_dir, 'dict/deny.txt')
- # Load the feature words: seven entity dictionaries, merged into the domain vocabulary (region_words), plus a list of negation words
- self.disease_wds= [i.strip() for i in open(self.disease_path,encoding='utf-8') if i.strip()]
- self.department_wds= [i.strip() for i in open(self.department_path,encoding='utf-8') if i.strip()]
- self.check_wds= [i.strip() for i in open(self.check_path,encoding='utf-8') if i.strip()]
- self.drug_wds= [i.strip() for i in open(self.drug_path,encoding='utf-8') if i.strip()]
- self.food_wds= [i.strip() for i in open(self.food_path,encoding='utf-8') if i.strip()]
- self.producer_wds= [i.strip() for i in open(self.producer_path,encoding='utf-8') if i.strip()]
- self.symptom_wds= [i.strip() for i in open(self.symptom_path,encoding='utf-8') if i.strip()]
- self.region_words = set(self.department_wds + self.disease_wds + self.check_wds + self.drug_wds + self.food_wds + self.producer_wds + self.symptom_wds)
- self.deny_words = [i.strip() for i in open(self.deny_path,encoding='utf-8') if i.strip()]
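- # Build the Aho-Corasick tree over the domain vocabulary
- self.region_tree = self.build_actree(list(self.region_words))
- # Build the word-to-type dictionary, e.g. {'感冒': ['disease'], ...}
- self.wdtype_dict = self.build_wdtype_dict()
- # Question keywords, covering questions about disease attributes and about relationships (edges)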
- self.symptom_qwds = ['症状', '表征', '现象', '症候', '表现']
- self.cause_qwds = ['原因','成因', '为什么', '怎么会', '怎样才', '咋样才', '怎样会', '如何会', '为啥', '为何', '如何才会', '怎么才会', '会导致', '会造成']
- self.acompany_qwds = ['并发症', '并发', '一起发生', '一并发生', '一起出现', '一并出现', '一同发生', '一同出现', '伴随发生', '伴随', '共现']
- self.food_qwds = ['饮食', '饮用', '吃', '食', '伙食', '膳食', '喝', '菜' ,'忌口', '补品', '保健品', '食谱', '菜谱', '食用', '食物','补品']
- self.drug_qwds = ['药', '药品', '用药', '胶囊', '口服液', '炎片']
- self.prevent_qwds = ['预防', '防范', '抵制', '抵御', '防止','躲避','逃避','避开','免得','逃开','避开','避掉','躲开','躲掉','绕开',
- '怎样才能不', '怎么才能不', '咋样才能不','咋才能不', '如何才能不',
- '怎样才不', '怎么才不', '咋样才不','咋才不', '如何才不',
- '怎样才可以不', '怎么才可以不', '咋样才可以不', '咋才可以不', '如何可以不',
- '怎样才可不', '怎么才可不', '咋样才可不', '咋才可不', '如何可不']
- self.lasttime_qwds = ['周期', '多久', '多长时间', '多少时间', '几天', '几年', '多少天', '多少小时', '几个小时', '多少年']
- self.cureway_qwds = ['怎么治疗', '如何医治', '怎么医治', '怎么治', '怎么医', '如何治', '医治方式', '疗法', '咋治', '怎么办', '咋办', '咋治']
- self.cureprob_qwds = ['多大概率能治好', '多大几率能治好', '治好希望大么', '几率', '几成', '比例', '可能性', '能治', '可治', '可以治', '可以医']
- self.easyget_qwds = ['易感人群', '容易感染', '易发人群', '什么人', '哪些人', '感染', '染上', '得上']
- self.check_qwds = ['检查', '检查项目', '查出', '检查', '测出', '试出']
- self.belong_qwds = ['属于什么科', '属于', '什么科', '科室']
- self.cure_qwds = ['治疗什么', '治啥', '治疗啥', '医治啥', '治愈啥', '主治啥', '主治什么', '有什么用', '有何用', '用处', '用途',
- '有什么好处', '有什么益处', '有何益处', '用来', '用来做啥', '用来作甚', '需要', '要']
-
- print('model init finished ......')
-
- return
-
- '''Main classification function'''
- def classify(self, question):
- data = {}
- # check_medical is defined further down
- # It returns the matched terms and their types, e.g. {'感冒': ['disease'], ...}
- medical_dict = self.check_medical(question)
- if not medical_dict:
- return {}
- data['args'] = medical_dict
- # Collect the entity types that appear in the question
- types = []
- for type_ in medical_dict.values():
- types += type_
- question_type = 'others'
-
- question_types = []
-
- # Symptoms
- if self.check_words(self.symptom_qwds, question) and ('disease' in types):
- question_type = 'disease_symptom'
- question_types.append(question_type)
-
- if self.check_words(self.symptom_qwds, question) and ('symptom' in types):
- question_type = 'symptom_disease'
- question_types.append(question_type)
-
- # Cause
- if self.check_words(self.cause_qwds, question) and ('disease' in types):
- question_type = 'disease_cause'
- question_types.append(question_type)
- # Complications
- if self.check_words(self.acompany_qwds, question) and ('disease' in types):
- question_type = 'disease_acompany'
- question_types.append(question_type)
-
- # Recommended or forbidden foods for a disease
- if self.check_words(self.food_qwds, question) and 'disease' in types:
- deny_status = self.check_words(self.deny_words, question)
- if deny_status:
- question_type = 'disease_not_food'
- else:
- question_type = 'disease_do_food'
- question_types.append(question_type)
-
- # Given a food, find the related diseases
- if self.check_words(self.food_qwds+self.cure_qwds, question) and 'food' in types:
- deny_status = self.check_words(self.deny_words, question)
- if deny_status:
- question_type = 'food_not_disease'
- else:
- question_type = 'food_do_disease'
- question_types.append(question_type)
-
- # Recommended drugs
- if self.check_words(self.drug_qwds, question) and 'disease' in types:
- question_type = 'disease_drug'
- question_types.append(question_type)
-
- # Which diseases a drug treats
- if self.check_words(self.cure_qwds, question) and 'drug' in types:
- question_type = 'drug_disease'
- question_types.append(question_type)
-
- # Examinations needed for a disease
- if self.check_words(self.check_qwds, question) and 'disease' in types:
- question_type = 'disease_check'
- question_types.append(question_type)
-
- # Given an examination, find the diseases it detects
- if self.check_words(self.check_qwds+self.cure_qwds, question) and 'check' in types:
- question_type = 'check_disease'
- question_types.append(question_type)
-
- # Disease prevention
- if self.check_words(self.prevent_qwds, question) and 'disease' in types:
- question_type = 'disease_prevent'
- question_types.append(question_type)
-
- # Treatment duration
- if self.check_words(self.lasttime_qwds, question) and 'disease' in types:
- question_type = 'disease_lasttime'
- question_types.append(question_type)
-
- # Treatment methods
- if self.check_words(self.cureway_qwds, question) and 'disease' in types:
- question_type = 'disease_cureway'
- question_types.append(question_type)
-
- # Probability of cure
- if self.check_words(self.cureprob_qwds, question) and 'disease' in types:
- question_type = 'disease_cureprob'
- question_types.append(question_type)
-
- # Susceptible population
- if self.check_words(self.easyget_qwds, question) and 'disease' in types :
- question_type = 'disease_easyget'
- question_types.append(question_type)
-
- # If no question type matched but a disease was mentioned, fall back to the disease description
- if question_types == [] and 'disease' in types:
- question_types = ['disease_desc']
-
- # If no question type matched but a symptom was mentioned, fall back to the diseases associated with that symptom
- if question_types == [] and 'symptom' in types:
- question_types = ['symptom_disease']
-
- # Store all matched question types in the result dictionary
- data['question_types'] = question_types
-
- return data
-
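- '''Build the word-to-type dictionary.
- Maps every feature word from the seven entity dictionaries to its types: {feature word: [entity types]}.
- Stores the type (disease, department, ...) of each word in region_words.
- '''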
- def build_wdtype_dict(self):
- wd_dict = dict()
- # region_words is the union of all seven entity dictionaries
- for wd in self.region_words:
- wd_dict[wd] = []
- # Check each entity list for the word and append the matching type; a word can have several types
- if wd in self.disease_wds:
- wd_dict[wd].append('disease')
- if wd in self.department_wds:
- wd_dict[wd].append('department')
- if wd in self.check_wds:
- wd_dict[wd].append('check')
- if wd in self.drug_wds:
- wd_dict[wd].append('drug')
- if wd in self.food_wds:
- wd_dict[wd].append('food')
- if wd in self.symptom_wds:
- wd_dict[wd].append('symptom')
- if wd in self.producer_wds:
- wd_dict[wd].append('producer')
- return wd_dict
-
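- # Build the Aho-Corasick tree to speed up filtering.
- # This function builds the domain automaton with Python's ahocorasick library (the pyahocorasick package).
- # Aho-Corasick is a multi-pattern string-matching algorithm; the library exposes two data structures: a trie and an Aho-Corasick automaton.
- # A trie is a dictionary indexed by strings, where lookup time is proportional to the length of the key.
- # An Aho-Corasick automaton finds every occurrence of every string in a given set in one pass over the input. It is essentially KMP-style failure links built on top of a trie,
- # which is what allows matching many patterns at once.
- # Detailed ahocorasick usage is outside the scope of this post;
- # see for example https://blog.csdn.net/pirage/article/details/51657178.
- # Like KMP, it matches quickly without rescanning the input.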
-
- def build_actree(self, wordlist):
- actree = ahocorasick.Automaton()  # create an empty trie
- for index, word in enumerate(wordlist):
- actree.add_word(word, (index, word))  # add the word to the trie, storing (index, word) as its value
- actree.make_automaton()  # convert the trie into an Aho-Corasick automaton
- return actree
-
-
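- # Question filtering
- # Match domain words with the automaton's iter() method; when matched words overlap, drop the shorter one
- # and keep the longest. The purpose is to extract the domain words contained in the question
- # and return {domain word in the question: [entity types of that word]}.
-
-
- # Approach
- # 1. Initialization
- # Dictionaries: diseases, departments, examinations, drugs, foods, drug producers (specific drug brands), symptoms, negation words, plus region_words, which holds every domain word
- # Build an Aho-Corasick tree, region_tree, from all words in region_words (to speed up later searches)
- # Create wdtype_dict, which stores the type (disease, department, ...) of each word in region_words
- # Build the synonym keyword lists (the *_qwds lists) so that different phrasings of the same intent are recognized
- # 2. Analysing the user's question
- # Question filtering (extract the domain terms the user mentioned): use region_tree to find every keyword from region_words that occurs in the question, discard the shorter, more general keywords contained in longer ones, and use wdtype_dict to look up which dictionary each keyword belongs to.
- # Question classification (decide what the user knows and what is being asked): use the synonym lists and wdtype_dict to determine the specific question type
- # Source: https://blog.csdn.net/floracuu/article/details/113574130
-
- # check_medical implements the question-filtering step described above.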
- def check_medical(self, question):
- region_wds = []
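- # region_tree is the Aho-Corasick tree built from region_words; it quickly finds the entities that occur in the question
- # Matches are not always what we want: for example both '瓜烧白菜' and its substring '白菜' match, although they are different entities
- # Match domain words with the automaton's iter() method
- # iter() yields tuples of the form (end_index, value), e.g. (3, (23192, '乙肝'))
- for i in self.region_tree.iter(question):
- # i[1] is the value stored by add_word, i.e. (index, word)
- wd = i[1][1]  # the matched word
- region_wds.append(wd)
- # Collect the shorter words that should be filtered out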
- stop_wds = []
- for wd1 in region_wds:
- for wd2 in region_wds:
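- # For each pair of distinct words, if one is contained in the other, mark the shorter one
- # so that only the more specific word is kept,
- # e.g. '内科' in '消化内科' and the two are not equal
- if wd1 in wd2 and wd1 != wd2:
- stop_wds.append(wd1)  # the shorter word
- # Keep only the words that were not marked
- final_wds = [i for i in region_wds if i not in stop_wds]  # the longer words
- # Build the result dictionary, e.g. {'感冒': ['disease'], ...}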
- final_dict = {i:self.wdtype_dict.get(i) for i in final_wds}
-
- return final_dict
-
-
- # Classification based on feature words
- # Checks whether the question contains any keyword from the given list.
-
- def check_words(self, wds, sent):
- for wd in wds:
- if wd in sent:
- return True
- return False
-
-
- if __name__ == '__main__':
- handler = QuestionClassifier()
- # Read questions in a loop and print the classification result
- while 1:
- question = input('input an question:')
- data = handler.classify(question)
- print(data)
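The two core ideas in the classifier, keeping only the longest of overlapping matches and testing for question keywords, can be tried without installing pyahocorasick. The sketch below is not the project's code: a plain substring scan stands in for the automaton's iter() (same matches, only slower), and the helper names keep_longest and contains_any as well as the three-word dictionary are made up for illustration.

```python
def keep_longest(matches):
    """Drop any matched term that is a proper substring of another matched term."""
    shorter = {a for a in matches for b in matches if a != b and a in b}
    return [m for m in matches if m not in shorter]


def contains_any(keywords, sentence):
    """Return True if any keyword occurs in the sentence."""
    return any(k in sentence for k in keywords)


# Hypothetical mini-dictionary mapping terms to entity types.
wdtype = {'内科': ['department'], '消化内科': ['department'], '感冒': ['disease']}
question = '感冒的症状有哪些,要去消化内科吗'

# Naive scan standing in for the Aho-Corasick automaton.
matches = [w for w in wdtype if w in question]
entities = {w: wdtype[w] for w in keep_longest(matches)}
is_symptom_question = contains_any(['症状', '表现'], question)
print(entities, is_symptom_question)
```

Here '内科' is dropped because it is contained in '消化内科', which is the same effect the stop_wds loop in check_medical has.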
Question parsing
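- # Convert the user's question into Neo4j query statements
-
- # 1. Group the extracted keywords by entity type
- # 2. Loop over the question types and translate each into Neo4j queries
- """
- parser_main
- The main parsing function.
- It takes the classification result and reads the domain words in the question and their entity types.
- It then calls build_entitydict, which returns entity_dict in the form {'entity type': ['domain word'], ...}.
- For each question_type in ['question_types'] of the classification result,
- it calls sql_transfer to produce the corresponding Neo4j Cypher statements.
- Finally it collects the queries generated for every question_type.
- Source: https://blog.csdn.net/vivian_ll/article/details/89840281
- """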
- class QuestionPaser:
- # Example: args = {'青光眼': ['disease'], '肺气肿': ['disease'], '消化内科': ['department']}
- # After grouping: entity_dict = {'disease': ['青光眼', '肺气肿'], 'department': ['消化内科']}
- # Source: https://blog.csdn.net/floracuu/article/details/113828998
- '''Build the entity dictionary'''
- def build_entitydict(self, args):
- # args is the dictionary {word: [entity types]} produced by the classifier (not a *args tuple)
- entity_dict = {}
- # Iterate over each word and its list of types
- for arg, types in args.items():
- for type in types:
- if type not in entity_dict:
- entity_dict[type] = [arg]
- else:
- entity_dict[type].append(arg)
-
- return entity_dict
-
- '''Main parsing function'''
- def parser_main(self, res_classify):
- # Get the matched keywords
- args = res_classify['args']
- # Group the keywords by entity type
- entity_dict = self.build_entitydict(args)
- question_types = res_classify['question_types']
- sqls = []
-
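- # For each question type, generate the corresponding queries, wrap them in the dictionary sql_, and append it to sqls
- # sql_ has two fields: question_type and sql
-
- for question_type in question_types:
- sql_ = {}  # the trailing underscore only distinguishes this dictionary from the list sql below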
- sql_['question_type'] = question_type
- sql = []
- if question_type == 'disease_symptom':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'symptom_disease':
- sql = self.sql_transfer(question_type, entity_dict.get('symptom'))
-
- elif question_type == 'disease_cause':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'disease_acompany':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'disease_not_food':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'disease_do_food':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'food_not_disease':
- sql = self.sql_transfer(question_type, entity_dict.get('food'))
-
- elif question_type == 'food_do_disease':
- sql = self.sql_transfer(question_type, entity_dict.get('food'))
-
- elif question_type == 'disease_drug':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'drug_disease':
- sql = self.sql_transfer(question_type, entity_dict.get('drug'))
-
- elif question_type == 'disease_check':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'check_disease':
- sql = self.sql_transfer(question_type, entity_dict.get('check'))
-
- elif question_type == 'disease_prevent':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'disease_lasttime':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'disease_cureway':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'disease_cureprob':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'disease_easyget':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- elif question_type == 'disease_desc':
- sql = self.sql_transfer(question_type, entity_dict.get('disease'))
-
- if sql:
- sql_['sql'] = sql
-
- sqls.append(sql_)
-
- return sqls
-
- '''Translate each question type into Neo4j Cypher queries'''
- def sql_transfer(self, question_type, entities):
- if not entities:
- return []
-
- # Query statements
- sql = []
- # Query the cause of a disease
- if question_type == 'disease_cause':
- sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cause".format(i) for i in entities]
-
- # Query a disease's preventive measures
- elif question_type == 'disease_prevent':
- sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.prevent".format(i) for i in entities]
-
- # Query how long treatment of a disease lasts
- elif question_type == 'disease_lasttime':
- sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cure_lasttime".format(i) for i in entities]
-
- # Query the probability of curing a disease
- elif question_type == 'disease_cureprob':
- sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cured_prob".format(i) for i in entities]
-
- # Query the treatment methods for a disease
- elif question_type == 'disease_cureway':
- sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.cure_way".format(i) for i in entities]
-
- # Query the population susceptible to a disease
- elif question_type == 'disease_easyget':
- sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.easy_get".format(i) for i in entities]
-
- # Query the description of a disease
- elif question_type == 'disease_desc':
- sql = ["MATCH (m:Disease) where m.name = '{0}' return m.name, m.desc".format(i) for i in entities]
-
- # Query the symptoms of a disease
- elif question_type == 'disease_symptom':
- sql = ["MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
-
- # Query which diseases a symptom may indicate
- elif question_type == 'symptom_disease':
- sql = ["MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
-
- # Query the complications of a disease
- elif question_type == 'disease_acompany':
- sql1 = ["MATCH (m:Disease)-[r:acompany_with]->(n:Disease) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
- sql2 = ["MATCH (m:Disease)-[r:acompany_with]->(n:Disease) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
- sql = sql1 + sql2
- # Query foods to avoid for a disease
- elif question_type == 'disease_not_food':
- sql = ["MATCH (m:Disease)-[r:no_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
-
- # Query foods recommended for a disease
- elif question_type == 'disease_do_food':
- sql1 = ["MATCH (m:Disease)-[r:do_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
- sql2 = ["MATCH (m:Disease)-[r:recommand_eat]->(n:Food) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
- sql = sql1 + sql2
-
- # Given a food to avoid, find the diseases
- elif question_type == 'food_not_disease':
- sql = ["MATCH (m:Disease)-[r:no_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
-
- # Given a recommended food, find the diseases
- elif question_type == 'food_do_disease':
- sql1 = ["MATCH (m:Disease)-[r:do_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
- sql2 = ["MATCH (m:Disease)-[r:recommand_eat]->(n:Food) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
- sql = sql1 + sql2
-
- # Query drugs commonly used for a disease (drug aliases still need to be added)
- elif question_type == 'disease_drug':
- sql1 = ["MATCH (m:Disease)-[r:common_drug]->(n:Drug) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
- sql2 = ["MATCH (m:Disease)-[r:recommand_drug]->(n:Drug) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
- sql = sql1 + sql2
-
- # Given a drug, query the diseases it can treat
- elif question_type == 'drug_disease':
- sql1 = ["MATCH (m:Disease)-[r:common_drug]->(n:Drug) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
- sql2 = ["MATCH (m:Disease)-[r:recommand_drug]->(n:Drug) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
- sql = sql1 + sql2
- # Query the examinations a disease requires
- elif question_type == 'disease_check':
- sql = ["MATCH (m:Disease)-[r:need_check]->(n:Check) where m.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
-
- # Given an examination, query the diseases
- elif question_type == 'check_disease':
- sql = ["MATCH (m:Disease)-[r:need_check]->(n:Check) where n.name = '{0}' return m.name, r.name, n.name".format(i) for i in entities]
-
- return sql
-
-
- # The Cypher statements generated here are run against Neo4j, and the results are assembled into answers in Python.
- if __name__ == '__main__':
- handler = QuestionPaser()
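The parser builds each Cypher statement by formatting the entity name straight into the query text, which breaks if a name ever contains a single quote. One possible alternative, sketched below, is to keep the statement fixed and pass the name as a Cypher parameter ($name). This is an illustration, not the project's code: group_by_type and symptom_queries are hypothetical helpers, and although py2neo's Graph.run should accept a parameter dictionary as its second argument, check that against the py2neo version you have installed.

```python
from collections import defaultdict


def group_by_type(args):
    """Invert {term: [types]} into {type: [terms]}, like build_entitydict."""
    grouped = defaultdict(list)
    for term, types in args.items():
        for type_ in types:
            grouped[type_].append(term)
    return dict(grouped)


def symptom_queries(diseases):
    """Build (cypher, params) pairs; the name is a parameter, not formatted into the text."""
    cypher = ("MATCH (m:Disease)-[r:has_symptom]->(n:Symptom) "
              "WHERE m.name = $name RETURN m.name, r.name, n.name")
    return [(cypher, {'name': d}) for d in diseases]


args = {'青光眼': ['disease'], '肺气肿': ['disease'], '消化内科': ['department']}
entity_dict = group_by_type(args)
queries = symptom_queries(entity_dict.get('disease', []))
print(entity_dict)
print(queries[0])
# Assumed usage with py2neo (not executed here): graph.run(cypher, params).data()
```

The grouping step reproduces the example given in the comments of build_entitydict; only the way the query is assembled differs from the original.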
Querying with the parsed result
- """
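- After parsing, the generated queries have to be executed.
- This script defines the AnswerSearcher class. As in build_medicalgraph.py,
- the class holds a Graph instance in the member g, plus num_limit, the maximum number of items listed in an answer.
- It has two methods: the main query function and the reply module.
- search_main
- Takes the parser output sqls and reads ['question_type'] and ['sql'] from each entry.
- It first calls self.g.run(query).data() to execute each statement in ['sql'] and collect the results,
- then calls answer_prettify, which combines the results with the answer template for that ['question_type'].
- Finally it returns the answers.
- answer_prettify
- Selects the reply template that matches the question_type.
- Source: https://blog.csdn.net/vivian_ll/article/details/89840281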
- """
- """
- Run the Neo4j queries and assemble the results into natural language
- """
- from py2neo import Graph
-
- class AnswerSearcher:
- # Connect to the database
- def __init__(self):
- self.g = Graph(
- host="127.0.0.1",
- http_port=7474,
- user="neo4j",
- password="your_password")  # replace with your own Neo4j password
- self.num_limit = 20
-
- '''Run the Cypher queries and return the results'''
- def search_main(self, sqls):
- final_answers = []
- for sql_ in sqls:
- question_type = sql_['question_type']
- queries = sql_['sql']
- answers = []
- for query in queries:
- # Run the Cypher statement
- ress = self.g.run(query).data()
- answers += ress
- # Pass on the question type and all results collected for it
- final_answer = self.answer_prettify(question_type, answers)
- if final_answer:
- final_answers.append(final_answer)
- return final_answers
-
- '''Select the reply template that matches the question_type'''
- def answer_prettify(self, question_type, answers):
- final_answer = []
- if not answers:
- return ''
- if question_type == 'disease_symptom':
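- # As in the query above, m is the disease and n is the node at the other end of the relationship, here a symptom
- desc = [i['n.name'] for i in answers]
- # {0} and {1} are positional placeholders for str.format
- # set() removes duplicates and returns a set (unordered, not a dictionary); list() converts it back to a list
- # Deduplicate the symptoms, keep at most num_limit of them, and join them with semicolons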
- subject = answers[0]['m.name']
- final_answer = '{0}的症状包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'symptom_disease':
- desc = [i['m.name'] for i in answers]
- subject = answers[0]['n.name']
- final_answer = '症状{0}可能染上的疾病有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_cause':
- desc = [i['m.cause'] for i in answers]
- subject = answers[0]['m.name']
- final_answer = '{0}可能的成因有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_prevent':
- desc = [i['m.prevent'] for i in answers]
- subject = answers[0]['m.name']
- final_answer = '{0}的预防措施包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_lasttime':
- desc = [i['m.cure_lasttime'] for i in answers]
- subject = answers[0]['m.name']
- final_answer = '{0}治疗可能持续的周期为:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_cureway':
- desc = [';'.join(i['m.cure_way']) for i in answers]
- subject = answers[0]['m.name']
- final_answer = '{0}可以尝试如下治疗:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_cureprob':
- desc = [i['m.cured_prob'] for i in answers]
- subject = answers[0]['m.name']
- final_answer = '{0}治愈的概率为(仅供参考):{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_easyget':
- desc = [i['m.easy_get'] for i in answers]
- subject = answers[0]['m.name']
-
- final_answer = '{0}的易感人群包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_desc':
- desc = [i['m.desc'] for i in answers]
- subject = answers[0]['m.name']
- final_answer = '{0},熟悉一下:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_acompany':
- desc1 = [i['n.name'] for i in answers]
- desc2 = [i['m.name'] for i in answers]
- subject = answers[0]['m.name']
- desc = [i for i in desc1 + desc2 if i != subject]
- final_answer = '{0}的并发症包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_not_food':
- desc = [i['n.name'] for i in answers]
- subject = answers[0]['m.name']
- final_answer = '{0}忌食的食物包括有:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_do_food':
- do_desc = [i['n.name'] for i in answers if i['r.name'] == '宜吃']
- recommand_desc = [i['n.name'] for i in answers if i['r.name'] == '推荐食谱']
- subject = answers[0]['m.name']
- final_answer = '{0}宜食的食物包括有:{1}\n推荐食谱包括有:{2}'.format(subject, ';'.join(list(set(do_desc))[:self.num_limit]), ';'.join(list(set(recommand_desc))[:self.num_limit]))
-
- elif question_type == 'food_not_disease':
- desc = [i['m.name'] for i in answers]
- subject = answers[0]['n.name']
- final_answer = '患有{0}的人最好不要吃{1}'.format(';'.join(list(set(desc))[:self.num_limit]), subject)
-
- elif question_type == 'food_do_disease':
- desc = [i['m.name'] for i in answers]
- subject = answers[0]['n.name']
- final_answer = '患有{0}的人建议多试试{1}'.format(';'.join(list(set(desc))[:self.num_limit]), subject)
-
- elif question_type == 'disease_drug':
- desc = [i['n.name'] for i in answers]
- subject = answers[0]['m.name']
- final_answer = '{0}通常的使用的药品包括:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'drug_disease':
- desc = [i['m.name'] for i in answers]
- subject = answers[0]['n.name']
- final_answer = '{0}主治的疾病有{1},可以试试'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'disease_check':
- desc = [i['n.name'] for i in answers]
- subject = answers[0]['m.name']
- final_answer = '{0}通常可以通过以下方式检查出来:{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- elif question_type == 'check_disease':
- desc = [i['m.name'] for i in answers]
- subject = answers[0]['n.name']
- final_answer = '通常可以通过{0}检查出来的疾病有{1}'.format(subject, ';'.join(list(set(desc))[:self.num_limit]))
-
- return final_answer
-
-
- if __name__ == '__main__':
- searcher = AnswerSearcher()
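Every branch of answer_prettify follows the same pattern: pick one column from the result rows, deduplicate, cap the list at num_limit, and fill a template. Because list(set(...)) has no defined order, the same question can come back with its items in a different order on different runs. The sketch below shows the pattern with an order-preserving deduplication instead. It is an illustration only: format_answer is a hypothetical helper and the rows are made-up sample data shaped like the dictionaries returned for a disease_symptom query.

```python
def format_answer(template, subject, values, limit=20):
    """Deduplicate values in first-seen order, cap at `limit`, and fill the template."""
    unique = list(dict.fromkeys(values))
    return template.format(subject, ';'.join(unique[:limit]))


# Made-up rows in the shape produced by self.g.run(query).data().
rows = [
    {'m.name': '感冒', 'n.name': '咳嗽'},
    {'m.name': '感冒', 'n.name': '发热'},
    {'m.name': '感冒', 'n.name': '咳嗽'},
]
answer = format_answer('{0}的症状包括:{1}', rows[0]['m.name'], [r['n.name'] for r in rows])
print(answer)
```

With dict.fromkeys the duplicate '咳嗽' is removed and the remaining items keep the order in which the database returned them.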
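(3) The question-answering system in this project is entirely rule-based. It classifies the question by keyword matching (medical questions are a closed-domain scenario, so the question types can be enumerated and classified), then uses Cypher MATCH queries to look up Neo4j, assembles the returned data into an answer, and returns it. The QA framework is implemented in the scripts chatbot_graph.py, answer_search.py, question_classifier.py, and question_parser.py.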
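How the scripts fit together can be sketched as a three-stage pipeline: classify, parse, search, with a default reply when any stage comes back empty. The code below is a simplified illustration of that flow, not the contents of chatbot_graph.py. ChatPipeline, the fallback text, and the three stub functions are all invented for the demo; in the real project the stages would be QuestionClassifier().classify, QuestionPaser().parser_main, and AnswerSearcher().search_main.

```python
class ChatPipeline:
    """Chain classify -> parse -> search, falling back to a default reply."""

    def __init__(self, classify, parse, search,
                 fallback='Sorry, I could not find an answer.'):
        self.classify = classify
        self.parse = parse
        self.search = search
        self.fallback = fallback

    def reply(self, question):
        classified = self.classify(question)
        if not classified:
            return self.fallback
        answers = self.search(self.parse(classified))
        return '\n'.join(answers) if answers else self.fallback


# Stub stages standing in for the real classes; their behaviour is invented.
def fake_classify(q):
    if '感冒' in q:
        return {'args': {'感冒': ['disease']}, 'question_types': ['disease_symptom']}
    return {}


def fake_parse(res):
    return [{'question_type': t, 'sql': ['<cypher>']} for t in res['question_types']]


def fake_search(sqls):
    return ['感冒的症状包括:咳嗽' for s in sqls if s['question_type'] == 'disease_symptom']


bot = ChatPipeline(fake_classify, fake_parse, fake_search)
print(bot.reply('感冒有什么症状'))
print(bot.reply('今天天气如何'))
```

Passing the stages in as callables keeps the sketch runnable without Neo4j; swapping in the real methods should not require changing reply().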