I. Programming Environment
Windows 10
Python 3.6
Jupyter Notebook
Graphviz (for an introduction and installation instructions, see https://www.jianshu.com/p/b559dc689b7f)
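Before starting, it can help to confirm that the notebook sees the expected tools. The snippet below is only a minimal sketch: `check_environment` is a hypothetical helper name, and it assumes Graphviz was installed so that its `dot` executable is on the PATH.

```python
import shutil
import sys

def check_environment():
    """Report the Python version and where Graphviz's `dot` executable is found."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        # shutil.which returns None when `dot` is not on the PATH
        "dot_path": shutil.which("dot"),
    }

env = check_environment()
print(env)
```

If `dot_path` comes back as `None`, the Graphviz `bin` directory probably still needs to be added to the PATH.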
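II. Data Source
III. Data Cleaning
1 Put each disease and its symptoms into a dictionary, with the disease as the key and the list of its symptoms as the value.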
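Note: some diseases and symptoms contain the special character '^', which has to be replaced with '_' before splitting.

import csv
from collections import defaultdict

def return_list(disease):
    # Replace '^' with '_', split on '_', and keep every second piece
    # (positions 2, 4, ...), which are the entries used as names.
    disease_list = []
    match = disease.replace('^', '_').split('_')
    ctr = 1
    for group in match:
        if ctr % 2 == 0:
            disease_list.append(group)
        ctr = ctr + 1
    return disease_list

with open("Scraped-Data/dataset_uncleaned.csv") as csvfile:
    reader = csv.reader(csvfile)
    disease = ""
    weight = 0
    disease_list = []
    dict_wt = {}                 # disease -> value from the second column
    dict_ = defaultdict(list)    # disease -> list of symptoms
    for row in reader:
        # A non-blank first column starts a new disease entry; cells equal
        # to "\xc2\xa0" or "" are treated as blank.
        if row[0] != "\xc2\xa0" and row[0] != "":
            disease = row[0]
            disease_list = return_list(disease)
            weight = row[1]
        # Symptoms on this row are attached to the most recent disease entry.
        if row[2] != "\xc2\xa0" and row[2] != "":
            symptom_list = return_list(row[2])
            for d in disease_list:
                for s in symptom_list:
                    dict_[d].append(s)
                dict_wt[d] = weight

print(dict_)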
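To make the splitting step concrete, here is a small sketch that applies the same "replace '^', split on '_', keep every second piece" logic as `return_list` to a made-up cell value. The function name `names_from_cell` and the sample string are assumptions for illustration only; the real cell contents are not shown in this post.

```python
def names_from_cell(cell):
    """Return the pieces at even positions (2nd, 4th, ...) after splitting on '_'."""
    parts = cell.replace('^', '_').split('_')
    # parts[1::2] selects the same pieces as the counter-based loop:
    # the 2nd, 4th, 6th, ... items.
    return parts[1::2]

# Hypothetical cell value: two "CODE_name" pairs joined by '^'.
sample = "CODE1_name one^CODE2_name two"
names = names_from_cell(sample)
print(names)  # ['name one', 'name two']
```

One thing to keep in mind with this approach: a name that itself contains '_' would be cut into several pieces, so it only works when '_' appears purely as a separator.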
2 Write disease-symptom-sample count to d
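As a rough illustration of writing disease, symptom, and sample-count rows, the sketch below walks dictionaries shaped like `dict_` and `dict_wt` from step 1. The output target is cut off in the text above, so this example writes to an in-memory buffer instead. The helper name `write_rows`, the sample values, and the three-column layout are all assumptions.

```python
import csv
import io
from collections import defaultdict

# Stand-ins for the dictionaries built in step 1 (values are made up).
dict_ = defaultdict(list, {"disease a": ["symptom x", "symptom y"]})
dict_wt = {"disease a": "12"}

def write_rows(out, symptoms_by_disease, count_by_disease):
    """Write one (disease, symptom, count) row per disease-symptom pair."""
    writer = csv.writer(out)
    for disease, symptoms in symptoms_by_disease.items():
        for symptom in symptoms:
            writer.writerow([disease, symptom, count_by_disease[disease]])

buffer = io.StringIO()  # in-memory target; a real run would pass an open file
write_rows(buffer, dict_, dict_wt)
rows = buffer.getvalue().splitlines()
print(rows)
```

When writing to a real file instead of the buffer, open it with `newline=""`, as the `csv` module documentation recommends, to avoid blank lines between rows on Windows.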