First, parsing the XML with the standard-library `xml.etree.ElementTree` and building the rows by hand:

```python
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9

import pandas as pd

xml = '''<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <shape>circle</shape>
    <degrees>360</degrees>
  </row>
  <row>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>'''

# Read the XML from a string
root = ET.fromstring(xml)
# Or read it from a file:
# tree = ET.ElementTree(file="text.xml")
# root = tree.getroot()

data = list()
for child in root:          # each <row>
    data1 = list()
    for son in child:       # each field of the row, in document order
        data1.append(son.text)
    data.append(data1)

df = pd.DataFrame(data, columns=['shape', 'degrees', 'sides'])
print(df)
```
Output:

```
      shape degrees sides
0    square     360   4.0
1    circle     360  None
2  triangle     180   3.0
```
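This loop relies on position: the i-th child of each `<row>` goes into the i-th column. It only copes with the missing `<sides>` in the circle row because `<sides>` happens to be the last child. If the children of a row are not always in the same order, values end up in the wrong columns. For example, if the last row lists its children in the order degrees, shape, sides, the output becomes:

```
  shape   degrees sides
0  square       360   4.0
1  circle       360  None
2     180  triangle   3.0
```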
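One way to avoid the ordering problem while still using `ElementTree` directly is to key each value by its tag instead of its position. The sketch below is an illustration, not the author's code: it builds one dict per `<row>` and lets `pd.DataFrame` align the dicts by key, so a missing child simply becomes `NaN`. The sample `xml_shuffled` is a made-up variant of the data above with the last row's children reordered.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical sample: the last <row> lists degrees before shape,
# and the circle row has no <sides>.
xml_shuffled = """<?xml version='1.0' encoding='utf-8'?>
<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees></row>
  <row><degrees>180</degrees><shape>triangle</shape><sides>3.0</sides></row>
</data>"""

root = ET.fromstring(xml_shuffled)

# One dict per <row>, keyed by each child's tag rather than its position.
records = [{son.tag: son.text for son in child} for child in root]

# pandas aligns the dicts by key; keys absent from a dict become NaN.
df = pd.DataFrame(records, columns=["shape", "degrees", "sides"])
print(df)
```

The values are still strings (`son.text`), so numeric columns need converting afterwards if you want to compute with them.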
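A simpler option is `pd.read_xml()`, added in pandas 1.3.0. It turns each element selected by `xpath` (by default every child of the root) into a row, matches child elements to columns by tag name, and infers numeric dtypes. It uses the `lxml` parser by default, so `lxml` must be installed. Since pandas 2.1, passing a literal XML string is deprecated, so wrap it in `StringIO`:

```python
from io import StringIO

import pandas as pd

df = pd.read_xml(StringIO(xml))  # xml is the string defined above
print(df)
```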
Output:

```
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0
```
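If `lxml` is not available, `read_xml` also accepts `parser="etree"`, which uses the standard library. The sketch below assumes pandas 1.3 or later and repeats the sample data so it runs on its own; `xpath="./row"` is spelled out only to show how the repeated row elements are selected (the default `"./*"` picks the same nodes here).

```python
from io import StringIO

import pandas as pd

xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees></row>
  <row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

# Standard-library parser; xpath selects the elements that become rows.
df = pd.read_xml(StringIO(xml), xpath="./row", parser="etree")
print(df)
print(df.dtypes)  # degrees is parsed as an integer column, sides as float
```

The `etree` parser supports only the limited XPath subset of `ElementTree`, so more complex `xpath` expressions need `lxml`.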
Another approach is to convert the XML into a nested, JSON-like dict and then load it into a DataFrame with `pd.json_normalize()`:
```python
import xml.etree.ElementTree as ET

import pandas as pd


def fun1(root):
    """Recursively convert an Element into a dict.

    Leaf children map tag -> text (a repeated leaf tag keeps only its last
    value). Children that have sub-elements are converted recursively and
    collected in a list under their tag.
    """
    dic1 = dict()
    for child in root:
        if len(child) > 0:  # the child has sub-elements
            dic2 = fun1(child)  # recurse into it
            value = dic1.get(child.tag)  # existing list for this tag, or None
            if value:  # tag already seen: append to its list
                value.append(dic2)
            else:
                dic1[child.tag] = [dic2]
        else:
            dic1[child.tag] = child.text
    return dic1


if __name__ == '__main__':
    root = ET.fromstring(xml)  # xml is the string defined above

    dic1 = fun1(root)
    df = pd.json_normalize(dic1['row'])
    print(df)
```
Output:

```
      shape degrees sides
0    square     360   4.0
1    circle     360   NaN
2  triangle     180   3.0
```
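Unlike `pd.read_xml()`, both the manual loop and `fun1` store each element's `.text`, so `degrees` and `sides` come back as strings rather than numbers. A minimal sketch of converting them afterwards with `pd.to_numeric`; the `rows` list is hard-coded as the value `fun1(root)['row']` returns for the sample XML, so the block runs on its own.

```python
import pandas as pd

# What fun1(root)["row"] returns for the sample XML: every value is a string,
# and the circle row has no "sides" key.
rows = [
    {"shape": "square", "degrees": "360", "sides": "4.0"},
    {"shape": "circle", "degrees": "360"},
    {"shape": "triangle", "degrees": "180", "sides": "3.0"},
]

df = pd.json_normalize(rows)

# Convert the numeric columns one by one; the missing value stays NaN.
for col in ["degrees", "sides"]:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```

After the conversion, `degrees` is an integer column and `sides` a float column, matching what `read_xml` infers.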
Copyright © 2003-2013 www.wpsshop.cn. All rights reserved.