
Reading data from XML with Python


Contents

XML example

Method 1: using ElementTree (cElementTree)

Method 2: using read_xml()

Method 3: using pd.json_normalize()


XML example

    xml = '''<?xml version='1.0' encoding='utf-8'?>
    <data>
    <row>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
    </row>
    <row>
    <shape>circle</shape>
    <degrees>360</degrees>
    </row>
    <row>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
    </row>
    </data>'''

Method 1: using ElementTree (cElementTree)

    # cElementTree was removed in Python 3.9; ElementTree provides the same API
    from xml.etree import ElementTree as ET
    import pandas as pd

    # Parse XML from a string
    root = ET.fromstring(xml)
    # To parse an XML file instead:
    # tree = ET.ElementTree(file="text.xml")
    # root = tree.getroot()

    data = list()
    for child in root:        # each <row> element
        data1 = list()
        for son in child:     # each field of the row, in document order
            data1.append(son.text)
        data.append(data1)
    df = pd.DataFrame(data, columns=['shape', 'degrees', 'sides'])
    print(df)
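Output:

          shape  degrees  sides
    0    square      360    4.0
    1    circle      360   None
    2  triangle      180    3.0

Every value here is a string, because son.text returns the element text unchanged. Depending on the pandas version, the missing sides value may be displayed as NaN instead of None.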

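If the shape, degrees, and sides elements do not appear in the same order in every row, extracting values by position easily produces wrong results.

For example, if the last row lists its elements in the order degrees, shape, sides, the output becomes:

        shape   degrees  sides
    0  square       360    4.0
    1  circle       360   None
    2     180  triangle    3.0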
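
One way to avoid this order dependence is to look up each field by its tag name rather than by position. The following is only a minimal sketch using the standard library: the names rows_by_tag, COLUMNS, and xml_reordered are made up for illustration, and the sample data is the example above with the last row reordered.

```python
from xml.etree import ElementTree as ET

# Example data with the last row deliberately reordered (degrees, shape, sides)
xml_reordered = """<data>
<row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
<row><shape>circle</shape><degrees>360</degrees></row>
<row><degrees>180</degrees><shape>triangle</shape><sides>3.0</sides></row>
</data>"""

COLUMNS = ["shape", "degrees", "sides"]  # hypothetical list of expected fields


def rows_by_tag(xml_text, columns=COLUMNS):
    """Return one dict per <row>, reading each field by tag name."""
    root = ET.fromstring(xml_text)
    records = []
    for row in root:
        # findtext() returns None when the row has no child with that tag
        records.append({col: row.findtext(col) for col in columns})
    return records


records = rows_by_tag(xml_reordered)
for record in records:
    print(record)
```

Passing records to pd.DataFrame(records) should then give one column per tag regardless of element order. The values are still strings at this point.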

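Method 2: using read_xml()

pd.read_xml() is available in pandas 1.3 and later and uses lxml as its default parser.

    from io import StringIO
    import pandas as pd
    # Passing a literal XML string is deprecated since pandas 2.1, so wrap it in StringIO.
    df = pd.read_xml(StringIO(xml))
    print(df)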
Output:

          shape  degrees  sides
    0    square      360    4.0
    1    circle      360    NaN
    2  triangle      180    3.0
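
read_xml() also accepts xpath and parser arguments. The sketch below assumes lxml may not be installed, so it selects the standard-library parser with parser="etree" and picks the repeating row elements explicitly with xpath. The variable xml_text is just a compact copy of the example data, introduced here for illustration.

```python
from io import StringIO

import pandas as pd

# Compact copy of the example data (illustrative variable name)
xml_text = """<data>
<row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
<row><shape>circle</shape><degrees>360</degrees></row>
<row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""

# xpath selects the <row> elements under the root;
# parser="etree" uses xml.etree from the standard library instead of lxml
df = pd.read_xml(StringIO(xml_text), xpath="./row", parser="etree")

print(df)
print(df.dtypes)  # read_xml infers numeric types for degrees and sides
```

Unlike Methods 1 and 3, which keep every value as a string, read_xml() infers column types during parsing.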

Method 3: using pd.json_normalize()

  • Convert the XML into a JSON-like structure (nested dicts and lists).

  • Load that structure into a DataFrame with pd.json_normalize().

    from xml.etree import ElementTree as ET
    import pandas as pd

    def fun1(root):
        """Recursively convert an element's children into a JSON-like dict."""
        dic1 = dict()
        for child in root:
            if len(child) > 0:      # the child has nested elements
                dic2 = fun1(child)  # recurse into the child
                # Repeated tags such as <row> are collected into a list
                dic1.setdefault(child.tag, []).append(dic2)
            else:
                dic1[child.tag] = child.text  # leaf element: keep its text
        return dic1

    if __name__ == '__main__':
        root = ET.fromstring(xml)
        dic1 = fun1(root)
        df = pd.json_normalize(dic1['row'])
        print(df)
Output:

          shape  degrees  sides
    0    square      360    4.0
    1    circle      360    NaN
    2  triangle      180    3.0
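
To make the first step of Method 3 concrete, the intermediate JSON-like structure can be printed before it is handed to pd.json_normalize(). This is a standalone sketch using only the standard library. The helper name element_to_dict and the variable xml_text are illustrative, and the conversion logic is assumed to mirror the recursive function above.

```python
import json
from xml.etree import ElementTree as ET

# Compact copy of the example data (illustrative variable name)
xml_text = """<data>
<row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
<row><shape>circle</shape><degrees>360</degrees></row>
<row><shape>triangle</shape><degrees>180</degrees><sides>3.0</sides></row>
</data>"""


def element_to_dict(element):
    """Convert an element's children into nested dicts and lists."""
    result = {}
    for child in element:
        if len(child) > 0:
            # Nested element: recurse and group repeated tags in a list
            result.setdefault(child.tag, []).append(element_to_dict(child))
        else:
            # Leaf element: store its text
            result[child.tag] = child.text
    return result


nested = element_to_dict(ET.fromstring(xml_text))
print(json.dumps(nested, indent=2))
```

The result is a dict with a single key, row, whose value is a list of three dicts. The circle entry simply has no sides key, which is why pd.json_normalize() fills that cell with NaN.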