
Removing unwanted label categories from XML annotation files with Python


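Requirement:

An XML annotation file can contain objects from many label categories. When only a few of those categories are needed, the objects that belong to the other categories have to be removed.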
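Before deciding which categories to keep, it can help to see which ones a file actually contains. The following is a minimal sketch, assuming a Pascal VOC style layout in which each `<object>` has a `<name>` child (the same layout the script in this article reads). The sample XML and the helper name `count_labels` are made up for illustration.

```python
import xml.etree.ElementTree as et
from collections import Counter

# Hypothetical sample annotation, used only for this illustration
SAMPLE_XML = """<annotation>
  <filename>demo.jpg</filename>
  <object><name>red</name></object>
  <object><name>yellow</name></object>
  <object><name>red</name></object>
</annotation>"""


def count_labels(root):
    """Count how many <object> elements carry each <name> value."""
    return Counter(obj.findtext("name") for obj in root.iter("object"))


label_counts = count_labels(et.fromstring(SAMPLE_XML))
print(label_counts)
```

For real files, the same function could be applied to `et.parse(path).getroot()` for each path in the annotation folder and the counters summed.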

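Solution:

Use Python's xml.etree.ElementTree module to process every XML annotation file in the annotation folder in one batch, dropping the objects whose label category is not wanted.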
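The core filtering step can be sketched on a single in-memory document. Note that this sketch is a variant, not the article's script: it deletes the unwanted `<object>` elements from the parsed tree with `Element.remove` instead of building a new tree, so all other fields stay untouched. The sample XML and the helper name `keep_labels` are assumptions made for illustration.

```python
import xml.etree.ElementTree as et

# Hypothetical sample annotation, used only for this illustration
SAMPLE_XML = """<annotation>
  <filename>demo.jpg</filename>
  <object><name>red</name></object>
  <object><name>yellow</name></object>
</annotation>"""


def keep_labels(root, labels):
    """Remove top-level <object> elements whose <name> is not in labels.

    Returns the number of removed objects.
    """
    removed = 0
    # findall returns a list, so removing elements while looping over it is safe
    for obj in root.findall("object"):
        if obj.findtext("name") not in labels:
            root.remove(obj)
            removed += 1
    return removed


root = et.fromstring(SAMPLE_XML)
removed = keep_labels(root, {"red", "green", "blue"})
remaining = [obj.findtext("name") for obj in root.findall("object")]
print(removed, remaining)
```

This variant only works when the `<object>` elements are direct children of the root, because `Element.remove` deletes direct children only.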

The code is as follows:

    import glob
    import os
    import xml.etree.ElementTree as et


    def delete_bbox(dir_path, labels):
        """Rewrite every XML file in dir_path, keeping only objects whose name is in labels."""
        file_list = glob.glob(os.path.join(dir_path, "*.xml"))
        for index, file in enumerate(file_list):
            print(file, index)
            root_ = et.parse(file).getroot()  # the original annotation

            # Build a new annotation tree. Fields that are not copied from the
            # original file (folder, database, depth, ...) get fixed values.
            root = et.Element("annotation")
            folder = et.SubElement(root, "folder")
            folder.text = "images"
            filename = et.SubElement(root, "filename")
            filename.text = root_.find(".//filename").text
            source = et.SubElement(root, "source")
            database = et.SubElement(source, "database")
            database.text = "Unknown"
            size = et.SubElement(root, "size")
            width = et.SubElement(size, "width")
            width.text = root_.find(".//width").text
            height = et.SubElement(size, "height")
            height.text = root_.find(".//height").text
            depth = et.SubElement(size, "depth")
            depth.text = "3"
            segmented = et.SubElement(root, "segmented")
            segmented.text = "0"

            # Copy only the objects whose label category should be kept
            for obj in root_.iter("object"):
                name_ = obj.find("name").text
                if name_ in labels:
                    object_ = et.SubElement(root, "object")
                    name = et.SubElement(object_, "name")
                    name.text = name_
                    pose = et.SubElement(object_, "pose")
                    pose.text = "Unspecified"
                    truncated = et.SubElement(object_, "truncated")
                    truncated.text = "0"
                    difficult = et.SubElement(object_, "difficult")
                    difficult.text = "0"
                    bndbox = et.SubElement(object_, "bndbox")
                    xmin = et.SubElement(bndbox, "xmin")
                    xmin.text = obj.find(".//xmin").text
                    ymin = et.SubElement(bndbox, "ymin")
                    ymin.text = obj.find(".//ymin").text
                    xmax = et.SubElement(bndbox, "xmax")
                    xmax.text = obj.find(".//xmax").text
                    ymax = et.SubElement(bndbox, "ymax")
                    ymax.text = obj.find(".//ymax").text

            pretty_xml(root, ' ', '\n')
            tree = et.ElementTree(root)
            tree.write(file, encoding="utf-8")  # overwrites the original file


    def pretty_xml(element, indent, newline, level=0):
        """Indent the tree in place; indent is the indentation unit, newline the line break."""
        if len(element):  # the element has child elements
            if element.text is None or element.text.isspace():
                # The element has no text of its own
                element.text = newline + indent * (level + 1)
            else:
                element.text = newline + indent * (level + 1) + element.text.strip() + newline + indent * (level + 1)
        # else:  # uncomment these two lines to also put a leaf element's text on its own line
        #     element.text = newline + indent * (level + 1) + element.text.strip() + newline + indent * level
        children = list(element)
        for i, subelement in enumerate(children):
            if i < len(children) - 1:
                # Not the last child: the next line starts a sibling, so keep the same indentation
                subelement.tail = newline + indent * (level + 1)
            else:
                # Last child: the next line closes the parent, so indent one level less
                subelement.tail = newline + indent * level
            pretty_xml(subelement, indent, newline, level=level + 1)  # recurse into the children


    if __name__ == '__main__':
        dir_path = "./Annotations"  # folder containing the XML annotation files
        labels = ["red", "green", "blue"]  # label categories to keep
        delete_bbox(dir_path, labels)
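On Python 3.9 or later, the standard library function `xml.etree.ElementTree.indent` may be able to stand in for the hand-written `pretty_xml` helper. The sketch below assumes Python 3.9+ and uses a one-space indent to match the `' '` passed to `pretty_xml`. It builds a tiny tree purely for illustration.

```python
import xml.etree.ElementTree as et

# Tiny illustrative tree: <annotation><object><name>red</name></object></annotation>
root = et.Element("annotation")
obj = et.SubElement(root, "object")
et.SubElement(obj, "name").text = "red"

# et.indent (Python 3.9+) inserts newlines and indentation into the tree in place
et.indent(root, space=" ")

xml_text = et.tostring(root, encoding="unicode")
print(xml_text)
```

The two approaches are not guaranteed to produce byte-identical output for every input, so it is worth comparing the results on a sample file before switching.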

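To use the script above, set dir_path to the folder that contains the XML annotation files and labels to the label categories you want to keep. The author reports having tested it successfully. Because the script overwrites each XML file in place and writes fixed values for fields such as folder, depth, pose, truncated and difficult, back up the annotation folder before running it.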
