
Handling Common Structured Text Files in Python


Study notes on the book Introducing Python (《Python语言及其应用》)

1. CSV

Writing a CSV file

    import csv

    alphabet = [
        ('Char', 'No'),
        ('a', 1),
        ('b', 2),
        ('c', 3),
    ]
    # If blank lines appear between rows in the written file,
    # pass newline='' when opening it
    with open('alphabet.csv', 'wt', encoding='utf-8', newline='') as fout:
        csvout = csv.writer(fout)
        csvout.writerows(alphabet)
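The writer also quotes fields for you when they contain the delimiter or the quote character. The sketch below illustrates this with made-up rows (not from the notes above) and writes to an in-memory `io.StringIO` buffer instead of a real file, so nothing is left on disk.

```python
import csv
import io

# Illustrative rows only; the second column contains a comma and a double quote
rows = [
    ('Name', 'Note'),
    ('burrito', 'beans, rice'),   # contains the delimiter
    ('pancakes', 'say "yum"'),    # contains the quote character
]

# newline='' keeps the writer's own line endings untouched, as with open()
buffer = io.StringIO(newline='')
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back recovers the original field values
parsed = [tuple(row) for row in csv.reader(io.StringIO(text, newline=''))]
print(parsed == rows)  # True
```

With the default settings, only fields that need quoting are quoted, and an embedded double quote is written as two double quotes.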

Reading a CSV file

    import csv

    with open('alphabet.csv', 'rt', encoding='utf-8', newline='') as fin:
        cin = csv.reader(fin)
        alphabet = [row for row in cin]
    print(alphabet)  # [['Char', 'No'], ['a', '1'], ['b', '2'], ['c', '3']]
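Note that `csv.reader` returns every field as a string, which is why the numbers come back as `'1'`, `'2'`, `'3'`. Here is a minimal sketch of converting them back to `int`. The helper name `read_alphabet` is hypothetical, and an in-memory buffer stands in for `alphabet.csv`.

```python
import csv
import io

def read_alphabet(text):
    """Parse CSV text with a header row and convert the second column to int."""
    reader = csv.reader(io.StringIO(text, newline=''))
    header = next(reader)  # the first row holds the column names
    rows = [(char, int(no)) for char, no in reader]
    return header, rows

# Write the sample rows to an in-memory buffer rather than to alphabet.csv
buffer = io.StringIO(newline='')
csv.writer(buffer).writerows([('Char', 'No'), ('a', 1), ('b', 2), ('c', 3)])

header, rows = read_alphabet(buffer.getvalue())
print(header)  # ['Char', 'No']
print(rows)    # [('a', 1), ('b', 2), ('c', 3)]
```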

Using DictWriter and DictReader

    import csv

    alphabet = [
        {'Char': 'a', 'No': 1},
        {'Char': 'b', 'No': 2},
        {'Char': 'c', 'No': 3},
    ]
    # Write a header row followed by one row per dict
    with open('alphabet.csv', 'wt', encoding='utf-8', newline='') as fout:
        csvout = csv.DictWriter(fout, ['Char', 'No'])
        csvout.writeheader()
        csvout.writerows(alphabet)

    # DictReader takes the column names from the first row
    with open('alphabet.csv', 'rt', encoding='utf-8', newline='') as fin:
        cin = csv.DictReader(fin)
        alphabet = [dict(row) for row in cin]
    print(alphabet)  # [{'Char': 'a', 'No': '1'}, {'Char': 'b', 'No': '2'}, {'Char': 'c', 'No': '3'}]

2. XML

XML is commonly used for data exchange and messaging, for example in RSS and Atom feeds.

    <?xml version="1.0" encoding="utf-8" ?>
    <menu>
      <breakfast hours="7-11">
        <item price="$6.00">breakfast burritos</item>
        <item price="$4.00">pancakes</item>
      </breakfast>
      <lunch hours="11-3">
        <item price="$5.00">hamburger</item>
      </lunch>
      <dinner hours="3-10">
        <item price="8.00">spaghetti</item>
      </dinner>
    </menu>

Parsing the XML file with ElementTree

    import xml.etree.ElementTree as et

    tree = et.ElementTree(file='menu.xml')
    root = tree.getroot()
    print(root.tag)  # menu
    # Walk two levels: the meals, then the items inside each meal
    for child in root:
        print(child.tag, child.attrib)
        for grandchild in child:
            print('\t', grandchild.tag, grandchild.attrib)
    # breakfast {'hours': '7-11'}
    #     item {'price': '$6.00'}
    #     item {'price': '$4.00'}
    # lunch {'hours': '11-3'}
    #     item {'price': '$5.00'}
    # dinner {'hours': '3-10'}
    #     item {'price': '8.00'}
    print(len(root))     # 3 (number of meal sections)
    print(len(root[0]))  # 2 (number of breakfast items)
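Besides nested loops, ElementTree can search the tree directly. The sketch below parses the same menu from an inline string, an assumption made so that it runs without `menu.xml`, and uses `iter()`, `findall()` and `find()` to pull out elements by tag.

```python
import xml.etree.ElementTree as et

# Same menu as above, inlined so the example does not depend on menu.xml
MENU_XML = """<menu>
  <breakfast hours="7-11">
    <item price="$6.00">breakfast burritos</item>
    <item price="$4.00">pancakes</item>
  </breakfast>
  <lunch hours="11-3">
    <item price="$5.00">hamburger</item>
  </lunch>
  <dinner hours="3-10">
    <item price="8.00">spaghetti</item>
  </dinner>
</menu>"""

root = et.fromstring(MENU_XML)

# iter() walks the whole tree, so <item> elements are found at any depth
items = [(item.text, item.get('price')) for item in root.iter('item')]
print(items)

# findall() takes a simple path relative to the element it is called on
breakfast_names = [item.text for item in root.findall('./breakfast/item')]
print(breakfast_names)  # ['breakfast burritos', 'pancakes']

# find() returns the first match, or None when nothing matches
print(root.find('lunch').get('hours'))  # 11-3
```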

3. JSON

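  • Use json.dumps() to encode Python data as a JSON string
  • Use json.loads() to parse a JSON string into Python data structures

    import json
    from datetime import datetime

    now = datetime.utcnow()
    try:
        json.dumps(now)
    except TypeError as err:
        # datetime is not one of the types json can serialize by default
        print(err)  # e.g. Object of type datetime is not JSON serializable

    # Convert the datetime to a type JSON understands, such as str or an epoch value
    now_str = str(now)
    json.dumps(now_str)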
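For plain dicts, lists, strings and numbers, the two functions are inverses of each other. Here is a small round-trip sketch, using a made-up `menu` dict as sample data:

```python
import json

# Sample data for illustration only
menu = {
    'breakfast': {'hours': '7-11', 'items': {'pancakes': '$4.00'}},
    'lunch': {'hours': '11-3', 'items': {'hamburger': '$5.00'}},
}

menu_json = json.dumps(menu)      # Python dict -> JSON string
print(type(menu_json).__name__)   # str

menu2 = json.loads(menu_json)     # JSON string -> Python dict
print(menu2 == menu)              # True
print(menu2['lunch']['hours'])    # 11-3

# Tuples have no JSON equivalent: they are written as arrays and come back as lists
print(json.loads(json.dumps(('a', 1))))  # ['a', 1]
```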

To make JSON encoding support datetime, subclass json.JSONEncoder and override its default() method.

    import json
    from calendar import timegm
    from datetime import datetime

    now = datetime.utcnow()

    class DTEncoder(json.JSONEncoder):
        def default(self, o):
            if isinstance(o, datetime):
                # now is a naive UTC datetime, so use timegm(), which reads the
                # time tuple as UTC; mktime() would read it as local time
                return timegm(o.timetuple())
            # Let the base class raise TypeError for any other unsupported type
            return super().default(o)

    json.dumps(now, cls=DTEncoder)
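An alternative to subclassing is the `default=` argument of `json.dumps()`. The sketch below is one possible approach, not the one from the book: it writes the datetime as an ISO 8601 string and converts it back after loading. The function name `encode_datetime` and the fixed timestamp are made up for illustration.

```python
import json
from datetime import datetime, timezone

def encode_datetime(o):
    """Fallback for json.dumps: called only for objects json cannot serialize."""
    if isinstance(o, datetime):
        return o.isoformat()
    raise TypeError(f'Object of type {type(o).__name__} is not JSON serializable')

# A fixed, timezone-aware timestamp keeps the example reproducible
created = datetime(2018, 3, 16, 3, 52, 41, tzinfo=timezone.utc)

payload = json.dumps({'created': created}, default=encode_datetime)
print(payload)  # {"created": "2018-03-16T03:52:41+00:00"}

# JSON has no datetime type, so the reader must convert the string back explicitly
restored = datetime.fromisoformat(json.loads(payload)['created'])
print(restored == created)  # True
```

An ISO string keeps the UTC offset and stays human-readable, while an epoch integer is more compact. Which one fits better depends on who consumes the JSON.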

4. Configuration files

The settings.cfg configuration file

    [english]
    greeting = Hello

    [french]
    greeting = Bonjour

    [files]
    home = /usr/local
    # simple interpolation
    bin = %(home)s/bin

Reading the configuration file with configparser

    import configparser

    cfg = configparser.ConfigParser()
    cfg.read('settings.cfg', encoding='utf-8')
    print(cfg['french']['greeting'])  # Bonjour
    print(cfg['files']['bin'])        # /usr/local/bin (%(home)s is expanded)
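A few more `ConfigParser` calls are handy when exploring a file. The sketch below loads the same settings from an inline string with `read_string()`, an assumption made so that it runs without `settings.cfg`. The option name `farewell` is deliberately absent from the file and serves only to demonstrate `fallback`.

```python
import configparser

# Same content as settings.cfg, inlined for a self-contained example
SETTINGS = """
[english]
greeting = Hello

[french]
greeting = Bonjour

[files]
home = /usr/local
bin = %(home)s/bin
"""

cfg = configparser.ConfigParser()
cfg.read_string(SETTINGS)

print(cfg.sections())  # ['english', 'french', 'files']

# raw=True returns the value as written, without expanding %(home)s
print(cfg.get('files', 'bin', raw=True))  # %(home)s/bin
print(cfg.get('files', 'bin'))            # /usr/local/bin

# fallback avoids an exception when an option is missing
print(cfg.get('french', 'farewell', fallback='(not set)'))  # (not set)
```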

5. Serializing with pickle

pickle saves and restores Python objects in a special binary format.

    import pickle
    from datetime import datetime

    now = datetime.utcnow()
    # The exact bytes depend on the pickle protocol and the current time
    now_pickled = pickle.dumps(now)   # e.g. b'\x80\x03cdatetime\ndatetime\nq\x00C\n\x07\xe2\x03\x10\x034)\x00\xca_q\x01\x85q\x02Rq\x03.'
    now2 = pickle.loads(now_pickled)  # e.g. 2018-03-16 03:52:41.051807
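To persist objects to disk, `pickle.dump()` and `pickle.load()` work on binary file objects. The sketch below stores a made-up `record` dict in a temporary directory. The file name `record.pickle` is hypothetical.

```python
import pickle
import tempfile
from datetime import datetime
from pathlib import Path

# Sample data for illustration only
record = {
    'name': 'alphabet',
    'created': datetime(2018, 3, 16, 3, 52, 41),
    'rows': [('a', 1), ('b', 2), ('c', 3)],
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / 'record.pickle'  # hypothetical file name
    # Pickle data is binary, so the file is opened in 'wb' / 'rb' mode
    with open(path, 'wb') as fout:
        pickle.dump(record, fout)
    with open(path, 'rb') as fin:
        restored = pickle.load(fin)

print(restored == record)         # True
print(type(restored['created']))  # <class 'datetime.datetime'>
print(restored['rows'][0])        # ('a', 1)
```

Unlike JSON, pickle preserves Python-specific types such as tuples and datetime objects. Only unpickle data from a source you trust, because loading a pickle can execute arbitrary code.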

Reposted from: https://juejin.im/post/5aab41d9518825556d0ddb02
