赞
踩
我有一个Excel电子表格,我需要每天导入到SQL Server.电子表格将在大约50列中包含大约250,000行.我使用openpyxl和xlrd使用几乎相同的代码测试了它们.
这是我正在使用的代码(减去调试语句):
import xlrd
import openpyxl
def UseXlrd(file_name):
workbook = xlrd.open_workbook(file_name, on_demand=True)
worksheet = workbook.sheet_by_index(0)
first_row = []
for col in range(worksheet.ncols):
first_row.append(worksheet.cell_value(0,col))
data = []
for row in range(1, worksheet.nrows):
record = {}
for col in range(worksheet.ncols):
if isinstance(worksheet.cell_value(row,col), str):
record[first_row[col]] = worksheet.cell_value(row,col).strip()
else:
record[first_row[col]] = worksheet.cell_value(row,col)
data.append(record)
return data
def UseOpenpyxl(file_name):
wb = openpyxl.load_workbook(file_name, read_only=True)
sheet = wb.active
first_row = []
for col in range(1,sheet.max_column+1):
first_row.append(sheet.cell(row=1,column=col).value)
data = []
for r in range(2,sheet.max_row+1):
record = {}
for col in range(sheet.max_column):
if isinstance(sheet.cell(row=r,column=col+1).value, str):
record[first_row[col]] = sheet.cell(row=r,column=col+1).value.strip()
else:
record[first_row[col]] = sheet.cell(row=r,column=col+1).value
data.append(record)
return data
xlrd_results = UseXlrd('foo.xls')
openpyxl_resuts = UseOpenpyxl('foo.xls')
传递包含3500行的相同Excel文件会产生截然不同的运行时间.使用xlrd我可以在2秒内将整个文件读入字典列表.使用openpyxl我得到以下结果:
Reading Excel File...
Read 100 lines in 114.14509415626526 seconds
Read 200 lines in 471.43183994293213 seconds
Read 300 lines in 982.5288782119751 seconds
Read 400 lines in 1729.3348784446716 seconds
Read 500 lines in 2774.886833190918 seconds
Read 600 lines in 4384.074863195419 seconds
Read 700 lines in 6396.7723388671875 seconds
Read 800 lines in 7998.775000572205 seconds
Read 900 lines in 11018.460735321045 seconds
虽然我可以在最终脚本中使用xlrd,但由于各种问题,我将不得不对很多格式进行硬编码(即int读取为float,date读取为int,datetime读取为float).由于我需要将这些代码重用于更多的导入,因此尝试硬编码特定列以正确格式化它们并且必须在4个不同的脚本中维护类似的代码是没有意义的.
关于如何进行的任何建议?
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。