Python: Reading large Excel worksheets with openpyxl

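I have an Excel file with about 400 worksheets, 375 of which I need to save as CSV files. I've already tried a VBA solution, but Excel struggles just to open this workbook.

I've written a Python script to do this instead. However, it quickly consumes all available memory and practically grinds to a halt after exporting 25 sheets. Does anyone have suggestions for how I can improve this code?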

```python
import openpyxl
import csv
import time

print(time.ctime())
importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", data_only = True, keep_vba = False)
tabnames = importedfile.get_sheet_names()
substring = "Keyword"
for num in tabnames:
    if num.find(substring) > -1:
        sheet=importedfile.get_sheet_by_name(num)
        name = "C:/Users/User/Desktop/Test/" + num + ".csv"
        with open(name, 'w', newline='') as file:
            savefile = csv.writer(file)
            for i in sheet.rows:
                savefile.writerow([cell.value for cell in i])
            file.close()
print(time.ctime())
```

Any help would be greatly appreciated.

Thanks

Edit: I'm using Windows 7 and Python 3.4.3. I'm also open to solutions in R, VBA, or SPSS.

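Try passing `read_only=True` to `load_workbook()`. The worksheets you get back are then `IterableWorksheet` objects (called `ReadOnlyWorksheet` in newer openpyxl releases), which are designed to be read by iterating over their rows rather than by looking up individual cells by row/column number. According to the documentation, this gives near-constant memory consumption.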
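A minimal sketch of reading a sheet this way, assuming openpyxl 3.x (where `iter_rows(values_only=True)` and `Workbook.close()` are available). The helper `stream_rows` and the throwaway `demo.xlsx` file are hypothetical; the sample workbook is built only so the snippet runs on its own.

```python
import os
import tempfile

import openpyxl


def stream_rows(path, sheet_name):
    """Yield each row of one sheet as a tuple of cell values, reading lazily."""
    # read_only=True parses the sheet row by row on demand instead of
    # loading every cell of the workbook into memory up front.
    wb = openpyxl.load_workbook(path, read_only=True, data_only=True)
    try:
        ws = wb[sheet_name]
        # values_only=True yields plain values rather than cell objects.
        for row in ws.iter_rows(values_only=True):
            yield row
    finally:
        # A read-only workbook keeps the source file open until closed.
        wb.close()


if __name__ == "__main__":
    # Build a tiny workbook so the example is self-contained.
    path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.title = "Keyword 1"
    ws.append(["id", "name"])
    ws.append([1, "alpha"])
    ws.append([2, "beta"])
    wb.save(path)

    for row in stream_rows(path, "Keyword 1"):
        print(row)
```

Because `stream_rows` is a generator, your own code only handles one row at a time; passing each row straight to a `csv.writer` keeps the whole export in the same low-memory pattern.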

Also, you don't need to close `file` yourself; the `with` statement does that for you.
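To illustrate the point about `with`: the file object is closed as soon as the block is left, which the sketch below checks through the standard `closed` attribute. It uses only the standard library; the sample rows and the temporary path are made up for the example.

```python
import csv
import os
import tempfile

rows = [["id", "name"], [1, "alpha"], [2, "beta"]]
path = os.path.join(tempfile.mkdtemp(), "out.csv")

# The with statement calls file.close() automatically when the block exits,
# even if an exception is raised inside it, so an explicit close() is redundant.
with open(path, "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(rows)

print(file.closed)  # True
```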

Example:

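```python
import csv
import time

import openpyxl

print(time.ctime())
# read_only=True streams each sheet instead of loading the whole workbook;
# data_only=True exports cached formula results, as in the question's code.
importedfile = openpyxl.load_workbook(
    filename="C:/Users/User/Desktop/Giant Workbook.xlsm",
    read_only=True,
    data_only=True,
    keep_vba=False,
)
substring = "Keyword"
for num in importedfile.sheetnames:
    if substring in num:
        sheet = importedfile[num]
        name = "C:/Users/User/Desktop/Test/" + num + ".csv"
        # The with statement closes the CSV file automatically.
        with open(name, "w", newline="") as file:
            savefile = csv.writer(file)
            for i in sheet.rows:
                savefile.writerow([cell.value for cell in i])
# A read-only workbook keeps the source file open until closed explicitly.
importedfile.close()
print(time.ctime())
```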

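From the documentation:

Sometimes, you will need to open or write extremely large XLSX files, and the common routines in openpyxl won't be able to handle that load. Fortunately, there are two modes that enable you to read and write unlimited amounts of data with (near) constant memory consumption.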
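The quoted passage mentions two modes: read-only mode is the one used above, and the other is write-only mode. A minimal sketch of write-only mode follows, assuming openpyxl 3.x; the helper name `write_rows_write_only`, the sheet title `data`, and the sample rows are hypothetical. It is not needed for the CSV export itself and only shows the writing side of the low-memory API.

```python
import os
import tempfile

import openpyxl


def write_rows_write_only(path, rows, title="data"):
    """Write an iterable of rows into a new .xlsx file using write-only mode."""
    # write_only=True writes appended rows out instead of keeping every
    # cell in memory, so memory use stays roughly constant.
    wb = openpyxl.Workbook(write_only=True)
    # A write-only workbook starts with no worksheets; create one explicitly.
    ws = wb.create_sheet(title)
    for row in rows:
        ws.append(row)
    # A write-only workbook can be saved only once.
    wb.save(path)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "big.xlsx")
    # A generator expression, so the source rows are never all in memory either.
    write_rows_write_only(path, ([i, i * i] for i in range(1, 10_001)))

    # Quick check with a normal load (acceptable for a file this small).
    ws = openpyxl.load_workbook(path)["data"]
    print(ws.max_row, ws["B3"].value)  # 10000 9
```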
