MemoryError when writing 500k+ rows with openpyxl

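I have a script that uses openpyxl to open a template xlsx file, then goes through each of its six worksheets, adding data from lists generated earlier in the script and changing the cell formatting.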

The problem is that on one sheet I need to write 9 columns and 500k+ rows, and this gives me a MemoryError:

```
Traceback (most recent call last):
  File "C:\python27\labs\labs\sqrdist\new_main_ui.py", line 667, in request_and_send_reports
    x = sqr_pull.main()
  File "C:\Python27\lib\site-packages\memory_profiler-0.32-py2.7.egg\memory_profiler.py", line 801, in wrapper
    val = prof(func)(*args, **kwargs)
  File "C:\Python27\lib\site-packages\memory_profiler-0.32-py2.7.egg\memory_profiler.py", line 445, in f
    result = func(*args, **kwds)
  File "C:\python27\labs\labs\sqrdist\sqr_pull.py", line 327, in main
    os.remove(temp_attach_filepath)
  File "build\bdist.win32\egg\openpyxl\workbook\workbook.py", line 281, in save
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 214, in save_workbook
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 197, in save
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 109, in write_data
  File "build\bdist.win32\egg\openpyxl\writer\excel.py", line 134, in _write_worksheets
  File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 281, in write_worksheet
  File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 381, in write_worksheet_data
  File "build\bdist.win32\egg\openpyxl\writer\worksheet.py", line 404, in write_cell
  File "build\bdist.win32\egg\openpyxl\xml\functions.py", line 142, in start_tag
  File "C:\Python27\lib\xml\sax\saxutils.py", line 159, in startElement
    self._write(u' %s=%s' % (name, quoteattr(value)))
  File "C:\Python27\lib\xml\sax\saxutils.py", line 104, in write
    self.flush()
MemoryError
```

I think the following code is responsible; KeywordReport is a list of lists.

```python
ws_keywords = wb.get_sheet_by_name("E_KWs")

# Write every value from KeywordReport and give each cell a thin border
for r, row in enumerate(KeywordReport, start=1):
    for c, val in enumerate(row, start=1):
        mycell = ws_keywords.cell(row=r, column=c)
        mycell.value = val
        mycell.style = Style(border=thin_border)

# Column widths
ws_keywords.column_dimensions['A'].width = 60.0
ws_keywords.column_dimensions['B'].width = 50.0
ws_keywords.column_dimensions['C'].width = 50.0
ws_keywords.column_dimensions['D'].width = 15.0
ws_keywords.column_dimensions['E'].width = 16.0
ws_keywords.column_dimensions['F'].width = 16.0
ws_keywords.column_dimensions['G'].width = 16.0

# Bold, yellow-filled header row
for ref in ['A1', 'B1', 'C1', 'D1', 'E1', 'F1', 'G1']:
    cell = ws_keywords.cell(ref)
    cell.style = Style(font=Font(bold=True),
                       fill=PatternFill(patternType='solid', fgColor=Color('ffd156')),
                       border=thin_border)

gc.collect()
del KeywordReport[:]
gc.collect()
print "start of save"
wb.save(attach_filepath)
gc.collect()
os.remove(temp_attach_filepath)
QCoreApplication.processEvents()
```

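I have looked at http://openpyxl.readthedocs.org/en/latest/optimized.html, but I don't think I can use it here: it only lets me dump data into a new workbook, whereas I need the existing data in the template.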
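For reference, the write-only mode described on that page looks roughly like this in current openpyxl (3.x). Older 2.x releases such as the one in the traceback spelled it `optimized_write=True`, so treat the exact names as version-dependent. Rows are streamed to the file instead of being kept as cell objects in memory, but the mode can only create a new workbook, which is exactly the limitation described above. Formatting is still possible per cell through `WriteOnlyCell`. The function name `write_keywords_streaming`, the output path and the sample rows are hypothetical placeholders.

```python
import os
import tempfile

from openpyxl import Workbook
from openpyxl.cell import WriteOnlyCell
from openpyxl.styles import Border, Font, PatternFill, Side


def write_keywords_streaming(path, rows, header):
    """Hypothetical helper: stream `rows` (a list of lists) into a NEW
    workbook in write-only mode. `header` is a list of column titles."""
    wb = Workbook(write_only=True)  # write-only workbooks start with no sheets
    ws = wb.create_sheet("E_KWs")

    # Create the style objects once, outside any loop.
    thin = Side(style="thin")
    thin_border = Border(left=thin, right=thin, top=thin, bottom=thin)
    header_font = Font(bold=True)
    header_fill = PatternFill(patternType="solid", fgColor="FFD156")

    # Header row: WriteOnlyCell lets a streamed cell carry formatting.
    header_cells = []
    for title in header:
        cell = WriteOnlyCell(ws, value=title)
        cell.font = header_font
        cell.fill = header_fill
        cell.border = thin_border
        header_cells.append(cell)
    ws.append(header_cells)

    # Data rows: plain values go straight to the output stream.
    for row in rows:
        ws.append(row)

    wb.save(path)  # a write-only workbook can only be saved once


# Usage with placeholder data in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "keywords.xlsx")
data = [["kw%d" % i, "url", "title", i] for i in range(1000)]
write_keywords_streaming(out_path, data, ["Keyword", "URL", "Title", "Count"])
```

Here only the header cells are styled; giving every data cell a border would mean wrapping each value in a `WriteOnlyCell` with the shared `thin_border`, which still streams but is slower.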

Is there a way around this?

500k rows shouldn't be too much of a problem, but I suppose it also depends on how many worksheets you have. How much memory does your system have?

Installing lxml will make things faster (as will creating any styles outside the loop), but I don't expect it to reduce memory usage much.
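A sketch of the "create styles outside the loop" advice, written against the current openpyxl API, where the `Style` class used in the question no longer exists and `border`, `font` and `fill` are assigned directly on each cell. As the answer says, this mainly saves time; it is not expected to fix the MemoryError on its own. The helper `fill_keyword_sheet` and the in-memory workbook standing in for the template are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.styles import Border, Font, PatternFill, Side

# Style objects are built once at module level instead of once per cell.
THIN = Side(style="thin")
THIN_BORDER = Border(left=THIN, right=THIN, top=THIN, bottom=THIN)
HEADER_FONT = Font(bold=True)
HEADER_FILL = PatternFill(patternType="solid", fgColor="FFD156")

# Same widths as in the question, keyed by column letter.
COLUMN_WIDTHS = {"A": 60.0, "B": 50.0, "C": 50.0, "D": 15.0,
                 "E": 16.0, "F": 16.0, "G": 16.0}


def fill_keyword_sheet(ws, keyword_report):
    """Hypothetical helper: write a list of lists into `ws` from A1,
    giving every written cell the shared thin border."""
    for r, row in enumerate(keyword_report, start=1):
        for c, val in enumerate(row, start=1):
            cell = ws.cell(row=r, column=c, value=val)
            cell.border = THIN_BORDER  # reuse the one Border object
    for letter, width in COLUMN_WIDTHS.items():
        ws.column_dimensions[letter].width = width
    # Header cells A1..G1: bold, filled and bordered, as in the question.
    for letter in COLUMN_WIDTHS:
        cell = ws[letter + "1"]
        cell.font = HEADER_FONT
        cell.fill = HEADER_FILL
        cell.border = THIN_BORDER


# Usage: a fresh workbook stands in for the loaded template here.
wb = Workbook()
ws = wb.active
ws.title = "E_KWs"
report = [["Keyword", "URL", "Title"]] + [["kw%d" % i, "u", "t"] for i in range(5)]
fill_keyword_sheet(ws, report)
```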

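If you really do need to copy data from an existing workbook, you might consider using a separate workbook for your changes, which will reduce memory use for both reading and writing. Further discussion is probably best taken to the mailing list.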
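One possible reading of that suggestion, sketched under assumptions rather than taken from the answer: leave the small sheets in the template and edit them as before, but produce the 500k-row sheet in its own workbook. The existing file is opened with `read_only=True` and the new one is created with `write_only=True`, so both sides are streamed instead of being held in memory. Only cell values are copied (`values_only=True`, available in recent openpyxl releases), not formatting. `copy_sheet_streaming` and the file names are hypothetical, and a throwaway template is generated so the example runs on its own.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def copy_sheet_streaming(src_path, sheet_name, dst_path, extra_rows):
    """Hypothetical helper: copy the VALUES of one sheet from an existing
    workbook into a new write-only workbook, then append `extra_rows`."""
    src = load_workbook(src_path, read_only=True)  # rows are parsed lazily
    dst = Workbook(write_only=True)                # rows are streamed out
    out = dst.create_sheet(sheet_name)
    for values in src[sheet_name].iter_rows(values_only=True):
        out.append(values)
    src.close()  # read-only workbooks keep the source file open until closed
    for row in extra_rows:
        out.append(row)
    dst.save(dst_path)


# Usage: build a tiny stand-in template, then copy its sheet plus new rows.
tmp = tempfile.mkdtemp()
template = os.path.join(tmp, "template.xlsx")
twb = Workbook()
twb.active.title = "E_KWs"
twb.active.append(["Keyword", "URL", "Title"])
twb.save(template)

result = os.path.join(tmp, "E_KWs_only.xlsx")
copy_sheet_streaming(template, "E_KWs", result,
                     [["kw%d" % i, "u", "t"] for i in range(1000)])
```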