Load an Excel file in chunks with Python instead of loading the full file into memory

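I want to read only 10 rows from a BIG Excel file (xlsx) without loading the whole file at once, because that cannot be done on one of my machines (low memory). I tried using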

    import xlrd
    import pandas as pd

    def open_file(path):
        xl = pd.ExcelFile(path)
        reader = xl.parse(chunksize=1000)
        for chunk in reader:
            print(chunk)

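But it seems the file is loaded in full first and only then divided into parts, which defeats the purpose. I would appreciate any advice on how to read only the first rows. If you need more information please leave a comment, but I think everything should be clear. Thanks!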

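Due to the nature of xlsx files (which are essentially a bunch of XML files zipped together), you can't poke at the file at an arbitrary byte and hope it is the start of the Nth row of the sheet you are interested in.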

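The best you can do is use pandas.read_excel with the skiprows (skips rows from the top of the file) and skipfooter (skips rows from the bottom; spelled skip_footer in older pandas releases) arguments. However, this still loads the whole file into memory first and only then parses the required rows.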

    import pandas as pd

    # if the file contains 300 rows, this will read the middle 100
    # (older pandas releases spell the argument skip_footer)
    df = pd.read_excel('/path/excel.xlsx', skiprows=100, skipfooter=100,
                       names=['col_a', 'col_b'])

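Note that you have to set the headers manually with the names argument, otherwise the column names will come from a data row at the skip boundary instead of the real header row.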

If you are willing to use csv instead, this becomes a straightforward task, since csv files are plain-text files.
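To illustrate why CSV is easier, here is a minimal sketch that reads only the first few rows using the standard library. The function name `read_first_rows` and the UTF-8 encoding are assumptions made for the example, not part of the original answer. Because the file is read line by line, rows past the first `n` are never parsed.

```python
import csv
from itertools import islice


def read_first_rows(path, n=10):
    """Return (header, rows) with at most n data rows from a CSV file.

    Hypothetical helper: the file is consumed lazily, so the rows
    after the first n are never parsed.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)      # first line is treated as the header
        rows = list(islice(reader, n))   # stop after n data rows
    return header, rows
```

If pandas is preferred, `pd.read_csv(path, nrows=10)` or `pd.read_csv(path, chunksize=1000)` should give similar lazy behaviour for CSV input.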

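But, and it's a big but, if you are really desperate you can extract the relevant sheet's XML file from the xlsx archive and parse that. It is not going to be an easy task, though.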

An example XML file that represents a sheet with a single 2 x 3 table. The <v> tags hold the cells' values.

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
               xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
               xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
               mc:Ignorable="x14ac"
               xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
      <dimension ref="A1:B3"/>
      <sheetViews>
        <sheetView tabSelected="1" workbookViewId="0">
          <selection activeCell="C10" sqref="C10"/>
        </sheetView>
      </sheetViews>
      <sheetFormatPr defaultColWidth="11" defaultRowHeight="14.25" x14ac:dyDescent="0.2"/>
      <sheetData>
        <row r="1" spans="1:2" ht="15.75" x14ac:dyDescent="0.2">
          <c r="A1" t="s"> <v>1</v> </c>
          <c r="B1" s="1" t="s"> <v>0</v> </c>
        </row>
        <row r="2" spans="1:2" ht="15" x14ac:dyDescent="0.2">
          <c r="A2" s="2"> <v>1</v> </c>
          <c r="B2" s="2"> <v>4</v> </c>
        </row>
        <row r="3" spans="1:2" ht="15" x14ac:dyDescent="0.2">
          <c r="A3" s="2"> <v>2</v> </c>
          <c r="B3" s="2"> <v>5</v> </c>
        </row>
      </sheetData>
      <pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
    </worksheet>
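A rough sketch of that approach, using only the standard library, might look like the following. The function name `iter_sheet_rows` and the default member path `xl/worksheets/sheet1.xml` are assumptions: the real path of a given sheet is recorded in the workbook's relationship files and may differ. The sketch streams the sheet XML out of the archive and stops after `max_rows` rows, so the rest of the sheet is never decompressed or parsed.

```python
import zipfile
import xml.etree.ElementTree as ET

# Default namespace used by SpreadsheetML worksheet files.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def iter_sheet_rows(xlsx_path, sheet_member="xl/worksheets/sheet1.xml", max_rows=10):
    """Yield up to max_rows rows, each a list of raw <v> strings.

    Hypothetical helper. sheet_member is an assumed archive path.
    Cells without a <v> child are returned as None.
    """
    if max_rows <= 0:
        return
    with zipfile.ZipFile(xlsx_path) as archive:
        with archive.open(sheet_member) as sheet:
            count = 0
            # iterparse reads the stream incrementally instead of
            # building the whole document tree up front.
            for _event, elem in ET.iterparse(sheet, events=("end",)):
                if elem.tag != NS + "row":
                    continue
                yield [cell.findtext(NS + "v") for cell in elem.iter(NS + "c")]
                elem.clear()  # drop the row's children once consumed
                count += 1
                if count >= max_rows:
                    return
```

Note that the values are raw: for cells marked `t="s"`, the `<v>` text is an index into the shared string table stored in `xl/sharedStrings.xml`, not the string itself, so text cells need a second lookup. Dates and styled numbers need further decoding as well, which is a large part of why this route is not easy.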
