Apache POI SAX parsing: how to get the actual value of a cell

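I need to parse a very large Excel file with Apache POI while keeping memory usage limited. After searching on Google, I learned that POI provides a SAX parser that handles large files efficiently without consuming a lot of memory.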

Apache POI SAX parser example

    // Inner class of the example; minColumns and output are fields of the enclosing class.
    private class SheetToCSV implements SheetContentsHandler {
        private boolean firstCellOfRow = false;
        private int currentRow = -1;
        private int currentCol = -1;

        private void outputMissingRows(int number) {
            for (int i = 0; i < number; i++) {
                for (int j = 0; j < minColumns; j++) {
                    output.append(',');
                }
                output.append('\n');
            }
        }

        @Override
        public void startRow(int rowNum) {
            // If there were gaps, output the missing rows
            outputMissingRows(rowNum - currentRow - 1);
            // Prepare for this row
            firstCellOfRow = true;
            currentRow = rowNum;
            currentCol = -1;
        }

        @Override
        public void endRow(int rowNum) {
            // Ensure the minimum number of columns
            for (int i = currentCol; i < minColumns; i++) {
                output.append(',');
            }
            output.append('\n');
        }

        @Override
        public void cell(String cellReference, String formattedValue, XSSFComment comment) {
            if (firstCellOfRow) {
                firstCellOfRow = false;
            } else {
                output.append(',');
            }

            // gracefully handle missing CellRef here in a similar way as XSSFCell does
            if (cellReference == null) {
                cellReference = new CellAddress(currentRow, currentCol).formatAsString();
            }

            // Did we miss any cells?
            int thisCol = (new CellReference(cellReference)).getCol();
            int missedCols = thisCol - currentCol - 1;
            for (int i = 0; i < missedCols; i++) {
                output.append(',');
            }
            currentCol = thisCol;

            // Number or string?
            try {
                Double.parseDouble(formattedValue);
                output.append(formattedValue);
            } catch (NumberFormatException e) {
                output.append('"');
                output.append(formattedValue);
                output.append('"');
            }
        }

        @Override
        public void headerFooter(String text, boolean isHeader, String tagName) {
            // Skip, no headers or footers in CSV
        }
    }

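In the example from the link above, the cell method only has access to the formatted value, but I need access to the cell's actual (unformatted) value.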

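The current implementation of the streaming interface does not provide this. To get the actual value, you need to copy the code of the underlying XSSFSheetXMLHandler and adjust it so that the cell contents are not formatted.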
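To illustrate the idea of a handler that skips formatting, here is a minimal sketch using only the JDK's SAX classes. It is not a copy of XSSFSheetXMLHandler: RawValueHandler and its parse helper are hypothetical names, and the sketch assumes the sheet XML stores each cell as a c element with an r attribute and the raw content in a nested v element. It simply records the text of every v element as-is.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not part of POI): collects the unformatted text of
// each <v> element, keyed by the cell reference from the enclosing <c>.
class RawValueHandler extends DefaultHandler {
    final Map<String, String> rawValues = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String cellRef;
    private boolean inValue;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("c".equals(qName)) {
            // Remember which cell we are in, e.g. "A1"
            cellRef = attrs.getValue("r");
        } else if ("v".equals(qName)) {
            inValue = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver the text in several chunks, so accumulate it
        if (inValue) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("v".equals(qName)) {
            inValue = false;
            // Store the value exactly as written in the XML, with no formatting
            rawValues.put(cellRef, text.toString());
        }
    }

    // Convenience helper for the sketch: parses sheet XML given as a string.
    static Map<String, String> parse(String sheetXml) throws Exception {
        RawValueHandler handler = new RawValueHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(sheetXml)), handler);
        return handler.rawValues;
    }
}
```

With a real workbook, the input would presumably be a sheet stream obtained from XSSFReader rather than a string. Note that this sketch does not resolve cells of type t="s", whose v content is an index into the shared strings table; an adjusted copy of XSSFSheetXMLHandler would still need that lookup.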