How to get named ranges, sheet names, and referenced sheets from a large Excel file using XSSF and SAX (Event API)

I am reading a large Excel file (about 10 MB, .xlsx).

I am using the code below:

Workbook xmlworkbook = WorkbookFactory.create(OPCPackage.openOrCreate(root_path_name_file));

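But it fails with a heap memory problem.

I have also seen other solutions on StackOverflow that increase the JVM heap size, but I do not want to increase it.

Problem 1) We cannot use SXSSF (Streaming Usermodel API), because it is only for writing or creating new workbooks.

My only goal is to get the number of named ranges, the total number of sheets, and the sheet names of a large Excel file.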

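If the requirement is only to get the named ranges and sheet names, then only the /xl/workbook.xml part of the *.xlsx ZIP package has to be parsed, since that information is all stored there.

This is possible by getting the appropriate PackagePart and parsing the XML from it. For parsing XML, my favorite is StAX.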
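To make it concrete what the parser will see, here is a minimal sketch that runs a StAX cursor reader over a hand-written, heavily trimmed stand-in for /xl/workbook.xml and only counts the `sheet` and `definedName` elements. The class name `WorkbookXmlCounts` and the sample XML are assumptions for illustration; a real workbook part contains more elements and attributes.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

class WorkbookXmlCounts {

    // Hypothetical, trimmed stand-in for the content of /xl/workbook.xml.
    static final String SAMPLE =
        "<workbook xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\""
      + " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">"
      + "<sheets>"
      + "<sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/>"
      + "<sheet name=\"Data\" sheetId=\"2\" r:id=\"rId2\"/>"
      + "</sheets>"
      + "<definedNames>"
      + "<definedName name=\"Totals\">Data!$A$1:$B$10</definedName>"
      + "</definedNames>"
      + "</workbook>";

    // Returns {number of sheet elements, number of definedName elements}.
    static int[] count(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int sheets = 0;
        int definedNames = 0;
        while (reader.hasNext()) {
            // Only start tags matter for counting; everything else is skipped.
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                String local = reader.getLocalName();
                if ("sheet".equals(local)) {
                    sheets++;
                } else if ("definedName".equals(local)) {
                    definedNames++;
                }
            }
        }
        reader.close();
        return new int[] {sheets, definedNames};
    }

    public static void main(String[] args) throws Exception {
        int[] counts = count(SAMPLE);
        System.out.println("Sheets: " + counts[0] + ", named ranges: " + counts[1]);
    }
}
```

Because the reader only moves forward and nothing is kept besides two counters, memory use should stay small regardless of how large the worksheets themselves are.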

Example code for getting all sheet names and defined named ranges:

    import org.apache.poi.openxml4j.opc.OPCPackage;
    import org.apache.poi.openxml4j.opc.PackagePart;

    import javax.xml.stream.XMLEventReader;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.events.StartElement;
    import javax.xml.stream.events.EndElement;
    import javax.xml.stream.events.Characters;
    import javax.xml.stream.events.Attribute;
    import javax.xml.stream.events.XMLEvent;
    import javax.xml.namespace.QName;

    import java.io.File;
    import java.util.regex.Pattern;
    import java.util.List;
    import java.util.ArrayList;
    import java.util.Map;
    import java.util.HashMap;

    class StaxReadOPCPackageParts {

        public static void main(String[] args) {
            try {
                File file = new File("file.xlsx");
                OPCPackage opcpackage = OPCPackage.open(file);

                //get the workbook package part
                PackagePart workbookpart = opcpackage.getPartsByName(Pattern.compile("/xl/workbook.xml")).get(0);

                //create reader for package part
                XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(workbookpart.getInputStream());

                List<String> sheetNames = new ArrayList<>();
                Map<String, String> definedNames = new HashMap<>();

                boolean isInDefinedName = false;
                String sheetName = "";
                String definedNameName = "";
                StringBuffer definedNameFormula = new StringBuffer();

                while (reader.hasNext()) { //loop over all XML in workbook.xml
                    XMLEvent event = (XMLEvent)reader.next();
                    if (event.isStartElement()) {
                        StartElement startElement = (StartElement)event;
                        QName startElementName = startElement.getName();
                        if (startElementName.getLocalPart().equalsIgnoreCase("sheet")) { //start element of sheet definition
                            Attribute attribute = startElement.getAttributeByName(new QName("name"));
                            sheetName = attribute.getValue();
                            sheetNames.add(sheetName);
                        } else if (startElementName.getLocalPart().equalsIgnoreCase("definedName")) { //start element of definedName
                            Attribute attribute = startElement.getAttributeByName(new QName("name"));
                            definedNameName = attribute.getValue();
                            isInDefinedName = true;
                        }
                    } else if (event.isCharacters() && isInDefinedName) { //character content of definedName == the formula
                        definedNameFormula.append(((Characters)event).getData());
                    } else if (event.isEndElement()) {
                        EndElement endElement = (EndElement)event;
                        QName endElementName = endElement.getName();
                        if (endElementName.getLocalPart().equalsIgnoreCase("definedName")) { //end element of definedName
                            definedNames.put(definedNameName, definedNameFormula.toString());
                            definedNameFormula = new StringBuffer();
                            isInDefinedName = false;
                        }
                    }
                }

                opcpackage.close();

                System.out.println("Sheet names:");
                for (String shName : sheetNames) {
                    System.out.println("Sheet name: " + shName);
                }

                System.out.println("Named ranges:");
                for (String defName : definedNames.keySet()) {
                    System.out.println("Name: " + defName + ", Formula: " + definedNames.get(defName));
                }

            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }
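The code above returns each named range as a name plus a formula string. The question also asks for the referenced sheet, which in simple cases is the part of the formula before the `!`. The following is a best-effort sketch of that extraction, not a full formula parser. The helper `ReferencedSheet.of` is hypothetical, and it assumes the formula starts with a sheet reference such as `Data!$A$1` or `'My Sheet'!$A$1`. Multi-area references, external workbook references, and formulas that wrap the reference in a function are not handled.

```java
import java.util.Optional;

class ReferencedSheet {

    // Returns the sheet name at the start of a defined-name formula,
    // or empty if the formula does not start with a plain sheet reference.
    static Optional<String> of(String formula) {
        if (formula == null || formula.isEmpty()) {
            return Optional.empty();
        }
        if (formula.charAt(0) == '\'') {
            // Quoted name: runs up to the closing quote; '' inside is an escaped quote.
            StringBuilder name = new StringBuilder();
            int i = 1;
            while (i < formula.length()) {
                char c = formula.charAt(i);
                if (c == '\'') {
                    boolean hasNext = i + 1 < formula.length();
                    if (hasNext && formula.charAt(i + 1) == '\'') {
                        name.append('\'');
                        i += 2;
                        continue;
                    }
                    // The closing quote must be followed directly by '!'.
                    return hasNext && formula.charAt(i + 1) == '!'
                            ? Optional.of(name.toString())
                            : Optional.empty();
                }
                name.append(c);
                i++;
            }
            return Optional.empty(); // unterminated quote
        }
        // Unquoted name: accept only letters, digits, '_' and '.' before the '!'.
        int bang = formula.indexOf('!');
        if (bang > 0 && formula.substring(0, bang).matches("[\\p{L}\\p{N}_.]+")) {
            return Optional.of(formula.substring(0, bang));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(of("Data!$A$1:$B$10"));
        System.out.println(of("'My Sheet'!$C$3"));
        System.out.println(of("42"));
    }
}
```

Applied to the `definedNames` map from the example, calling `ReferencedSheet.of(definedNames.get(defName))` inside the final loop would give the referenced sheet for each named range where one can be determined.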