Get all cells in a row using the Open XML SAX approach

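I'm reading a large xlsx file with the DOM approach of the Open XML SDK. It works, but it takes forever, so I want to do the same thing with the SAX approach. However, I can't get it to work. Here is what I do with the DOM approach: for each worksheet in the workbook, I get the sheet's name. I assume the first row contains all the column names, so I create a class that has one property for each column listed in the first row. Then I read the remaining rows. For each row, I create a new instance of that dynamically created class, and then I iterate over every cell in the row to fill the object's properties with the cell values.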

Here is the code I use to do what I just described with the DOM approach:

    public static List<Object> ConvertExcelArchiveToListObjects(string filePath)
    {
        ...
        using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
        {
            WorkbookPart wbPart = spreadsheetDocument.WorkbookPart;
            Sheets theSheets = wbPart.Workbook.Sheets;
            SharedStringTablePart sstPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
            ...
            var sheets = wbPart.Workbook.Sheets.Cast<Sheet>().ToList();
            foreach (WorksheetPart worksheetpart in wbPart.WorksheetParts)
            {
                Worksheet worksheet = worksheetpart.Worksheet;
                string partRelationshipId = wbPart.GetIdOfPart(worksheetpart);
                var correspondingSheet = sheets.FirstOrDefault(
                    s => s.Id.HasValue && s.Id.Value == partRelationshipId);
                Debug.Assert(correspondingSheet != null);
                // Grab the sheet name
                string sheetName = correspondingSheet.GetAttribute("name", "").Value;
                ...
                dynamic expandoObjectClass = new ExpandoObject();
                List<Object> listObjectsCustomClasses = new List<Object>();
                foreach (var dataRow in rowContent)
                {
                    Type generatedType = typeBuilder.CreateType();
                    object generatedObject = Activator.CreateInstance(generatedType);
                    PropertyInfo[] properties = generatedType.GetProperties();
                    int propertiesCounter = 0;
                    // Loop over the values that we will assign to the properties
                    var rowCells = dataRow.Descendants<Cell>();
                    var value = string.Empty;
                    foreach (var rowCell in rowCells)
                    {
                        if (rowCell.DataType != null && rowCell.DataType.HasValue &&
                            rowCell.DataType == CellValues.SharedString &&
                            int.Parse(rowCell.CellValue.InnerText) < ssTable.ChildElements.Count)
                        {
                            value = ssTable.ChildElements[int.Parse(rowCell.CellValue.InnerText)].InnerText ?? string.Empty;
                        }
                        else
                        {
                            if (rowCell.CellValue != null && rowCell.CellValue.InnerText != null)
                            {
                                value = rowCell.CellValue.InnerText;
                            }
                            else
                            {
                                value = string.Empty;
                            }
                        }
                        properties[propertiesCounter].SetValue(generatedObject, value, null);
                        propertiesCounter++;
                    }
                    listObjectsCustomClasses.Add(generatedObject);
                }
                listObjects.Add(listObjectsCustomClasses);
            }
        }
        DateTime end = DateTime.UtcNow;
        Console.WriteLine("Measured time: " + (end - begin).TotalMinutes + " minutes.");
        return listObjects;
    }
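Apart from the DOM-vs-SAX question, the loop above has a possible cost worth checking. For every shared-string cell it calls `ssTable.ChildElements.Count` and `ssTable.ChildElements[...]`. Depending on the SDK version, that indexer may walk the child list, which would make each lookup linear in the size of the shared-string table. The minimal sketch below copies the shared-string texts into an array once per workbook. The class name `SharedStrings` and the method `Load` are hypothetical, and the speed-up is an assumption that should be measured.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper: caches shared-string texts so that a cell of type "s"
// can be resolved with a plain array lookup.
public static class SharedStrings
{
    public static string[] Load(WorkbookPart workbookPart)
    {
        SharedStringTablePart sstPart =
            workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
        if (sstPart == null)
            return new string[0]; // the workbook has no shared strings

        // Same InnerText the original code reads; it concatenates all text
        // inside each <si> item (including rich-text runs).
        return sstPart.SharedStringTable
                      .Elements<SharedStringItem>()
                      .Select(item => item.InnerText)
                      .ToArray();
    }
}
```

With `string[] sharedStrings = SharedStrings.Load(wbPart);` computed once before the worksheet loop, the shared-string branch in the cell loop could read `value = sharedStrings[int.Parse(rowCell.CellValue.InnerText)];`.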

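However, whenever I read a large xlsx file (more than 30 MB), the method above takes a very long time to run. So I wrote the following code to at least get the rows, without digging into the cells of each row yet: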

    public static List<Object> ConvertExcelArchiveToListObjectsSAXApproach(string filePath)
    {
        DateTime begin = DateTime.UtcNow;
        List<Object> listObjects = new List<Object>();
        using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
        {
            WorkbookPart wbPart = spreadsheetDocument.WorkbookPart;
            Sheets theSheets = wbPart.Workbook.Sheets;
            SharedStringTablePart sstPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
            SharedStringTable ssTable = null;
            if (sstPart != null)
                ssTable = sstPart.SharedStringTable;

            // Get the CellFormats for cells without defined data types
            WorkbookStylesPart workbookStylesPart = spreadsheetDocument.WorkbookPart.GetPartsOfType<WorkbookStylesPart>().First();
            CellFormats cellFormats = (CellFormats)workbookStylesPart.Stylesheet.CellFormats;

            var sheets = wbPart.Workbook.Sheets.Cast<Sheet>().ToList();
            foreach (WorksheetPart worksheetpart in wbPart.WorksheetParts)
            {
                //Worksheet worksheet = worksheetpart.Worksheet;
                OpenXmlPartReader reader = new OpenXmlPartReader(worksheetpart);
                bool firstRow = false;
                while (reader.Read())
                {
                    if (reader.ElementType == typeof(Row))
                    {
                        ...
                    }
                    if (reader.ElementType != typeof(Worksheet)) // Don't want to skip the contents of the worksheet
                        reader.Skip(); // Skip contents of any node before finding the first row.
                }
            }
        }
        DateTime end = DateTime.UtcNow;
        Console.WriteLine("Measured time: " + (end - begin).TotalMinutes + " minutes.");
        return listObjects;
    }

However, the breakpoint I set on

    if (reader.ElementType == typeof(Row))
    {
        ...
    }

is never even hit. Any idea what I'm missing? Thanks!
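Here is a likely explanation. It assumes the worksheet XML has the usual layout, where `<worksheet>` contains elements such as `<sheetViews>` and `<cols>`, followed by `<sheetData>`, which holds all the `<row>` elements. The line `if (reader.ElementType != typeof(Worksheet)) reader.Skip();` fires for every element other than `Worksheet`, and that includes `SheetData`. Skipping `SheetData` skips every row inside it, so the `Row` branch is never reached.

The sketch below drops the `Skip()` call and only counts rows, as a quick check that the reader actually reaches them. The class and method names are hypothetical.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical diagnostic: counts <row> elements with the forward-only reader.
public static class SaxRowCount
{
    public static int CountRows(WorksheetPart worksheetPart)
    {
        int rows = 0;
        using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
        {
            // No Skip() here: the reader must descend into <sheetData>,
            // which is where every Row element lives.
            while (reader.Read())
            {
                // Read() reports end tags as well as start tags, so count starts only.
                if (reader.ElementType == typeof(Row) && reader.IsStartElement)
                    rows++;
            }
        }
        return rows;
    }
}
```

For example, `foreach (WorksheetPart wsp in wbPart.WorksheetParts) Console.WriteLine(SaxRowCount.CountRows(wsp));`. If you do want to skip parts of the sheet, a safer approach is to skip only elements you know you don't need, and to keep `Worksheet`, `SheetData`, and `Row` out of that set.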

Have a look at the code in the thread that uses OpenXmlReader; it does what you are trying to do.
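The following is a minimal sketch of that idea, not the code from the referenced thread. It streams one worksheet with `OpenXmlReader` and collects each row's cells without loading the whole `Worksheet` DOM. It rests on these assumptions:

* `sharedStrings` holds the shared-string texts in table order (for example, loaded once from `SharedStringTablePart.SharedStringTable`).
* Each cell's position comes from its `r` attribute, because empty cells are usually not written to the file.
* Inline strings (`t="inlineStr"`) are not handled and come back empty.

The names `SaxSheetReader`, `ReadRows`, and `ColumnIndex` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class SaxSheetReader
{
    // Yields each <row> of one worksheet part as a list of cell texts,
    // using only the forward-only reader (hypothetical helper).
    public static IEnumerable<List<string>> ReadRows(
        WorksheetPart worksheetPart, IReadOnlyList<string> sharedStrings)
    {
        using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
        {
            List<string> row = null;   // cells of the row currently being read
            int column = 0;            // zero-based column of the current cell
            string cellType = null;    // the cell's "t" attribute, if present

            while (reader.Read())
            {
                if (reader.ElementType == typeof(Row))
                {
                    if (reader.IsStartElement)
                    {
                        row = new List<string>();
                    }
                    else if (reader.IsEndElement && row != null)
                    {
                        yield return row;
                        row = null;
                    }
                }
                else if (reader.ElementType == typeof(Cell) && reader.IsStartElement && row != null)
                {
                    cellType = null;
                    column = row.Count; // fallback if the cell has no "r" attribute
                    foreach (OpenXmlAttribute attr in reader.Attributes)
                    {
                        if (attr.LocalName == "r")
                            column = ColumnIndex(attr.Value);
                        else if (attr.LocalName == "t")
                            cellType = attr.Value;
                    }
                    // Pad with "" for cells that are absent from the file.
                    while (row.Count <= column)
                        row.Add(string.Empty);
                }
                else if (reader.ElementType == typeof(CellValue) && reader.IsStartElement && row != null)
                {
                    string text = reader.GetText();
                    // t="s": the <v> text is an index into the shared-string table.
                    row[column] = cellType == "s" ? sharedStrings[int.Parse(text)] : text;
                }
            }
        }
    }

    // Converts the letters of an A1-style reference ("C5", "AB12") to a
    // zero-based column index: "A" -> 0, "Z" -> 25, "AA" -> 26.
    public static int ColumnIndex(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference)
        {
            char c = char.ToUpperInvariant(ch);
            if (c < 'A' || c > 'Z')
                break; // the row number starts here
            index = index * 26 + (c - 'A' + 1);
        }
        if (index == 0)
            throw new ArgumentException("No column letters in '" + cellReference + "'.");
        return index - 1;
    }
}
```

To mirror the question's approach, the first list yielded for a sheet would supply the column names, and each later list would supply one object's values. Positions come from `ColumnIndex` rather than a running counter, so an empty cell does not shift the remaining values one column to the left. The `propertiesCounter` loop in the DOM code can do exactly that.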