Difficulty reading Excel files with the OpenXML SDK

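I have a function that reads from an Excel file and stores the results in a DataSet, and another function that writes to an Excel file. When I try to read a regular, human-generated Excel file, the read function returns a blank DataSet. When I read an Excel file generated by the write function, it works just fine. The read function still fails on a normally created Excel file even if I just copy and paste in the contents of the file the write function generated. I finally tracked the problem down to this, but I don't know where to go from here. Is there something wrong with my code? Any help is greatly appreciated. Thanks in advance!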

Here is the Excel generation function:

    public static Boolean writeToExcel(string fileName, DataSet data)
    {
        Boolean answer = false;
        using (SpreadsheetDocument excelDoc = SpreadsheetDocument.Create(tempPath + fileName, SpreadsheetDocumentType.Workbook))
        {
            WorkbookPart workbookPart = excelDoc.AddWorkbookPart();
            workbookPart.Workbook = new Workbook();
            WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
            Sheets sheets = excelDoc.WorkbookPart.Workbook.AppendChild<Sheets>(new Sheets());
            Sheet sheet = new Sheet()
            {
                Id = excelDoc.WorkbookPart.GetIdOfPart(worksheetPart),
                SheetId = 1,
                Name = "Page1"
            };
            sheets.Append(sheet);
            CreateWorkSheet(worksheetPart, data);
            answer = true;
        }
        return answer;
    }

    private static void CreateWorkSheet(WorksheetPart worksheetPart, DataSet data)
    {
        Worksheet worksheet = new Worksheet();
        SheetData sheetData = new SheetData();
        UInt32Value currRowIndex = 1U;
        int colIndex = 0;
        Row excelRow;
        DataTable table = data.Tables[0];
        for (int rowIndex = -1; rowIndex < table.Rows.Count; rowIndex++)
        {
            excelRow = new Row();
            excelRow.RowIndex = currRowIndex++;
            for (colIndex = 0; colIndex < table.Columns.Count; colIndex++)
            {
                Cell cell = new Cell()
                {
                    CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),
                    DataType = CellValues.String
                };
                CellValue cellValue = new CellValue();
                if (rowIndex == -1)
                {
                    cellValue.Text = table.Columns[colIndex].ColumnName.ToString();
                }
                else
                {
                    cellValue.Text = (table.Rows[rowIndex].ItemArray[colIndex].ToString() != "")
                        ? table.Rows[rowIndex].ItemArray[colIndex].ToString()
                        : "*";
                }
                cell.Append(cellValue);
                excelRow.Append(cell);
            }
            sheetData.Append(excelRow);
        }
        SheetFormatProperties formattingProps = new SheetFormatProperties()
        {
            DefaultColumnWidth = 20D,
            DefaultRowHeight = 20D
        };
        worksheet.Append(formattingProps);
        worksheet.Append(sheetData);
        worksheetPart.Worksheet = worksheet;
    }

And the read function is as follows:

    public static void readInventoryExcel(string fileName, ref DataSet set)
    {
        using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
        {
            WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
            WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
            SheetData sheetData = worksheetPart.Worksheet.Elements<SheetData>().First();
            int count = -1;
            foreach (Row r in sheetData.Elements<Row>())
            {
                if (count >= 0)   // skip the header row
                {
                    DataRow row = set.Tables[0].NewRow();
                    row["SerialNumber"] = r.ChildElements[1].InnerXml;
                    row["PartNumber"] = r.ChildElements[2].InnerXml;
                    row["EntryDate"] = r.ChildElements[3].InnerXml;
                    row["RetirementDate"] = r.ChildElements[4].InnerXml;
                    row["ReasonForReplacement"] = r.ChildElements[5].InnerXml;
                    row["RetirementTech"] = r.ChildElements[6].InnerXml;
                    row["IncludeInMaintenance"] = r.ChildElements[7].InnerXml;
                    row["MaintenanceTech"] = r.ChildElements[8].InnerXml;
                    row["Comment"] = r.ChildElements[9].InnerXml;
                    row["Station"] = r.ChildElements[10].InnerXml;
                    row["LocationStatus"] = r.ChildElements[11].InnerXml;
                    row["AssetName"] = r.ChildElements[12].InnerXml;
                    row["InventoryType"] = r.ChildElements[13].InnerXml;
                    row["Description"] = r.ChildElements[14].InnerXml;
                    set.Tables[0].Rows.Add(row);
                }
                count++;
            }
        }
    }

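I think this is because your generated file has only one sheet, while a workbook created in Excel has three (Sheet1, Sheet2 and Sheet3). I'm not sure, but I think the worksheet parts are returned in reverse order (their order isn't tied to the tab order), so you should change this line: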

    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

to:

    WorksheetPart worksheetPart = workbookPart.WorksheetParts.Last();

It would be safer to identify the WorksheetPart by the sheet name. You need to find the Sheet first and then use its Id to look up the WorksheetPart:

    private WorksheetPart GetWorksheetPartBySheetName(WorkbookPart workbookPart, string sheetName)
    {
        //find the sheet first.
        IEnumerable<Sheet> sheets = workbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>().Where(s => s.Name == sheetName);

        if (sheets.Count() > 0)
        {
            string relationshipId = sheets.First().Id.Value;
            WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(relationshipId);
            return worksheetPart;
        }

        return null;
    }

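Then you can use it like this (the name must match the sheet's tab name; your write function names its sheet "Page1"):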

    WorksheetPart worksheetPart = GetWorksheetPartBySheetName(workbookPart, "Sheet1");
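The lookup above works because `xl/workbook.xml` lists every sheet as a `<sheet>` element, in tab order, carrying the tab name and an `r:id` that points at the worksheet part. As a rough illustration, here is the same name-to-id lookup done directly on the raw XML with System.Xml.Linq instead of the SDK. `WorkbookXml`, `FindRelationshipId` and `SheetNamesInTabOrder` are hypothetical names used only for this sketch:

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only: reads the raw text of xl/workbook.xml.
public static class WorkbookXml
{
    static readonly XNamespace Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    static readonly XNamespace Rel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Returns the r:id of the sheet with the given tab name, or null if no sheet has that name.
    public static string FindRelationshipId(string workbookXml, string sheetName)
    {
        XElement sheet = XDocument.Parse(workbookXml)
                                  .Descendants(Main + "sheet")
                                  .FirstOrDefault(s => (string)s.Attribute("name") == sheetName);
        return (string)sheet?.Attribute(Rel + "id");
    }

    // Returns the tab names in the order the <sheet> elements appear, which is the tab order.
    public static string[] SheetNamesInTabOrder(string workbookXml)
    {
        return XDocument.Parse(workbookXml)
                        .Descendants(Main + "sheet")
                        .Select(s => (string)s.Attribute("name"))
                        .ToArray();
    }
}
```

The `r:id` values are not tied to tab order. That is why looking a sheet up by name is more reliable than taking the first or last worksheet part.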

I also noticed a couple of other things in your code that may (or may not) be of interest to you.

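First, since you only read the InnerXml this may not matter to you, but Excel stores strings differently from the way you write them, so reading an Excel-generated file may not give you the values you expect. In your example you write the string directly into the cell: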

[Image: XML of the cell value]

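Excel, however, uses the shared strings concept: strings are written to a separate XML part called sharedStrings.xml. That part holds each string used in the workbook at a numbered position, and the cell value in the worksheet XML stores that position rather than the text itself.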

sharedStrings.xml looks like this:

[Image: shared strings XML]

And the cell then looks like this:

[Image: cell value that uses a shared string]

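The 47 in the <v> element refers to the shared string at zero-based index 47, that is, the 48th <si> entry in sharedStrings.xml. Note also that the type (the t attribute) in your generated XML is str, but in the Excel-generated file it is s. Your cells hold the text itself, while Excel's cells hold a reference into the shared string table. (Strictly speaking, str marks the string result of a formula; a true inline string uses t="inlineStr".)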

You can read the SharedStrings part just like any other part:

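    // value is the text of the cell's <v> element (e.g. "47"); only look it up when the cell's DataType is SharedString
    var stringTable = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
    if (stringTable != null)
    {
        sharedString = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
    }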
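To show how the t attribute decides where a cell's text lives, here is a small sketch that works on raw SpreadsheetML fragments with System.Xml.Linq rather than the SDK. It assumes the standard SpreadsheetML main namespace and only handles the string cases discussed above. `CellText`, `LoadSharedStrings` and `Resolve` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class CellText
{
    static readonly XNamespace Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Text of an <si> or <is> element: either a single <t>, or several <r><t> rich-text runs.
    static string RichText(XElement container) =>
        string.Concat(container.Elements(Main + "t")
                               .Concat(container.Elements(Main + "r").Elements(Main + "t"))
                               .Select(t => t.Value));

    // Item i of the returned list is the <si> at zero-based index i in sharedStrings.xml.
    public static List<string> LoadSharedStrings(string sharedStringsXml)
    {
        return XDocument.Parse(sharedStringsXml).Root
                        .Elements(Main + "si")
                        .Select(RichText)
                        .ToList();
    }

    // Returns the text stored for a <c> element, based on its t attribute.
    public static string Resolve(XElement cell, IReadOnlyList<string> sharedStrings)
    {
        string type = (string)cell.Attribute("t");
        string v = (string)cell.Element(Main + "v");
        switch (type)
        {
            case "s":          // shared string: <v> is an index into the shared string table
                return sharedStrings[int.Parse(v)];
            case "inlineStr":  // inline string: the text is in <is>, not in <v>
                XElement inline = cell.Element(Main + "is");
                return inline == null ? "" : RichText(inline);
            default:           // "str", numbers, booleans: <v> holds the raw value itself
                return v;
        }
    }
}
```

With this approach, a cell written by your function (t="str") and a cell written by Excel (t="s") both resolve to their visible text.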

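Second, if you compare the cell references your code generates with the ones Excel generates, you can see that you only output the column and not the row (e.g. A instead of A1). To fix this, you should change this line: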

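    CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)),

to:

    CellReference = Convert.ToString(Convert.ToChar(65 + colIndex)) + excelRow.RowIndex.Value.ToString(),

Use excelRow.RowIndex rather than rowIndex here: rowIndex starts at -1 for the header row and is zero-based, so it would produce A-1, A0 and so on instead of A1, A2.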
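`Convert.ToChar(65 + colIndex)` only covers columns A to Z. That is enough for the 15 columns here, but it breaks past the 26th column. The sketch below is a hypothetical helper (`CellRefs` is not part of the SDK) that builds A1-style references from a zero-based column index and a one-based row number, continuing with AA, AB and so on after Z:

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class CellRefs
{
    // 0 -> "A", 25 -> "Z", 26 -> "AA", 701 -> "ZZ", 702 -> "AAA"
    public static string ColumnName(int zeroBasedColumn)
    {
        if (zeroBasedColumn < 0) throw new ArgumentOutOfRangeException(nameof(zeroBasedColumn));
        var sb = new StringBuilder();
        int n = zeroBasedColumn + 1;          // column letters are a 1-based, digit-without-zero base 26
        while (n > 0)
        {
            int rem = (n - 1) % 26;           // 0..25 maps to 'A'..'Z'
            sb.Insert(0, (char)('A' + rem));
            n = (n - 1) / 26;
        }
        return sb.ToString();
    }

    // Excel rows are 1-based, so row 0 is rejected.
    public static string ToCellReference(int zeroBasedColumn, uint rowNumber)
    {
        if (rowNumber == 0) throw new ArgumentOutOfRangeException(nameof(rowNumber), "Excel rows start at 1.");
        return ColumnName(zeroBasedColumn) + rowNumber.ToString();
    }
}
```

In the write function, the cell could then be created with `CellReference = CellRefs.ToCellReference(colIndex, excelRow.RowIndex.Value)`.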

I hope that helps.

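I ran into a similar problem a while back when trying to do this with Word documents (programmatically generated ones worked fine, but human-generated ones didn't). I found this tool very helpful: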

http://www.microsoft.com/en-us/download/details.aspx?id=30425

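Basically, it examines a file and shows you the code Microsoft would generate for it, as well as the XML structure of the file itself. As usual with Microsoft products, there are a lot of menus and it isn't very intuitive, but after clicking around for a bit you'll be able to see exactly what is going on in both files. I suggest you open a working Excel file and a non-working one and compare the differences to see what is causing your problem.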
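If you only need the raw XML for that comparison, note that an .xlsx file is a ZIP package. A rough alternative is to list and extract its parts with `System.IO.Compression` and diff them with any text tool. `PackageDump`, `PartNames` and `ReadPart` are hypothetical names, and this sketch does not replace the code-generation view the tool offers:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class PackageDump
{
    // Sorted part names, e.g. "xl/sharedStrings.xml", "xl/worksheets/sheet1.xml".
    public static List<string> PartNames(Stream package)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            return zip.Entries.Select(e => e.FullName)
                              .OrderBy(name => name, StringComparer.Ordinal)
                              .ToList();
        }
    }

    // Raw text of one part, or null if the package does not contain it.
    public static string ReadPart(Stream package, string partName)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry(partName);
            if (entry == null) return null;
            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Comparing the two `PartNames` lists would already show, for example, whether the Excel-made file has a `xl/sharedStrings.xml` part that the generated one lacks.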

Below is the OpenXML code I use to read a specific worksheet of an Excel file into a DataTable.

First, you call it like this:

    DataTable dt = OpenXMLHelper.ExcelWorksheetToDataTable("C:\\SQL Server\\SomeExcelFile.xlsx", "Mikes Worksheet");

And here is the code:

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;

    public class OpenXMLHelper
    {
        public static DataTable ExcelWorksheetToDataTable(string pathFilename, string worksheetName)
        {
            DataTable dt = new DataTable(worksheetName);

            using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathFilename, false))
            {
                // Find the sheet with the supplied name, and then use that
                // Sheet object to retrieve a reference to the worksheet part.
                Sheet theSheet = document.WorkbookPart.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == worksheetName);
                if (theSheet == null)
                    throw new Exception("Couldn't find the worksheet: " + worksheetName);

                // Retrieve a reference to the worksheet part.
                WorksheetPart wsPart = (WorksheetPart)document.WorkbookPart.GetPartById(theSheet.Id);
                Worksheet workSheet = wsPart.Worksheet;

                // Get the dimensions of this worksheet, eg "B2:F4".
                // The <dimension> element is optional, so fail with a clear message if it is missing.
                if (workSheet.SheetDimension == null)
                    throw new Exception("The worksheet \"" + worksheetName + "\" has no dimension element.");
                string dimensions = workSheet.SheetDimension.Reference.InnerText;

                int numOfColumns = 0;
                int numOfRows = 0;
                CalculateDataTableSize(dimensions, ref numOfColumns, ref numOfRows);
                System.Diagnostics.Trace.WriteLine(string.Format(
                    "The worksheet \"{0}\" has dimensions \"{1}\", so we need a DataTable of size {2}x{3}.",
                    worksheetName, dimensions, numOfColumns, numOfRows));

                SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                IEnumerable<Row> rows = sheetData.Descendants<Row>();

                string[,] cellValues = new string[numOfColumns, numOfRows];
                int colInx = 0;
                int rowInx = 0;
                string value = "";
                SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;

                // Iterate through each row of OpenXML data
                foreach (Row row in rows)
                {
                    // *DON'T* assume there's going to be one XML element for each item in each row...
                    foreach (Cell cell in row.Descendants<Cell>())
                    {
                        if (cell.CellValue == null || cell.CellReference == null)
                            continue;   // eg when an Excel cell contains a blank string

                        // Convert this Excel cell's CellAddress into a 0-based offset into our array (eg "G13" -> [6, 12])
                        colInx = GetColumnIndexByName(cell.CellReference);            // eg "C" -> 2 (0-based)
                        rowInx = GetRowIndexFromCellAddress(cell.CellReference) - 1;   // Needs to be 0-based

                        // Fetch the value in this cell (InnerText, so characters such as & are not XML-escaped)
                        value = cell.CellValue.InnerText;
                        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
                        {
                            value = stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
                        }

                        cellValues[colInx, rowInx] = value;
                    }
                }

                // Copy the array of strings into a DataTable
                for (int col = 0; col < numOfColumns; col++)
                    dt.Columns.Add("Column_" + col.ToString());

                for (int row = 0; row < numOfRows; row++)
                {
                    DataRow dataRow = dt.NewRow();
                    for (int col = 0; col < numOfColumns; col++)
                    {
                        dataRow.SetField(col, cellValues[col, row]);
                    }
                    dt.Rows.Add(dataRow);
                }

                #if DEBUG
                // Write out the contents of our DataTable to the Output window (for debugging)
                string str = "";
                for (rowInx = 0; rowInx < numOfRows; rowInx++)
                {
                    for (colInx = 0; colInx < numOfColumns; colInx++)
                    {
                        object val = dt.Rows[rowInx].ItemArray[colInx];
                        str += (val == null) ? "" : val.ToString();
                        str += "\t";
                    }
                    str += "\n";
                }
                System.Diagnostics.Trace.WriteLine(str);
                #endif

                return dt;
            }
        }

        private static void CalculateDataTableSize(string dimensions, ref int numOfColumns, ref int numOfRows)
        {
            // How many columns & rows of data does this Worksheet contain ?
            // We'll read in the Dimensions string from the Excel file, and calculate the size based on that.
            // eg "B1:F4" -> we'll need 6 columns and 4 rows.
            //
            // (We deliberately ignore the top-left cell address, and just use the bottom-right cell address.)
            try
            {
                string[] parts = dimensions.Split(':');   // eg "B1:F4", or just "A1" for a one-cell sheet
                if (parts.Length > 2)
                    throw new Exception("Found more than two CellAddresses in the dimension");

                string bottomRight = parts[parts.Length - 1];
                numOfColumns = 1 + GetColumnIndexByName(bottomRight);   // A=1, B=2, C=3 (1-based value), so F4 would return 6 columns
                numOfRows = GetRowIndexFromCellAddress(bottomRight);
            }
            catch
            {
                throw new Exception("Could not calculate maximum DataTable size from the worksheet dimension: " + dimensions);
            }
        }

        public static int GetRowIndexFromCellAddress(string cellAddress)
        {
            // Convert an Excel CellReference into a 1-based row index
            // eg "D42"  -> 42
            //    "F123" -> 123
            string rowNumber = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^0-9 _]", "");
            return int.Parse(rowNumber);
        }

        public static int GetColumnIndexByName(string cellAddress)
        {
            // Convert an Excel CellReference column into a 0-based column index
            // eg "D42"  -> 3
            //    "F123" -> 5
            var columnName = System.Text.RegularExpressions.Regex.Replace(cellAddress, "[^A-Z_]", "");
            int number = 0, pow = 1;
            for (int i = columnName.Length - 1; i >= 0; i--)
            {
                number += (columnName[i] - 'A' + 1) * pow;
                pow *= 26;
            }
            return number - 1;
        }
    }

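Just to mention: some of our company's Excel worksheets have one or more blank rows at the top. Strangely, this prevented some other OpenXML libraries from reading those worksheets correctly.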

This code deliberately creates a DataTable with one value for every cell in the worksheet, even the blank ones at the top.
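The <dimension> element that `ExcelWorksheetToDataTable` relies on is optional, and the write function in the question never emits one. One possible fallback, sketched here under that assumption, is to size the grid from the cell references themselves. Like `CalculateDataTableSize`, it measures from A1, so blank rows at the top are still kept. `GridSize` and `FromCellReferences` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class GridSize
{
    static readonly Regex CellRef = new Regex(@"^([A-Z]+)([0-9]+)$");

    // Returns (number of columns, number of rows) needed to hold every referenced cell, counting from A1.
    public static (int Columns, int Rows) FromCellReferences(IEnumerable<string> cellReferences)
    {
        int columns = 0, rows = 0;
        foreach (string reference in cellReferences)
        {
            Match m = CellRef.Match(reference);
            if (!m.Success) throw new FormatException("Not an A1-style reference: " + reference);

            // Column letters -> 1-based column number (A=1, Z=26, AA=27)
            int col = 0;
            foreach (char c in m.Groups[1].Value) col = col * 26 + (c - 'A' + 1);

            int row = int.Parse(m.Groups[2].Value);
            columns = Math.Max(columns, col);
            rows = Math.Max(rows, row);
        }
        return (columns, rows);
    }
}
```

When `workSheet.SheetDimension` is null, the sizes could come from `GridSize.FromCellReferences(sheetData.Descendants<Cell>().Where(c => c.CellReference != null).Select(c => c.CellReference.Value))`. This would also need `using System.Linq;`, which the code above already has.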