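What is the best way to run a SQL query against a large Excel file using VB.NET?

Environment:

I am developing an Excel 2010 application-level add-in in VB.NET, targeting .NET Framework 4.

My goal:

  1. Let the user type in multiple names to search for
  2. Use the list of names to run a SQL query against a large spreadsheet (30,000+ rows)
  3. Return the recordset and paste it into a new worksheet

Performance is my top priority. I would like to know the fastest way to do this with the .NET Framework.

Using an ADO connection object in my code works, but the process takes too long (5 to 8 seconds).

This is the SQL query I run against a table named wells: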

SELECT * FROM wells WHERE padgroup in (SELECT padgroup FROM wells WHERE name LIKE 'TOMCHUCK 21-30' OR name LIKE 'FEDERAL 41-25PH') 
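Since the names come from user input, the query text has to be assembled at run time. The following is only a minimal sketch of that step: BuildPadGroupQuery is a hypothetical helper (it is not part of my add-in), and it swaps the chained LIKE comparisons for an IN list, which should match the same rows because the patterns above contain no wildcards.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module WellSqlSketch
    ' Hypothetical helper: builds the pad-group query for any number of well names.
    Function BuildPadGroupQuery(ByVal names As IEnumerable(Of String)) As String
        ' Double any embedded single quote so a name cannot end the SQL literal early.
        Dim quoted = names.Select(Function(n) "'" & n.Replace("'", "''") & "'").ToArray()
        If quoted.Length = 0 Then
            Throw New ArgumentException("At least one well name is required.", "names")
        End If

        Return "SELECT * FROM wells WHERE padgroup IN " &
               "(SELECT padgroup FROM wells WHERE name IN (" &
               String.Join(", ", quoted) & "))"
    End Function

    Sub Main()
        ' The two names are the ones used in the query above.
        Console.WriteLine(BuildPadGroupQuery(New String() {"TOMCHUCK 21-30", "FEDERAL 41-25PH"}))
    End Sub
End Module
```

Doubling quotes covers the obvious failure case of a name containing an apostrophe; a parameterized command would be the more robust choice where the provider supports it.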

Here is part of the table:

[Image: Excel table]

I currently use this code to create an ADO connection object and retrieve my results:

  'Create Recordset Object
  rsCon = CreateObject("ADODB.Connection")
  rsData = CreateObject("ADODB.Recordset")

  rsCon.Open(szConnect)
  rsData.Open(mySQLQueryToExecute, rsCon, 0, 1, 1)

  'Check to make sure data is received, then copy the data
  If Not rsData.EOF Then
      TargetRange.Cells(1, 1).CopyFromRecordset(rsData)
  Else
      MsgBox("No records returned from : " & SourceFile, vbCritical)
  End If

  'Clean up the Recordset object
  rsData.Close()
  rsData = Nothing
  rsCon.Close()
  rsCon = Nothing

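As far as I know, Excel spreadsheets are stored in the Open XML format, and the .NET Framework includes native support for XML parsing.

After some research, I came across several options:

  • Open XML SDK
  • Simple API for XML (SAX)
  • LINQ to SQL

Could someone point me toward the best approach? I would really appreciate it.

Additional notes:

  • All queries must be able to run without connecting to an online database
  • I only need to access the spreadsheet once to extract the raw data from its rows

Right now I simply embed the spreadsheet in the project resources.

Then, at run time, I create the file, run the query, store the results in memory, and delete the file.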

  'Create temp file path in the commonapplicationdata folder
  Dim excelsheetpath As StringBuilder
  excelsheetpath = New StringBuilder(Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData))
  excelsheetpath.Append("\MasterList.xlsm")

  'Save resources into temp location in HD
  System.IO.File.WriteAllBytes(excelsheetpath.ToString, My.Resources.MasterList)

  'Now call the function to use ADO to get records from the MasterList.xlsm file just created
  GetData(excelsheetpath.ToString, "Sheet1", "A1:S40000", True, False)

  'Store the results in-memory and display by adding to a datagridview control (in a custom task pane)

  'Delete the spreadsheet
  System.IO.File.Delete(excelsheetpath.ToString())

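You are doing VSTO the wrong way ;) Don't use SQL with Excel. If you need speed, take advantage of VSTO and the native Excel API. You can skip the overhead of the ADODB / OLEDB layer and go straight to the Excel object model: use Excel's fast AutoFilter, the SpecialCells method to get only the visible cells into a multi-area range, and the Value property to copy a range into an array quickly.

Here is a VSTO 2010 customized workbook sample that quickly searches a 58k-word list for words containing "aba", "cat", or "zon".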

  using System;
  using System.Collections.Generic;
  using System.Data;
  using System.Linq;
  using System.Text;
  using System.Windows.Forms;
  using System.Xml.Linq;
  using Microsoft.Office.Tools.Excel;
  using Microsoft.VisualStudio.Tools.Applications.Runtime;
  using Excel = Microsoft.Office.Interop.Excel;
  using Office = Microsoft.Office.Core;

  namespace ExcelWorkbook1
  {
      public partial class ThisWorkbook
      {
          private void ThisWorkbook_Startup(object sender, System.EventArgs e)
          {
              const int Sheet1 = 1;   // you can use Linq to find a sheet by name if needed
              const int ColumnB = 2;
              List<List<object>> results = Query(Sheet1, ColumnB, "aba", "cat", "zon");

              foreach (List<object> record in results)
              {
                  System.Diagnostics.Debug.Print("{0,-10} {1,30} {2}", record[0], record[1], record[2]);
              }
          }

          private void ThisWorkbook_Shutdown(object sender, System.EventArgs e)
          {
          }

          /// <summary>
          /// Removes any existing Excel autofilters from the worksheet
          /// </summary>
          private void ClearFilter(Microsoft.Office.Interop.Excel._Worksheet worksheet)
          {
              if (worksheet.AutoFilter != null)
              {
                  worksheet.Cells.AutoFilter();
              }
          }

          /// <summary>
          /// Applies an Excel Autofilter to the worksheet to search for an array of substring predicates
          /// </summary>
          private void ApplyFilter(Microsoft.Office.Interop.Excel._Worksheet worksheet, int column, params string[] predicates)
          {
              string[] criteria = new string[predicates.Length];
              int i = 0;

              ClearFilter(worksheet);

              foreach (string value in predicates)
              {
                  criteria[i++] = String.Concat("=*", value, "*");
              }

              worksheet.Cells.AutoFilter(column, criteria, Excel.XlAutoFilterOperator.xlOr);
          }

          /// <summary>
          /// Returns a list of rows that are hits on a search for an array of substrings in Column B of Sheet1
          /// </summary>
          private List<List<object>> Query(int sheetIndex, int columnIndex, params string[] words)
          {
              Microsoft.Office.Interop.Excel._Worksheet worksheet;
              Excel.Range range;
              List<List<object>> records = new List<List<object>>();
              List<object> record;
              object[,] cells;
              object value;
              int row, column, rows, columns;
              bool hit;

              try
              {
                  worksheet = (Microsoft.Office.Interop.Excel._Worksheet)Globals.ThisWorkbook.Sheets[sheetIndex];
                  if (null == worksheet)
                  {
                      return null;
                  }

                  // apply the autofilter
                  ApplyFilter(worksheet, columnIndex, words);

                  // get the visible (matching) cells as a multi-area range
                  range = worksheet.Range["$A:$C"].SpecialCells(Excel.XlCellType.xlCellTypeVisible);

                  foreach (Excel.Range subrange in range.Areas)
                  {
                      // copy the cells to a multidimensional array for performance
                      cells = subrange.Value;

                      // transform the multidimensional array to a List
                      for (row = cells.GetLowerBound(0), rows = cells.GetUpperBound(0); row <= rows; row++)
                      {
                          record = new List<object>();
                          hit = false;

                          for (column = cells.GetLowerBound(1), columns = cells.GetUpperBound(1); column <= columns; column++)
                          {
                              value = cells[row, column];
                              hit = hit || (null != value);

                              if (hit)
                              {
                                  record.Add(cells[row, column]);
                              }
                          }

                          if (hit)
                          {
                              records.Add(record);
                          }
                      }
                  }
              }
              catch
              {
              }
              finally
              {
                  // use GC.Collect over Marshal.ReleaseComObject() to release all RCWs per http://stackoverflow.com/a/17131389/1995977 and more
                  cells = null;
                  GC.Collect();
                  GC.WaitForPendingFinalizers();
              }

              return records;
          }

          #region VSTO Designer generated code

          /// <summary>
          /// Required method for Designer support - do not modify
          /// the contents of this method with the code editor.
          /// </summary>
          private void InternalStartup()
          {
              this.Startup += new System.EventHandler(ThisWorkbook_Startup);
              this.Shutdown += new System.EventHandler(ThisWorkbook_Shutdown);
          }

          #endregion
      }
  }

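An Excel 2010 file is not exactly XML. Take an XLSX (or XLSM) file and rename it with a .zip extension, then extract it to a new folder. The files in the subfolders are XML files, but the XLSX file itself is a zip archive containing a set of folders that hold those XML files.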
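The same structure can be inspected from code without renaming anything. This is a rough sketch, assuming a reference to WindowsBase.dll (which provides System.IO.Packaging on .NET Framework 4) and a hypothetical default file name; it only lists the parts inside the package and does not read any cell data.

```vbnet
Imports System
Imports System.IO
Imports System.IO.Packaging   ' requires a reference to WindowsBase.dll

Module PackagePartsSketch
    Sub Main(ByVal args As String())
        ' Path to the workbook; the default file name is a placeholder.
        Dim workbookPath As String = If(args.Length > 0, args(0), "MasterList.xlsm")

        ' Open the workbook read-only as an Open Packaging Conventions (zip) package.
        Using pkg As Package = Package.Open(workbookPath, FileMode.Open, FileAccess.Read)
            ' Each part is one file inside the archive, identified by its URI.
            For Each part As PackagePart In pkg.GetParts()
                Console.WriteLine("{0}  [{1}]", part.Uri, part.ContentType)
            Next
        End Using
    End Sub
End Module
```

For a typical workbook the listing includes parts such as /xl/workbook.xml and one XML part per worksheet, which is why a plain XML parser cannot be pointed at the .xlsx file directly.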

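I think the best option is to use the ACE driver (JET is no longer supported) and access it through ODBC. If that is not fast enough, you could extract the data at set intervals and upload it to a database that you can run queries against; the queries should be faster, but the data may be out of date.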
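A minimal sketch of the ODBC route, assuming the Microsoft Access Database Engine (ACE) Excel ODBC driver is installed and matches the bitness of the host process. QueryWorkbook, the file path, and the sheet name are placeholders for illustration, not names from the question.

```vbnet
Imports System
Imports System.Data
Imports System.Data.Odbc

Module AceOdbcSketch
    ' Hypothetical helper: loads the rows returned by a query into an in-memory DataTable.
    Function QueryWorkbook(ByVal workbookPath As String, ByVal sql As String) As DataTable
        ' Driver name as registered by the ACE redistributable (assumed to be installed).
        Dim connectionString As String =
            "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" &
            "DBQ=" & workbookPath & ";"

        Dim table As New DataTable()
        Using connection As New OdbcConnection(connectionString)
            Using adapter As New OdbcDataAdapter(sql, connection)
                ' Fill opens the connection if needed and closes it again afterwards.
                adapter.Fill(table)
            End Using
        End Using
        Return table
    End Function

    Sub Main()
        ' [Sheet1$] addresses a whole worksheet; path and sheet name are placeholders.
        Dim rows = QueryWorkbook("C:\Temp\MasterList.xlsm", "SELECT * FROM [Sheet1$]")
        Console.WriteLine("{0} rows loaded", rows.Rows.Count)
    End Sub
End Module
```

Loading into a DataTable keeps the result in memory, so the workbook only has to be read once.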

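My solution

I tried three different approaches:

  • ADO connection object with SQL (slowest)
  • VSTO and Excel's AutoFilter (reliable)
  • LINQ to XML (fastest)

LINQ to XML gave the best performance. I converted my table into an XML file:

[Image: XML table]

Then, in my code, I use a StringReader to load the XMLwellData file (which I saved as a project resource).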

  'welldoc will be the file to do queries on using LINQ to XML
  Dim stream As System.IO.StringReader
  stream = New StringReader(My.Resources.XMLwellData)
  welldoc = XDocument.Load(stream)

  'clean up stream now that it's no longer needed
  stream.Close()
  stream.Dispose()

  '***** later in the code perform my query on XML file *********
  Dim query = _
      From well In welldoc.<wellList>.<well> _
      Where well.<name>.Value Like "TOMCHUCK 21-30" _
      Select well

  For Each well In query
      MessageBox.Show(well.<padgroup>.Value)
  Next
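The query above looks up a single name. A sketch of how the same document could answer the original multi-name, pad-group question is below. The element names follow the query above, but the sample rows and pad-group values are invented for illustration, and exact (case-sensitive) name matching is an assumption.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module WellQuerySketch
    Sub Main()
        ' Invented sample data in the assumed <wellList>/<well> layout.
        Dim welldoc As XDocument = XDocument.Parse(
            "<wellList>" &
            "<well><name>TOMCHUCK 21-30</name><padgroup>7</padgroup></well>" &
            "<well><name>FEDERAL 41-25PH</name><padgroup>9</padgroup></well>" &
            "<well><name>SAMPLE A</name><padgroup>7</padgroup></well>" &
            "<well><name>SAMPLE B</name><padgroup>3</padgroup></well>" &
            "</wellList>")

        ' Names typed in by the user.
        Dim names As New HashSet(Of String) From {"TOMCHUCK 21-30", "FEDERAL 41-25PH"}

        ' Step 1: pad groups of the wells whose names were requested (the SQL subquery).
        Dim matchingPadgroups = From well In welldoc.<wellList>.<well>
                                Where names.Contains(well.<name>.Value)
                                Select well.<padgroup>.Value
        Dim padgroups As New HashSet(Of String)(matchingPadgroups)

        ' Step 2: every well that shares one of those pad groups (the outer SELECT).
        Dim results = From well In welldoc.<wellList>.<well>
                      Where padgroups.Contains(well.<padgroup>.Value)
                      Select well

        ' SAMPLE B is in pad group 3, so it is not part of the results.
        For Each match In results
            Console.WriteLine("{0} (pad group {1})", match.<name>.Value, match.<padgroup>.Value)
        Next
    End Sub
End Module
```

Using a HashSet for both lookups keeps each pass over the document linear in the number of wells, regardless of how many names are entered.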

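It is simple, does what I want, and above all it is fast.

Thank you for your help and suggestions. They led me to this solution.

An alternative approach using Excel's AutoFilter

If you try the code suggested in the other answer, it will only filter on two values: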

  worksheet.Cells.AutoFilter(column, criteria, Excel.XlAutoFilterOperator.xlOr); 

So, to filter on multiple criteria with Excel's AutoFilter, you must pass your criteria as an array and filter with xlFilterValues.

  Dim wrkbk As Excel.Workbook
  Dim wrksht As Excel.Worksheet
  Dim myRange As Excel.Range
  Dim cell As Excel.Range

  'You would add all of your wellnames to search to this List
  Dim wellNames As New List(Of String)

  'wrkbk must already reference the open workbook that holds the data
  wrksht = wrkbk.Sheets(1)

  'In my excel file there is a Named Range which includes all the information
  myRange = wrksht.Range("WellLookUpTable")

  'Notice, for `Criteria1:=` you MUST convert the List to an array
  With wrksht.Range("WellLookUpTable")
      .AutoFilter(Field:=2, Criteria1:=wellNames.ToArray, Operator:=Excel.XlAutoFilterOperator.xlFilterValues)
  End With

  myRange = wrksht.Range("A2", wrksht.Range("A2").End(Excel.XlDirection.xlDown)).Cells.SpecialCells(Excel.XlCellType.xlCellTypeVisible)

  For Each cell In myRange
      'column 11 is padgroup
      MessageBox.Show(cell.Offset(0, 11).Value)
  Next