How to import an Excel file that is in HTML format

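I exported data from the database through HttpContext, formatting it as an HTML table with tr and td tags. Now I want to read that same file back and convert it into a DataTable.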

    <add name="Excel03ConString" connectionString="Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='HTML Import;HDR={1};IMEX=1'" />
    <add name="Excel03ConString" connectionString="Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties='Excel 8.0;HDR={1};IMEX=1'" />

    private DataTable GetTableFromExcel()
    {
        DataTable dt = new DataTable();
        try
        {
            if (exclFileUpload.HasFile)
            {
                string FileName = Path.GetFileName(exclFileUpload.PostedFile.FileName);
                string Extension = Path.GetExtension(exclFileUpload.PostedFile.FileName);
                string FolderPath = Server.MapPath(ConfigurationManager.AppSettings["FolderPath"]);
                //string NewFileName = string.Format("{0}_{1}", DateTime.Now.ToString().Replace("/", "").Replace(" ", "").Replace(":", ""), FileName);
                string FilePath = Path.Combine(string.Format("{0}/{1}", FolderPath, FileName));
                exclFileUpload.SaveAs(FilePath);

                string conStr = "";
                switch (Extension)
                {
                    case ".xls": //Excel 97-03
                        conStr = ConfigurationManager.ConnectionStrings["Excel03ConString"].ConnectionString;
                        break;
                    case ".xlsx": //Excel 07
                        conStr = ConfigurationManager.ConnectionStrings["Excel07ConString"].ConnectionString;
                        break;
                }
                conStr = String.Format(conStr, FilePath, true);

                OleDbConnection connExcel = new OleDbConnection(conStr);
                OleDbCommand cmdExcel = new OleDbCommand();
                OleDbDataAdapter oda = new OleDbDataAdapter();
                cmdExcel.Connection = connExcel;

                connExcel.Open();
                DataTable dtExcelSchema;
                dtExcelSchema = connExcel.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
                string SheetName = dtExcelSchema.Rows[0]["TABLE_NAME"].ToString();
                connExcel.Close();

                connExcel.Open();
                cmdExcel.CommandText = "SELECT * From [" + SheetName + "]";
                oda.SelectCommand = cmdExcel;
                oda.Fill(dt);
                connExcel.Close();

                File.Delete(FilePath);
            }
        }
        catch (Exception ex)
        {
        }
        return dt;
    }

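With the second connection string, I get the error "External table is not in the expected format" at connection.Open(). With the first one, I get an error when reading the table name.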

Please tell me how to read the table, or how to get the data directly from the Excel file.

I think this third-party DLL (ExcelDataReader) may help solve your problem.

    FileStream stream = File.Open(filePath, FileMode.Open, FileAccess.Read);

    //1. Reading from a binary Excel file ('97-2003 format; *.xls)
    IExcelDataReader excelReader = ExcelReaderFactory.CreateBinaryReader(stream);
    //...
    //2. Reading from a OpenXml Excel file (2007 format; *.xlsx)
    IExcelDataReader excelReader = ExcelReaderFactory.CreateOpenXmlReader(stream);
    //...
    //3. DataSet - The result of each spreadsheet will be created in the result.Tables
    DataSet result = excelReader.AsDataSet();
    //...
    //4. DataSet - Create column names from first row
    excelReader.IsFirstRowAsColumnNames = true;
    DataSet result = excelReader.AsDataSet();

    //5. Data Reader methods
    while (excelReader.Read())
    {
        //excelReader.GetInt32(0);
    }

    //6. Free resources (IExcelDataReader is IDisposable)
    excelReader.Close();

I found this online: C# Excel file OLEDB read HTML IMPORT

There they say:

Instead of using the sheet name, you must use the page title in the select statement, without the $: SELECT * FROM [HTMLPageTitle]

That post also links to this guide, which may come in handy but is too long to copy here: http://ewbi.blogs.com/develops/2006/12/reading_html_ta.html

If that does not work, I think you will have to recreate the original export so that it is a real Excel file rather than HTML (if that is at all possible in your scenario).
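The page-title approach quoted above can be sketched roughly as follows. This is only an illustration: `HtmlImportQuery`, `GetPageTitle` and `BuildSelect` are hypothetical names, not part of any library, and the sketch assumes the exported file has a `<title>` element whose text contains no square brackets.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class HtmlImportQuery
{
    // Hypothetical helper: returns the text of the <title> element,
    // or null when the HTML has no title.
    public static string GetPageTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
                              RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value).Trim() : null;
    }

    // Builds the statement from the quote: the page title in brackets,
    // with no trailing $ (unlike an Excel sheet name).
    // Assumption: the title contains no '[' or ']' characters.
    public static string BuildSelect(string pageTitle)
    {
        if (string.IsNullOrEmpty(pageTitle))
            throw new ArgumentException("The HTML page has no title.", "pageTitle");
        return "SELECT * FROM [" + pageTitle + "]";
    }
}
```

In the question's method, the result would replace the sheet-name lookup, for example `cmdExcel.CommandText = HtmlImportQuery.BuildSelect(HtmlImportQuery.GetPageTitle(File.ReadAllText(FilePath)));`, used together with the 'HTML Import' connection string.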

You may be running into this problem for several different reasons. One fix is to build your solution for x86. Here is how to change it to x86:

Right-click the solution in Visual Studio.
Click Configuration Manager.
If it is available, select x86 under Active solution platform.
If it is not available, click New, select or type x86, and click OK.
Rebuild the solution and run your application.

If this does not solve your problem, you may need to install the 32-bit version of the Office system drivers. Here is a complete article explaining the issue.
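To confirm that the platform change took effect, the running process can report its own bitness. The sketch below is only a diagnostic aid: `BitnessCheck` is a hypothetical helper, while `Environment.Is64BitProcess` is a standard .NET property (4.0 and later). It relies on the fact that Microsoft.Jet.OLEDB.4.0 exists only as a 32-bit provider.

```csharp
using System;

static class BitnessCheck
{
    // Hypothetical helper: describes whether the Jet provider can be
    // loaded by a process of the given bitness.
    public static string Describe(bool is64BitProcess)
    {
        return is64BitProcess
            ? "64-bit process: Microsoft.Jet.OLEDB.4.0 cannot be loaded; build for x86."
            : "32-bit process: Microsoft.Jet.OLEDB.4.0 can be loaded if it is installed.";
    }

    // Reports on the process that is currently running.
    public static string DescribeCurrent()
    {
        return Describe(Environment.Is64BitProcess);
    }
}
```

Logging `BitnessCheck.DescribeCurrent()` once at startup should show whether the application is really running as a 32-bit process.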

After some in-depth research, I found a solution.

First, use the code below to turn the Excel file into an HTML page by changing its extension.

    // Rename ExcelName.xls to ExcelName.html in the same folder
    File.Move(Server.MapPath("~/Foldername/ExcelName.xls"),
              Path.ChangeExtension(Server.MapPath("~/Foldername/ExcelName.xls"), ".html"));

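Next we have to load the HTML as a string and extract the content. The <table> tag contains <tr> and <td> tags, but these may carry style attributes. So we first have to skip over those style attributes, and then we can get the content we need from the table.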

    string url = Server.MapPath("~/FolderName/Excelname.html");
    WebClient wc = new WebClient();
    string fileContent = wc.DownloadString(url);

Here we define patterns for the HTML tags that skip over any style attributes.

    const string msgFormat = "table[{0}], tr[{1}], td[{2}], a: {3}, b: {4}";
    const string table_pattern = "<table.*?>(.*?)</table>";
    const string tr_pattern = "<tr.*?>(.*?)</tr>";
    const string td_pattern = "<td.*?>(.*?)</td>";
    const string a_pattern = "<a href=\"(.*?)\"></a>";
    const string b_pattern = "<b>(.*?)</b>";

By looping over the matches, we can find the <tr> and <td> elements. Then we can use this method to get the content inside the <td></td> tags.

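    private static List<string> GetContents(string input, string pattern)
    {
        // Singleline lets '.' match newlines, so a tag may span several lines
        MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.Singleline);
        List<string> contents = new List<string>();
        foreach (Match match in matches)
            // Group 1 is the text between the opening and closing tags;
            // match.Value would also include the tags themselves
            contents.Add(match.Groups[1].Value);
        return contents;
    }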

Then we can insert the imported records into the database row by row.
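Since the question asks for a DataTable, the regex steps above can be combined into one helper, sketched below. It is an illustration rather than the answer's original code: `HtmlTableReader` is a hypothetical name, the first row is assumed to hold unique column names, and nested tables are not handled.

```csharp
using System;
using System.Data;
using System.Net;
using System.Text.RegularExpressions;

static class HtmlTableReader
{
    const RegexOptions Options = RegexOptions.Singleline | RegexOptions.IgnoreCase;

    // Parses the first <table> in the HTML into a DataTable.
    // Assumption: the first row with cells holds unique column names.
    public static DataTable ToDataTable(string html)
    {
        DataTable dt = new DataTable();
        Match table = Regex.Match(html, @"<table\b[^>]*>(.*?)</table>", Options);
        if (!table.Success) return dt;

        bool headerDone = false;
        foreach (Match tr in Regex.Matches(table.Groups[1].Value, @"<tr\b[^>]*>(.*?)</tr>", Options))
        {
            // [^>]* skips any attributes (style, class, ...) on the cell tag
            MatchCollection cells = Regex.Matches(tr.Groups[1].Value, @"<t[dh]\b[^>]*>(.*?)</t[dh]>", Options);
            if (cells.Count == 0) continue;

            if (!headerDone)
            {
                foreach (Match cell in cells)
                    dt.Columns.Add(CellText(cell));
                headerDone = true;
                continue;
            }

            // Extra cells beyond the header width are ignored;
            // missing cells stay DBNull.
            DataRow row = dt.NewRow();
            for (int i = 0; i < cells.Count && i < dt.Columns.Count; i++)
                row[i] = CellText(cells[i]);
            dt.Rows.Add(row);
        }
        return dt;
    }

    // Strips inner tags such as <b> or <a> and decodes HTML entities.
    static string CellText(Match cell)
    {
        string inner = Regex.Replace(cell.Groups[1].Value, "<[^>]+>", string.Empty);
        return WebUtility.HtmlDecode(inner).Trim();
    }
}
```

Calling `HtmlTableReader.ToDataTable(fileContent)` with the string downloaded earlier would give a table whose rows can then be inserted into the database. For HTML that is not machine-generated, a real HTML parser is likely more robust than regular expressions.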

The reference link is here.
