Stripping data from an EPPlus output by date range

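Quick overview: the main goal is to read the rows that fall on or after a set date and get the reference number for each of those dates (the start date, for example).

For example, I might only want the data dated from the 1st of last month onwards.

I currently have to extract some data from the Excel spreadsheet below: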

    Start date    Ref number
    29/07/2015    2342326
    01/07/2016    5697455
    02/08/2016    3453787
    02/08/2016    5345355
    02/08/2015    8364456
    03/08/2016    1479789
    04/07/2015    9334578


Output using EPPlus:

    29/07/2015 2342326
    29/07/2016 5697455
    02/08/2016 3453787
    02/08/2016 5345355
    02/08/2015 8364456
    03/08/2016 1479789
    04/07/2015 9334578

This part is fine, but when I try to filter the output by date range I get errors. Using LINQ, for example, I get the following error:

 An unhandled exception of type 'System.InvalidCastException' occurred in System.Data.DataSetExtensions.dll Additional information: Specified cast is not valid. 

LINQ code:

    var rowsOfInterest = tbl.AsEnumerable()
        .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
        .ToList();

I also tried filtering by date range with DataTable.Select:

 DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#"); 

But that gives the following error:

 An unhandled exception of type 'System.Data.EvaluateException' occurred in System.Data.dll Additional information: Cannot perform '>=' operation on System.String and System.Double. 

My last attempt was to see whether I could filter the dates from inside the loop.

Code used:

    DateTime dDate;
    row[cell.Start.Column - 1] = cell.Text;
    string dt = cell.Text.ToString();

    if (DateTime.TryParse(dt, out dDate))
    {
        DateTime dts = Convert.ToDateTime(dt);
    }

    DateTime date1 = new DateTime(2016, 7, 1);

    if (dDate >= date1)
    {
        Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
    }

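This sort of works, but it only lists the matching dates and not the values, which is understandable. If I take this route, how would I get the dates together with their values?

Output:

    29/07/2016
    02/08/2016
    02/08/2016
    03/08/2016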

Full code example used:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Data.OleDb;
    using System.Text.RegularExpressions;
    using Microsoft.Office.Interop.Excel;
    using System.Data;
    using System.IO;

    namespace Number_Cleaner
    {
        public class NumbersReport
        {
            //ToDo: Look in to fixing the code so it filters the date correctly with the right output data.
            public System.Data.DataTable GetDataTableFromExcel(string path, bool hasHeader = true)
            {
                using (var pck = new OfficeOpenXml.ExcelPackage())
                {
                    using (var stream = File.OpenRead(path))
                    {
                        pck.Load(stream);
                    }

                    var ws = pck.Workbook.Worksheets.First();
                    System.Data.DataTable tbl = new System.Data.DataTable();

                    foreach (var firstRowCell in ws.Cells[1, 1, 1, ws.Dimension.End.Column])
                    {
                        tbl.Columns.Add(hasHeader ? firstRowCell.Text : string.Format("Column {0}", firstRowCell.Start.Column));
                    }

                    var startRow = hasHeader ? 2 : 1;

                    for (int rowNum = startRow; rowNum <= ws.Dimension.End.Row; rowNum++)
                    {
                        var wsRow = ws.Cells[rowNum, 1, rowNum, ws.Dimension.End.Column];
                        DataRow row = tbl.Rows.Add();

                        foreach (var cell in wsRow)
                        {
                            DateTime dDate;
                            row[cell.Start.Column - 1] = cell.Text;
                            string dt = cell.Text.ToString();
                            //Console.WriteLine(dt);

                            if (DateTime.TryParse(dt, out dDate))
                            {
                                DateTime dts = Convert.ToDateTime(dt);
                            }

                            DateTime date1 = new DateTime(2016, 7, 1);

                            if (dDate >= date1)
                            {
                                Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
                            }

                            //Console.WriteLine(row[cell.Start.Column - 1] = cell.Text);
                        }
                    }

                    //var rowsOfInterest = tbl.AsEnumerable()
                    //    .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1))
                    //    .ToList();
                    //Console.WriteLine(tbl);
                    //DataRow[] result = tbl.Select("'Start date' >= #1/7/2016#");

                    return tbl;
                }
            }
        }
    }

Modified from: How to match date to row then get the final column value using EPPlus?

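Based on your code, you are storing everything in your DataTable as strings by calling cell.Text. Doing it that way throws away valuable information: the cell's data type. You are better off using cell.Value, which will be either a string or a double. In Excel, dates, integers, and decimal values are all stored as doubles. (In practice EPPlus usually hands back a DateTime for cells that carry a date number format, which is why the conversion code further down has a separate DateTime branch.)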
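As a minimal sketch of what "keeping the type" can look like, the helper below turns a raw cell value into a DateTime. `CellValueHelper` and `ToDate` are hypothetical names introduced here, not part of EPPlus, and the `dd/MM/yyyy` text format is an assumption taken from the dates shown in the question.

```csharp
using System;
using System.Globalization;

public static class CellValueHelper
{
    // Hypothetical helper (not an EPPlus API): converts a raw cell value
    // to a DateTime, or returns null when the value is not a usable date.
    public static DateTime? ToDate(object value)
    {
        switch (value)
        {
            case DateTime dt:
                // Already typed as a date
                return dt;
            case double serial:
                // Excel serial date number; FromOADate matches Excel for modern dates
                return DateTime.FromOADate(serial);
            case string s when DateTime.TryParseExact(
                    s.Trim(), "dd/MM/yyyy", CultureInfo.InvariantCulture,
                    DateTimeStyles.None, out var parsed):
                // Text in the day/month/year layout assumed from the question
                return parsed;
            default:
                // null, empty, or anything else that is not a date
                return null;
        }
    }
}
```

Inside the question's cell loop this could be called as `CellValueHelper.ToDate(cell.Value)`, with the result stored in a column declared as `typeof(DateTime)` so that later queries compare dates rather than strings.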

The errors you are seeing come from the fact that you store the values as strings but then query them as if they were DateTime values, here:

 .Where(row => row.Field<DateTime>("Start date") >= new DateTime(2016, 7, 1)) 

and here:

 "'Start date' >= #1/7/2016#" 

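Two details of the `DataTable.Select` filter syntax are worth spelling out. Text in single quotes is a string literal, so `'Start date'` is compared as a string rather than as a column; a column name containing a space goes in square brackets. Date literals between `#` signs are read in invariant month/day/year order, so `#1/7/2016#` means 7 January 2016, not 1 July. The sketch below assumes the column has been declared as DateTime; `SelectFilterDemo`, `RowsFrom`, and `BuildSample` are illustrative names only.

```csharp
using System;
using System.Data;
using System.Globalization;

public static class SelectFilterDemo
{
    // Returns the rows whose "Start date" is on or after the given date.
    public static DataRow[] RowsFrom(DataTable table, DateTime from)
    {
        // Brackets mark a column name; #...# is a date literal in month/day/year order
        string filter = string.Format(
            CultureInfo.InvariantCulture,
            "[Start date] >= #{0:MM/dd/yyyy}#",
            from);
        return table.Select(filter);
    }

    // Small typed sample table using three rows from the question's data.
    public static DataTable BuildSample()
    {
        var table = new DataTable();
        table.Columns.Add("Start date", typeof(DateTime));
        table.Columns.Add("Ref number", typeof(int));
        table.Rows.Add(new DateTime(2015, 7, 29), 2342326);
        table.Rows.Add(new DateTime(2016, 7, 29), 5697455);
        table.Rows.Add(new DateTime(2016, 8, 2), 3453787);
        return table;
    }
}
```

Calling `SelectFilterDemo.RowsFrom(SelectFilterDemo.BuildSample(), new DateTime(2016, 7, 1))` keeps the two 2016 rows. Building the literal from a DateTime avoids having to remember the month/day order by hand.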
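If you look at my post here: How to parse excel rows back to types using EPPlus, you will see the helper function ConvertSheetToObjects, which handles pretty much what you are trying to do. With a slight modification we can turn it into a method that takes a worksheet and converts it to a DataTable. As with the object-conversion method, you should still tell it the expected structure by passing in a DataTable with typed columns, rather than having it guess the types by casting cell values.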

    // Extension method: must be declared inside a static class
    public static void ConvertSheetToDataTable(this ExcelWorksheet worksheet, ref DataTable dataTable)
    {
        //DateTime conversion for dates that arrive as raw Excel serial numbers
        var convertDateTime = new Func<double, DateTime>(excelDate =>
        {
            if (excelDate < 1)
                throw new ArgumentException("Excel dates cannot be smaller than 1.");

            var dateOfReference = new DateTime(1900, 1, 1);

            //Excel wrongly treats 1900 as a leap year, so serials after day 60 are off by one more
            if (excelDate > 60d)
                excelDate = excelDate - 2;
            else
                excelDate = excelDate - 1;

            return dateOfReference.AddDays(excelDate);
        });

        //Get the names and types of the columns in the destination table
        var tblcolnames = dataTable
            .Columns
            .Cast<DataColumn>()
            .Select(dcol => new { Name = dcol.ColumnName, Type = dcol.DataType })
            .ToList();

        //Cells only contains references to cells with actual data
        var cellGroups = worksheet.Cells
            .GroupBy(cell => cell.Start.Row)
            .ToList();

        //Assume first row has the column names and get the names of the columns in the sheet that have a match in the table
        var colnames = cellGroups
            .First()
            .Select((hcell, idx) => new { Name = hcell.Value.ToString(), index = idx })
            .Where(o => tblcolnames.Select(tcol => tcol.Name).Contains(o.Name))
            .ToList();

        //Add the rows - skip the first cell row
        for (var i = 1; i < cellGroups.Count(); i++)
        {
            var cellrow = cellGroups[i].ToList();
            var tblrow = dataTable.NewRow();
            dataTable.Rows.Add(tblrow);

            colnames.ForEach(colname =>
            {
                //Excel stores either strings or doubles
                var cell = cellrow[colname.index];
                var val = cell.Value;
                var celltype = val.GetType();
                var coltype = tblcolnames.First(tcol => tcol.Name == colname.Name).Type;

                //If it is numeric it is a double since that is how excel stores all numbers
                if (celltype == typeof(double))
                {
                    //Unbox it
                    var unboxedVal = (double)val;

                    //FAR FROM A COMPLETE LIST!!!
                    if (coltype == typeof(int))
                        tblrow[colname.Name] = (int)unboxedVal;
                    else if (coltype == typeof(double))
                        tblrow[colname.Name] = unboxedVal;
                    else if (coltype == typeof(DateTime))
                        //A date that came through as a raw serial number
                        tblrow[colname.Name] = convertDateTime(unboxedVal);
                    else
                        throw new NotImplementedException($"Type '{coltype}' not implemented yet!");
                }
                else if (coltype == typeof(DateTime))
                {
                    //It's a date time
                    tblrow[colname.Name] = val;
                }
                else if (coltype == typeof(string))
                {
                    //It's a string
                    tblrow[colname.Name] = val;
                }
                else
                {
                    throw new DataException($"Cell '{cell.Address}' contains data of type {celltype} but should be of type {coltype}!");
                }
            });
        }
    }

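To use it on a worksheet with four columns, Col1 to Col4, holding an integer, a string, a double, and a date respectively, you would run this: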

    [TestMethod]
    public void Sheet_To_Table_Test()
    {
        //https://stackoverflow.com/questions/38915006/stripping-data-from-a-epplus-output-from-a-date-range

        //Open the test file
        var fi = new FileInfo(@"c:\temp\Sheet_To_Table.xlsx");

        using (var package = new ExcelPackage(fi))
        {
            var workbook = package.Workbook;
            var worksheet = workbook.Worksheets.First();

            var datatable = new DataTable();
            datatable.Columns.Add("Col1", typeof(int));
            datatable.Columns.Add("Col2", typeof(string));
            datatable.Columns.Add("Col3", typeof(double));
            datatable.Columns.Add("Col4", typeof(DateTime));

            worksheet.ConvertSheetToDataTable(ref datatable);

            foreach (DataRow row in datatable.Rows)
                Console.WriteLine(
                    $"row: {{Col1({row["Col1"].GetType()}): {row["Col1"]}" +
                    $", Col2({row["Col2"].GetType()}): {row["Col2"]}" +
                    $", Col3({row["Col3"].GetType()}): {row["Col3"]}" +
                    $", Col4({row["Col4"].GetType()}):{row["Col4"]}}}");

            //To Answer OP's questions
            datatable
                .Select("Col4 >= #01/03/2016#")
                .Select(row => row["Col1"])
                .ToList()
                .ForEach(num => Console.WriteLine($"{{{num}}}"));
        }
    }

Which gives this in the output:

    row: {Col1(System.Int32): 12345, Col2(System.String): sf, Col3(System.Double): 456.549, Col4(System.DateTime):1/1/2016 12:00:00 AM}
    row: {Col1(System.Int32): 456, Col2(System.String): asg, Col3(System.Double): 165.55, Col4(System.DateTime):1/2/2016 12:00:00 AM}
    row: {Col1(System.Int32): 8, Col2(System.String): we, Col3(System.Double): 148.5, Col4(System.DateTime):1/3/2016 12:00:00 AM}
    row: {Col1(System.Int32): 978, Col2(System.String): wer, Col3(System.Double): 668.456, Col4(System.DateTime):1/4/2016 12:00:00 AM}
    {8}
    {978}
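Applied to the columns from the question, the same idea might look like the sketch below. It fills a typed DataTable by hand with the values from the question's EPPlus output instead of reading a workbook, so it runs without EPPlus. `StartDateFilter`, `FirstOfLastMonth`, and `RefNumbersFrom` are illustrative names, and treating "Ref number" as an int is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class StartDateFilter
{
    // First day of the month before the given date, e.g. 12 Aug 2016 -> 1 Jul 2016.
    public static DateTime FirstOfLastMonth(DateTime today)
    {
        return new DateTime(today.Year, today.Month, 1).AddMonths(-1);
    }

    // Reference numbers of all rows whose "Start date" is on or after 'from'.
    public static List<int> RefNumbersFrom(DataTable table, DateTime from)
    {
        return table.AsEnumerable()
            .Where(row => !row.IsNull("Start date")
                          && row.Field<DateTime>("Start date") >= from)
            .Select(row => row.Field<int>("Ref number"))
            .ToList();
    }

    // Typed table holding the rows shown in the question's EPPlus output.
    public static DataTable BuildQuestionData()
    {
        var table = new DataTable();
        table.Columns.Add("Start date", typeof(DateTime));
        table.Columns.Add("Ref number", typeof(int));
        table.Rows.Add(new DateTime(2015, 7, 29), 2342326);
        table.Rows.Add(new DateTime(2016, 7, 29), 5697455);
        table.Rows.Add(new DateTime(2016, 8, 2), 3453787);
        table.Rows.Add(new DateTime(2016, 8, 2), 5345355);
        table.Rows.Add(new DateTime(2015, 8, 2), 8364456);
        table.Rows.Add(new DateTime(2016, 8, 3), 1479789);
        table.Rows.Add(new DateTime(2015, 7, 4), 9334578);
        return table;
    }

    public static void Main()
    {
        // Fixed "today" so the cut-off is 1 July 2016, as in the question
        var from = FirstOfLastMonth(new DateTime(2016, 8, 12));

        foreach (var refNumber in RefNumbersFrom(BuildQuestionData(), from))
            Console.WriteLine(refNumber);
        // prints 5697455, 3453787, 5345355, 1479789 (one per line)
    }
}
```

Because the column is a real DateTime, `row.Field<DateTime>` no longer throws the InvalidCastException from the question, and each matching date comes back together with its reference number.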