How to get the closest date after every half hour

I have a very large dataset that looks like this:

    Column A
    Date
    2016-02-29 15:59:59.674
    2016-02-29 15:59:59.695
    2016-02-29 15:59:59.716
    2016-02-29 15:59:59.752
    2016-02-29 15:59:59.804
    2016-02-29 15:59:59.869
    2016-02-29 15:59:59.888
    2016-02-29 15:59:59.941
    2016-02-29 16:00:00.081  <-- get closest date since .081 < .941
    2016-02-29 16:00:00.168
    2016-02-29 16:00:00.189
    2016-02-29 16:00:00.198
    2016-02-29 16:00:00.247
    2016-02-29 16:00:00.311
    2016-02-29 16:00:00.345
    2016-02-29 16:00:00.357

and for the other half an hour

    2016-02-29 16:29:58.628
    2016-02-29 16:29:58.639
    2016-02-29 16:29:58.689
    2016-02-29 16:29:58.706
    2016-02-29 16:29:58.761
    2016-02-29 16:29:58.865
    2016-02-29 16:29:59.142
    2016-02-29 16:29:59.542
    2016-02-29 16:29:59.578
    2016-02-29 16:30:00.171  <-- Get this date since .171 < .578
    2016-02-29 16:30:00.209
    2016-02-29 16:30:00.217
    2016-02-29 16:30:00.245
    2016-02-29 16:30:00.254
    2016-02-29 16:30:00.347
    2016-02-29 16:30:00.422
    2016-02-29 16:30:00.457
    2016-02-29 16:30:00.491
    2016-02-29 16:30:00.555
    2016-02-29 16:30:00.557
    2016-02-29 16:30:00.645

The dataset has about 5,468,389 rows in total, which is too many for Excel to import into a single column, so I am trying to process the data in parts.

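Is there another way that would let me process all of the data? I tried reading and writing the text file directly, but whenever I try to read a value as a date I get a Type Mismatch error because of the format. I did not use Python for the same reason, and also because I am not proficient in Python, so I would like to do this in Excel VBA.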

I am also not sure about the logic, so I need some help with that.

    Option Explicit

    Sub Get_Closest_Dates()
        Application.ScreenUpdating = False

        Dim WI As Worksheet, WO As Worksheet
        Dim i As Long, ct As Long
        Dim num1 As Integer, num2 As Integer, num3 As Integer
        Dim df1, df2

        Set WI = Sheet1 'INPUT SHEET
        Set WO = Sheet2 'OUTPUT SHEET

        WI.Range("A:A").NumberFormat = "YYYY-MM-DD HH:MM:SS"
        WO.Range("A:A").NumberFormat = "YYYY-MM-DD HH:MM:SS"

        WI.Range("B1") = "HOUR"
        WI.Range("C1") = "MINUTE"

        With WI
            .Range("B2").Formula = "=HOUR(A2)"
            .Range("B2:B" & Rows.Count).FillDown
            .Range("C2").Formula = "=MINUTE(A2)"
            .Range("C2:C" & Rows.Count).FillDown

            ct = WO.Range("A" & Rows.Count).End(xlUp).Row + 1

            For i = 2 To 10000
                num1 = .Range("C" & i).Value 'get Minutes
                num2 = .Range("C" & i + 1).Value

                If (num1 = 29 And num2 = 30) Then
                    df1 = 0.5 - TimeValue(.Range("A" & i))
                    df2 = TimeValue(.Range("A" & i + 1)) - 0.5
                    If df1 < df2 Then
                        WO.Range("A" & ct) = .Range("A" & i)
                        ct = ct + 1
                    Else
                        WO.Range("A" & ct) = .Range("A" & i + 1)
                        ct = ct + 1
                    End If
                End If

                If (num1 = 59 And num2 = 0) Then
                    df1 = 1 - TimeValue(.Range("A" & i))
                    df2 = TimeValue(.Range("A" & i + 1)) - 1
                    If df1 < df2 Then
                        WO.Range("A" & ct) = .Range("A" & i)
                        ct = ct + 1
                    Else
                        WO.Range("A" & ct) = .Range("A" & i + 1)
                        ct = ct + 1
                    End If
                End If
            Next i
        End With

        Application.ScreenUpdating = True
        MsgBox "Process Completed"
    End Sub

Also, I am not sure how to get the millisecond part out of a date, so that I can avoid calculating the difference between two dates.

For example, given 15:59:59.674, how can I get the 674 from the time?

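It looks like your first problem is getting the data into Excel. Bearing in mind that Excel may not be the best program for this much data (a database program such as Access may be better), you will need to either split the data across multiple columns or worksheets, or sample it.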

You have chosen to sample, so I would do the sampling and testing as the data is read in.

You will also have to deal with the Excel/VBA limitations on date/time stamps that include milliseconds.

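For testing the data, however, there is no need to be concerned with the milliseconds. So long as your data is in ascending order, the first line whose date/time stamp is equal to or greater than a thirty-minute increment is the earliest line at or after that increment.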

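The code below should read in only those lines of your large file that meet the criterion. Please read the comments for additional information.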

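The lines are collected into a Collection; then the results array is declared and populated, and the results are written to the worksheet.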

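If each line consists of multiple fields rather than just the single one shown, then when writing the results you would dimension the results array to hold all of the columns and populate it at that point.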
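As a rough illustration of the multi-field case, here is a minimal sketch. The function name `LinesToArray`, the delimiter argument `sDelim`, and the column count `lCols` are all hypothetical; the sample data shows only one field per line, so the real delimiter and number of columns are assumptions you would need to adjust.

```vba
Option Explicit

'Hypothetical helper (not part of the original code): splits each collected
'line on sDelim and returns a 2-D array sized (1 To lines, 1 To lCols),
'which can be assigned to Range.Value in a single write.
Function LinesToArray(colLines As Collection, ByVal sDelim As String, _
                      ByVal lCols As Long) As Variant
    Dim V() As Variant
    Dim vLine As Variant
    Dim vFields As Variant
    Dim I As Long, J As Long

    If colLines.Count = 0 Then Exit Function    'returns Empty

    ReDim V(1 To colLines.Count, 1 To lCols)

    For Each vLine In colLines
        I = I + 1
        'Split returns a zero-based array of the fields on this line
        vFields = Split(CStr(vLine), sDelim)
        For J = 0 To UBound(vFields)
            'Extra fields beyond lCols are ignored; missing ones stay Empty
            If J < lCols Then V(I, J + 1) = vFields(J)
        Next J
    Next vLine

    LinesToArray = V
End Function
```

The returned array could then be written in one step with something like `rResults.Resize(UBound(V, 1), UBound(V, 2)).Value = V`, which keeps the collection / array / write-to-worksheet sequence intact.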

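Using the collection / array / write-to-worksheet sequence will be much faster than writing to the worksheet one row at a time while processing.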

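There are ways to speed up the code, and also ways to handle possible "out of memory" errors, but whether they are needed depends on your real data and on how this simple code copes with it.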

As things stand, whether we need to convert the date/time stamps into "real" date/times depends on what you want to do with the data afterwards.

==========================================

    Option Explicit
    'Set Reference to Microsoft Scripting Runtime

    Sub GetBigData()
        Dim FSO As FileSystemObject
        Dim TS As TextStream
        Dim vFileName As Variant
        Dim sLine As String
        Dim dtLineTime As Date
        Dim dtNextTime As Date
        Dim colLines As Collection

        vFileName = Application.GetOpenFilename("Text Files(*.txt), *.txt")
        If vFileName = False Then Exit Sub

        Set FSO = New FileSystemObject
        Set TS = FSO.OpenTextFile(vFileName, ForReading, False, TristateFalse)

        Set colLines = New Collection
        With TS
            'Assumes date/time stamps are contiguous

            'skip any header lines
            Do
                sLine = .ReadLine
            Loop Until InStr(sLine, ".") > 0

            'Compute first "NextTime"
            ' note that it might be the first entry
            ' comment line 3 below if want first entry
            '   but would need to add logic if using other time increments
            dtLineTime = CDate(Left(sLine, InStr(sLine, ".") - 1))
            dtNextTime = Int(dtLineTime) + TimeSerial(Hour(dtLineTime), Int(Minute(dtLineTime) / 30) * 30, 0)
            If Not (Minute(dtLineTime) = 30 Or Minute(dtLineTime) = 0) Then dtNextTime = dtNextTime + TimeSerial(0, 30, 0)
            'Minute() returns 0-59, so the on-the-hour case is tested as 0 (was 60)

            Do
                'Due to IEEE rounding problems, need to test equality as a very small value
                'Could use a value less than 1 second = 1/86400 or smaller
                If Abs(dtLineTime - dtNextTime) < 0.00000001 Or _
                   dtLineTime > dtNextTime Then
                    colLines.Add sLine
                    dtNextTime = dtNextTime + TimeSerial(0, 30, 0)
                End If
                If Not .AtEndOfStream Then
                    sLine = .ReadLine
                    dtLineTime = CDate(Left(sLine, InStr(sLine, ".") - 1))
                End If
            Loop Until .AtEndOfStream
            .Close
        End With

        'Write the collection to the worksheet
        Dim V As Variant
        Dim wsResults As Worksheet, rResults As Range
        Dim I As Long

        Set wsResults = Worksheets("sheet1")
        Set rResults = wsResults.Cells(1, 1)

        ReDim V(1 To colLines.Count, 1 To 1)
        Set rResults = rResults.Resize(UBound(V, 1), UBound(V, 2))

        For I = 1 To UBound(V, 1)
            V(I, 1) = CStr(colLines(I))
        Next I

        With rResults
            .EntireColumn.Clear
            .NumberFormat = "@"
            .Value = V
            .EntireColumn.AutoFit
        End With
    End Sub

==========================================
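The comments in the code note that extra logic would be needed for time increments other than thirty minutes. One possible way to generalize the boundary calculation is sketched below. The function `NextBoundary` and its `lMinutes` parameter are hypothetical names, not part of the code above, and the sketch assumes the input has already been truncated to whole seconds (as `CDate` on the text before the "." does) and that `lMinutes` is positive.

```vba
Option Explicit

'Hypothetical helper (an assumption, not the original logic): returns the
'first boundary at or after dt for an interval of lMinutes minutes.
'Works in whole seconds to sidestep floating-point comparisons on Dates.
Function NextBoundary(ByVal dt As Date, ByVal lMinutes As Long) As Date
    Dim lStep As Long    'interval length in seconds
    Dim lSecs As Long    'whole seconds since midnight
    Dim lNext As Long    'boundary, in seconds since midnight

    lStep = lMinutes * 60
    lSecs = Hour(dt) * 3600& + Minute(dt) * 60& + Second(dt)

    'Round up to the next multiple of lStep (-Int(-x) acts as a ceiling)
    lNext = -Int(-lSecs / lStep) * lStep

    'lNext may equal 86400, which rolls over to midnight of the next day
    NextBoundary = DateSerial(Year(dt), Month(dt), Day(dt)) + lNext / 86400#
End Function
```

For example, `Format$(NextBoundary(#2/29/2016 3:59:59 PM#, 30), "yyyy-mm-dd hh:nn:ss")` gives `2016-02-29 16:00:00`, and a time that already sits on a boundary is returned unchanged.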

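Edit: added a time stamp conversion function. It can be applied at the point where the data is copied from the Collection object into the Variant array. For example: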

 V(I, 1) = ConvertTimeStamp(colLines(I)) 

Since the value returned is of the Double data type, you will also need to format that column on the worksheet appropriately, rather than as Text:

 .NumberFormat = "yyyy-mm-dd hh:mm:ss.000" 

Because the VBA Date data type does not support milliseconds, we have to return the value as a Double.

==============================

    Private Function ConvertTimeStamp(sTmStmp As String) As Double
        Dim dtPart As Date
        Dim dMS As Double 'milliseconds
        Dim V As Variant

        'Convert the date and time
        V = Split(sTmStmp, ".")
        dtPart = CDate(V(0))
        dMS = V(1)

        ConvertTimeStamp = dtPart + dMS / 86400 / 1000
    End Function

==============================
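If all you need is the millisecond part itself (the 674 in 15:59:59.674), it may be simpler to take it from the text of the time stamp rather than from a date value. The sketch below shows one way; the name `GetMilliseconds` is hypothetical, and it assumes the fractional seconds follow the last "." in the string, as in the sample data.

```vba
Option Explicit

'Hypothetical helper: returns the millisecond part of a stamp such as
'"2016-02-29 15:59:59.674" as a Long, or 0 if there is no "." in the text.
Function GetMilliseconds(ByVal sTmStmp As String) As Long
    Dim lPos As Long
    Dim sFrac As String

    lPos = InStrRev(sTmStmp, ".")
    If lPos = 0 Then Exit Function    'no fractional part: returns 0

    'Pad with zeros so a short fraction such as ".5" is read as 500 ms,
    'then keep exactly three digits
    sFrac = Left$(Mid$(sTmStmp, lPos + 1) & "000", 3)
    GetMilliseconds = CLng(Val(sFrac))
End Function
```

Because this works on the string, it never passes through the Date type and so is not affected by its lack of millisecond support.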

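If you reverse the sort order, you can use the MATCH function to find the index of the entry in the list that is just greater than (immediately after) a particular time. Something like: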

= MATCH(HalfHourValue,RangeContainingTimes,-1)

You do have to reverse the order, and it gives you the index rather than the actual value.

To get the milliseconds of the entry you just found, something like the following should work:

= RIGHT(TEXT(INDEX(RangeContainingTimes,IxFromAbove,1),"HH:MM:ss.000"),3)