使用VBAfilterfunction时的性能注意事项

我无法弄清楚过滤function如何工作如此之快。 我已经在各种数据上使用filter,而不pipe数据types如何,filter消除了我使用的任何替代方法。 我经常使用二进制searchalgorithm和Stephen Bullen编写的QuickArraySortalgorithm(在Professional Excel Development中find)。 二进制search闪电般快(与Filter函数一样快,因为数组已经sorting),Quick Sortalgorithm是已知的最快的sortingalgorithm之一。

我已经写了一些testing代码,比较了在一个非常大的数组(大小= 2,000,000)中find一个随机元素的速度。 我故意以无序的方式填充数组(应该注意的是,我已经尝试了各种无序分配方法,结果是相似的,无论分配方法)。

Sub SearchTest() Dim i As Long, strMyArray() As String, lngSize As Long, strTest As String Dim TimeBinarySearch As Long, TimeFilterSearch As Long Dim lngResultBinary As Long, lngResultFilter As Long Dim StartHour As Long, StartMinute As Long, StartSecond As Long Dim StartMiliSecond As Long, StartTime As Long Dim EndHour As Long, EndMinute As Long, EndSecond As Long Dim EndMiliSecond As Long, EndTime As Long lngSize = 2000000 strTest = CStr(1735674 * 987) ReDim strMyArray(lngSize) For i = 1 To UBound(strMyArray) If i Mod 2 = 0 Then strMyArray(i) = CStr((i - 1) * 987) Else strMyArray(i) = CStr((i + 1) * 987) End If Next i ''Filter Test '******************************************************************* StartHour = Hour(Now()) * 60 * 60 * 1000 StartMinute = Minute(Now()) * 60 * 1000 StartSecond = Second(Now()) * 1000 StartMiliSecond = Format(Now(), "ms") StartTime = StartHour + StartMinute + StartSecond + StartMiliSecond lngResultFilter = CLng(Filter(strMyArray, strTest)(0)) EndHour = Hour(Now()) * 60 * 60 * 1000 EndMinute = Minute(Now()) * 60 * 1000 EndSecond = Second(Now()) * 1000 EndMiliSecond = Format(Now(), "ms") EndTime = EndHour + EndMinute + EndSecond + EndMiliSecond TimeFilterSearch = EndTime - StartTime '******************************************************************* ''Binary Test '******************************************************************* StartHour = Hour(Now()) * 60 * 60 * 1000 StartMinute = Minute(Now()) * 60 * 1000 StartSecond = Second(Now()) * 1000 StartMiliSecond = Format(Now(), "ms") StartTime = StartHour + StartMinute + StartSecond + StartMiliSecond QuickSortString1D strMyArray lngResultBinary = strMyArray(CLng(BinarySearchString(strTest, strMyArray))) EndHour = Hour(Now()) * 60 * 60 * 1000 EndMinute = Minute(Now()) * 60 * 1000 EndSecond = Second(Now()) * 1000 EndMiliSecond = Format(Now(), "ms") EndTime = EndHour + EndMinute + EndSecond + EndMiliSecond TimeBinarySearch = EndTime - StartTime '******************************************************************* MsgBox lngResultFilter & vbCr & vbCr & lngResultBinary MsgBox TimeFilterSearch & vbCr & vbCr & TimeBinarySearch End Sub 

两种方法都返回相同的结果,但Filter方法的返回时间为0 ms,QuickSort / BinarySearch方法的返回时间接近20秒。 这是一个巨大的差异! 如前所述,如果对数组进行sorting,则二进制search方法也会返回0毫秒(众所周知,二进制search要求对数组进行sorting以开始)

那么,Filter函数怎样才能查看200万个未sorting的条目并立即find正确的结果呢? 它不能简单地遍历每个条目,并将其与过滤值(这是迄今为止最慢的方法)进行比较,但是基于这些初步testing,它不能利用二进制search,因为它必须对数组第一。 即使有一个已经编译好的sortingalgorithm,我也很难相信它能够瞬间sorting超过一百万的数组。

顺便说一下,下面是QuickSortalgorithm和二进制searchalgorithm。

  Sub QuickSortString1D(ByRef saArray() As String, _ Optional ByVal bSortAscending As Boolean = True, _ Optional ByVal lLow1 As Variant, _ Optional ByVal lHigh1 As Variant) 'Dimension variables Dim lLow2 As Long Dim lHigh2 As Long Dim sKey As String Dim sSwap As String On Error GoTo ErrorExit 'If not provided, sort the entire array If IsMissing(lLow1) Then lLow1 = LBound(saArray) If IsMissing(lHigh1) Then lHigh1 = UBound(saArray) 'Set new extremes to old extremes lLow2 = lLow1 lHigh2 = lHigh1 'Get value of array item in middle of new extremes sKey = saArray((lLow1 + lHigh1) \ 2) 'Loop for all the items in the array between the extremes Do While lLow2 < lHigh2 If bSortAscending Then 'Find the first item that is greater than the mid-point item Do While saArray(lLow2) < sKey And lLow2 < lHigh1 lLow2 = lLow2 + 1 Loop 'Find the last item that is less than the mid-point item Do While saArray(lHigh2) > sKey And lHigh2 > lLow1 lHigh2 = lHigh2 - 1 Loop Else 'Find the first item that is less than the mid-point item Do While saArray(lLow2) > sKey And lLow2 < lHigh1 lLow2 = lLow2 + 1 Loop 'Find the last item that is greater than the mid-point item Do While saArray(lHigh2) < sKey And lHigh2 > lLow1 lHigh2 = lHigh2 - 1 Loop End If 'If the two items are in the wrong order, swap the rows If lLow2 < lHigh2 Then sSwap = saArray(lLow2) saArray(lLow2) = saArray(lHigh2) saArray(lHigh2) = sSwap End If 'If the pointers are not together, advance to the next item If lLow2 <= lHigh2 Then lLow2 = lLow2 + 1 lHigh2 = lHigh2 - 1 End If Loop 'Recurse to sort the lower half of the extremes If lHigh2 > lLow1 Then QuickSortString1D saArray, bSortAscending, lLow1, lHigh2 End If 'Recurse to sort the upper half of the extremes If lLow2 < lHigh1 Then QuickSortString1D saArray, bSortAscending, lLow2, lHigh1 End If ErrorExit: End Sub '*********************************************************** ' Comments: Uses a binary search algorithm to quickly locate ' a string within a sorted array of strings ' ' Arguments: sLookFor The string to search for in the array ' saArray An array of strings, sorted ascending ' lMethod Either vbBinaryCompare or vbTextCompare ' Defaults to vbTextCompare ' lNotFound The value to return if the text isn't ' found. Defaults to -1 ' ' Returns: Long The located position in the array, ' or lNotFound if not found ' ' Date Developer Action ' ———————————————————————————————- ' 02 Jun 04 Stephen Bullen Created ' Function BinarySearchString(ByRef sLookFor As String, _ ByRef saArray() As String, _ Optional ByVal lMethod As VbCompareMethod = vbTextCompare, _ Optional ByVal lNotFound As Long = -1) As Long Dim lLow As Long Dim lMid As Long Dim lHigh As Long Dim lComp As Long On Error GoTo ErrorExit 'Assume we didn't find it BinarySearchString = lNotFound 'Get the starting positions lLow = LBound(saArray) lHigh = UBound(saArray) Do 'Find the midpoint of the array lMid = (lLow + lHigh) \ 2 'Compare the mid-point element to the string being searched for lComp = StrComp(saArray(lMid), sLookFor, lMethod) If lComp = 0 Then 'We found it, so return the location and quit BinarySearchString = lMid Exit Do ElseIf lComp = 1 Then 'The midpoint item is bigger than us - throw away the top half lHigh = lMid - 1 Else 'The midpoint item is smaller than us - throw away the bottom half lLow = lMid + 1 End If 'Continue until our pointers cross Loop Until lLow > lHigh ErrorExit: End Function 

编辑:看来我应该先做一些“蛮力”的testing。 通过简单地以线性方式遍历数组,John Colemanbuild议Filter函数执行,上述相同场景的返回时间为0 ms。 见下文:

 Sub Test3() Dim i As Long, strMyArray() As String, lngSize As Long, strTest As String Dim lngResultBrute As Long, TimeBruteSearch As Long lngSize = 2000000 strTest = CStr(936740 * 97) ReDim strMyArray(lngSize) For i = 1 To UBound(strMyArray) If i Mod 2 = 0 Then strMyArray(i) = CStr((i - 1) * 97) Else strMyArray(i) = CStr((i + 1) * 97) End If Next i StartTime = Timer ' Brute force search For i = 1 To UBound(strMyArray) If strMyArray(i) = strTest Then lngResultBrute = CLng(strTest) Exit For End If Next i EndTime = Timer TimeBruteSearch = EndTime - StartTime MsgBox TimeBruteSearch End Sub 

Filter器使用线性search – 它只是快速执行,因为它是在高度优化的C / C ++代码中实现的。 要看到这,请运行以下代码:

 Function RandString(n As Long) As String 'returns a random string in BZ Dim i As Long Dim s As String For i = 1 To n s = s & Chr(66 + Int(25 * Rnd())) Next i RandString = s End Function Sub test() Dim times(1 To 20) As Double Dim i As Long, n As Long Dim A() As String Dim start As Double Dim s As String Randomize s = RandString(99) ReDim A(1 To 2000000) For i = 1 To 2000000 A(i) = s + RandString(1) Next i s = s & "A" For i = 20 To 1 Step -1 n = i * 100000 ReDim Preserve A(1 To n) start = Timer Debug.Print UBound(Filter(A, s)) 'should be -1 times(i) = Timer - start Next i For i = 1 To 20 Cells(i, 1) = i Cells(i, 2) = times(i) Next i End Sub 

此代码创build一个由2,000,000个随机长度为100的string组成的数组,每个string与最后一个位置的目标string不同。 然后,将大小为100,000的倍数的子arraysinputFilter ,计算所花费的时间。 输出如下所示:

在这里输入图像说明

清晰的线性趋势并不完全certificate,但是VBA的Filter正在执行直接线性search的有力证据。

我相信你在这里比较苹果和橘子。 它看起来像当你testingFilterfunction,你把一个无序的数组作为input,然后使用Filter来查找匹配的testing值。 直觉说O(N)= 200万个操作—你testing每个数组元素一次。 那你就完蛋了

当你使用自定义VBA函数进行筛选时,首先sorting ,这是相当昂贵的O(N * Log2(N))= 29百万。 一旦数组被sorting后,您将获得search有序数组的好处,即O(Log2(N))= 14.即使您加快search速度,sorting的处罚也将会消失。

希望有所帮助。