search集合

我正在处理一个相当大的数据集(> 100,000行),试图比较两个列表,找出新列表中的哪些项目不在主列表中。 换句话说,我想find新的独特的项目。

我有一些使用vlookup和数组的VBA代码工作,但当数组变得太大(〜70,000)时炸弹。 所以我转向collections。 然而,我很难search使用vlookup或匹配的集合。

Sub find_uniqueIDs() Dim a As Long Dim n As Long Dim m As Variant Dim oldnum As Long Dim oldIDs As Variant Dim oldcoll As New Collection Dim newnum As Long Dim newIDs As Variant Dim newcoll As New Collection oldnum = 75000 oldIDs = Range("A1", Range("A" & oldnum)) newnum = 45000 + 3 newIDs = Range("G3", Range("G" & newnum)) 'Using arrays to search, but bombs out when oldnum or newnum are ~70000 For n = 1 To newnum - 3 m = Application.VLookup(newIDs(n, 1), oldIDs, 1, False) If IsError(m) Then Range("E100000").End(xlUp).Offset(1, 0) = newIDs(n, 1) Next n 'Using collections to search For n = 1 To oldnum On Error Resume Next oldcoll.Add oldIDs(n, 1) On Error GoTo 0 Next n For m = 1 To newnum On Error Resume Next newcoll.Add newIDs(m, 1) On Error GoTo 0 Next m 'This bit of code doesn't work For a = 1 To newcoll.Count If Application.VLookup(newcoll(a), oldcoll, 1, False) = "#N/A" Then _ Range("E100000").End(xlUp).Offset(1, 0) = newcoll(a) Next a End Sub 

任何想法如何确定一个特定的项目是否在使用集合的主列表中?

这里是一个简短的子演示一些脚本字典的方法。

 Sub list_New_Unique() Dim dMASTER As Object, dNEW As Object, k As Variant Dim v As Long, vVALs() As Variant, vNEWs() As Variant Debug.Print "Start: " & Timer Set dMASTER = CreateObject("Scripting.Dictionary") Set dNEW = CreateObject("Scripting.Dictionary") dMASTER.comparemode = vbTextCompare dNEW.comparemode = vbTextCompare With Worksheets("Sheet7") vVALs = .Range("A2:A100000").Value2 vNEWs = .Range("C2:C100000").Value2 End With 'populate the dMASTER values For v = LBound(vVALs, 1) To UBound(vVALs, 1) dMASTER.Add Key:=vVALs(v, 1), Item:=vVALs(v, 1) Next v 'only populate dNEW with items not found in dMASTER For v = LBound(vNEWs, 1) To UBound(vNEWs, 1) If Not dMASTER.exists(vNEWs(v, 1)) Then If Not dNEW.exists(vNEWs(v, 1)) Then _ dNEW.Add Key:=vNEWs(v, 1), Item:=vNEWs(v, 1) End If Next v Debug.Print dNEW.Count For Each k In dNEW.keys 'Debug.Print k Next k Debug.Print "End: " & Timer dNEW.RemoveAll: Set dNEW = Nothing dMASTER.RemoveAll: Set dMASTER = Nothing End Sub 

在C2:C89747的A2:A100000和89747随机条目中有99,999个唯一条目,发现在9.87秒内在A2:A100000中找不到70,087个独特的新条目。

我会这样做:

 Sub test() Dim newRow As Long, oldRow As Long Dim x As Long, Dim y As Long Dim checker As Boolean With ActiveSheet newRow = .Cells(.Rows.Count,7).End(xlUp).Row oldRow = .Cells(.Rows.Count,1).End(xlUp).Row checker = True for y = 1 To oldRow for x = 1 To newRow If .Cells(y,1).Value = .Cells(x,7).Value Then checker = False Exit For End If Next If checker Then Range("E10000").End(xlUp).Offset(1,0).Value = .Cells(y,1).Value End If checker = True Next End With End Sub 

VLookup是一个工作表函数 ,而不是一个常规的VBA函数,因此它用于Range s中search,而不是Collection s。

语法: VLOOKUP (lookup_value, table_array, col_index_num, [range_lookup])

[…]

table_array (必需): VLOOKUP将在其中searchlookup_value和返回值的单元格范围

为了在其他VBA数据结构(如数组,集合等)中进行search,您必须找出其他方法并可能手动实现它。

虽然@Jeped对Scripting.Dictionary对象的build议可能是最好的,但也可以尝试使用应用于数组的Filter()函数 。