为大型数据集优化VLOOKUP

我写了一个代码来比较两个工作表WS1和Ws2。 代码从ws1读取每行的主键,并在ws2中find具有相同主键的对应行,然后在两个工作表之间匹配所有其他列属性并进行相应报告。

代码是:

Sub DataComparator(ws1 As Worksheet, ws2 As Worksheet) Dim ws1row As Long, ws2row As Long, ws1col As Integer, ws2col As Integer Dim maxrow As Long, maxcol As Integer, colval1 As String, colval2 As String Dim difference As Long, reportrow As Long, reportcol As Long, flag As Boolean Dim row As Long, col As Long, pki As Long, pk As String, counter As Long Dim PctDone As Single, cell1 As String, cell2 As String, bfailed As Boolean TestDataComparator.FrameProgress.Visible = True TestDataComparator.LabelProgress.Visible = True 'UserForm1.Visible = True 'Application.ScreenUpdating = False DoEvents With ws1.UsedRange ws1row = .Rows.Count ws1col = .Columns.Count End With With ws2.UsedRange ws2row = .Rows.Count ws2col = .Columns.Count End With maxrow = ws1row maxcol = ws1col pk = UCase(TestDataComparator.TextBox1.Value) For col = 1 To maxcol If pk = UCase(ws1.Cells(1, col).Formula) Then pki = col End If Next col If maxrow < ws2row Then maxrow = ws2row If maxcol < ws2col Then maxcol = ws2col difference = 0 reportrow = 0 For row = 2 To maxrow keyval = ws1.Cells(row, 1).Formula flag = False bfailed = False 'reportcol = 1 For col = 2 To maxcol 'If col = pki Then 'Exit For 'End If counter = counter + 1 cell1 = "" cell2 = "" cell1 = ws1.Cells(row, col).Formula On Error Resume Next 'Set Rng = Range("A2:" & Cells(ws2row, "A").Address) cell2 = Application.WorksheetFunction.VLookup(keyval, ws2.UsedRange, col, False) If Err.Number <> 0 Then bfailed = True On Error GoTo 0 If bfailed = True Then Exit For End If If cell1 <> cell2 Then flag = True 'difference = difference + 1 diffcolname = ws1.Cells(1, col) ws1.Cells(row, col).Interior.Color = RGB(255, 255, 0) ws1.Cells(1, col).Interior.Color = RGB(255, 0, 0) ws1.Cells(row, col).Font.Bold = True ws1.Cells(1, pki).Interior.Color = RGB(0, 255, 0) ws1.Cells(row, pki).Interior.Color = RGB(255, 255, 0) ws1.Cells(row, pki).Font.Color = RGB(255, 0, 0) ws1.Cells(row, pki).Font.Bold = True End If Next col If flag = True Then reportrow = reportrow + 1 End If PctDone = counter / (maxrow * maxcol) TestDataComparator.FrameProgress.Caption = "Progress..." & Format(PctDone, "0%") TestDataComparator.LabelProgress.Width = PctDone * (TestDataComparator.FrameProgress.Width - 10) DoEvents Next row TestDataComparator.Totalcount.Value = row - 2 TestDataComparator.mismatchCount.Value = reportrow TestDataComparator.mismatchCount.Font = Bold difference = 0 For col = 1 To maxcol If ws1.Cells(1, col).Interior.Color = RGB(255, 0, 0) Then difference = difference + 1 TestDataComparator.AttributeNameList.AddItem (ws1.Cells(1, col)) End If Next col TestDataComparator.FrameProgress.Visible = False TestDataComparator.LabelProgress.Visible = False 'TestDataComparator.PleaseWait.Visible = False MsgBox difference & " columns contain different data! ", vbInformation, "Comparing Two Worksheets" Application.ScreenUpdating = True End Sub 

我希望vlookup函数仅在具有主键(index pki)而不是ws2.UsedRange的WS2的整个列中search匹配项。 请提供build议。 有没有比vlookup更好的选项? ws2.UsedRange的使用使得难以在大数据集中进行search,这就是为什么我想减lesssearch空间的原因。 我的数据集有超过40K行和155列的excel。 如果你觉得不合适,还build议我计算进度条的进度。

来自OP评论的样本数据:

 Name Height Weight Jane 5'6'' 78 Mike 5'4'' 89 Monica 5'2'' 56 

我认为使用字典(也叫其他语言的哈希表)可以使其更快。 您将需要引用Microsoft脚本运行时库。

在开始逐行浏览ws1之前,需要在一个循环中将ws2键值与它们的行号读入Dictionary中。 然后在循环中查找字典中的值来获取ws2上的行号。 像这样的东西:

 Dim ws2keys As Dictionary Set ws2keys = New Dictionary ' assuming you have a header row For row = 2 To ws2.UsedRange.Rows.Count keyValue = ws1.Cells(row, 1).Value If keyValue <> "" Then ws2keys.Add(keyValue, row) Next ' your dictionary is ready 

然后在你的循环中,而不是在ws1上逐行进行时使用VLookup:

 ws2RowIndex = ws2keys.Item(ws1KeyValueYouAreLookingFor) 

(代码可能不完美,我没有任何微软在这台机器上检查语法,对不起。)

我已经将每个列的VLOOKUP减less到一个MATCH来validation它是否存在以及一个MATCH来设置发生匹配的WS2行。 其他一切都是通过直接寻址完成​​的。

 Sub DataComparator(ws1 As Worksheet, ws2 As Worksheet) Dim ws1row As Long, ws2row As Long, ws1col As Long, ws2col As Long Dim maxrow As Long, maxcol As Long, colval1 As String, colval2 As String Dim difference As Long, reportrow As Long, reportcol As Long, flag As Boolean Dim rw As Long, cl As Long, pki As Long, pk As String, counter As Long Dim cell1 As String, cell2 As String, bfailed As Boolean Dim iPCT As Long, ws2rw As Long, rWS1cr As Range, rWS2cr As Range, keyval As Variant, app As Application Set app = Application 'UserForm1.Visible = True app.ScreenUpdating = False 'DoEvents With ws1.Cells(1, 1).CurrentRegion Set rWS1cr = .Cells ws1row = .Rows.Count ws1col = .Columns.Count End With With ws2.Cells(1, 1).CurrentRegion Set rWS2cr = .Cells ws2row = .Rows.Count ws2col = .Columns.Count End With maxrow = ws1row maxcol = ws1col 'pk = UCase(TestDataComparator.TextBox1.Value) For cl = 1 To maxcol If pk = UCase(rWS1cr.Cells(1, cl).Value) Then pki = cl Exit For End If Next cl If maxrow < ws2row Then maxrow = ws2row If maxcol < ws2col Then maxcol = ws2col difference = 0 reportrow = 0 With rWS1cr For rw = 2 To maxrow keyval = ws1.Cells(rw, 1).Value If Not IsError(app.Match(keyval, rWS2cr.Columns(1), 0)) Then ws2rw = app.Match(keyval, rWS2cr.Columns(1), 0) flag = False For cl = 2 To maxcol counter = counter + 1 cell1 = vbNullString cell2 = vbNullString cell1 = .Cells(rw, cl).Value cell2 = rWS2cr.Cells(ws2rw, cl).Value If cell1 <> cell2 Then flag = True 'diffcolname = .Cells(1, cl) .Cells(rw, cl).Interior.Color = RGB(255, 255, 0) .Cells(1, cl).Interior.Color = RGB(255, 0, 0) .Cells(rw, cl).Font.Bold = True .Cells(1, pki).Interior.Color = RGB(0, 255, 0) .Cells(rw, pki).Interior.Color = RGB(255, 255, 0) .Cells(rw, pki).Font.Color = RGB(255, 0, 0) .Cells(rw, pki).Font.Bold = True End If Next cl reportrow = reportrow - CLng(flag) If iPCT <> CLng((rw / maxrow) * 100) Then iPCT = CLng((rw / maxrow) * 100) app.StatusBar = "Progress - " & Format(iPCT, "0\%") End If End If Next rw For cl = 1 To maxcol If .Cells(1, cl).Interior.Color = RGB(255, 0, 0) Then difference = difference + 1 'TestDataComparator.AttributeNameList.AddItem (ws1.Cells(1, col)) End If Next cl MsgBox difference & " columns contain different data! ", vbInformation, "Comparing Two Worksheets" End With difference = 0 app.ScreenUpdating = True app.StatusBar = vbNullString Set app = Nothing End Sub 

我更喜欢.CurrentRegion.UsedRange因为我觉得它更可靠。 这段代码没有经过testing,但它编译,我不得不注释掉一些外部引用来实现这一点。