VBA删除重复代码更快

目前使用这个代码,但是,我有一个庞大的数据集,这真的很慢。 我需要删除任何重复的信息,并保持最高的一行信息。

dim dup as variant, r as long, lncheckduplicatescolumn as long With wb_DST.Sheets(sWs_DST) lncheckduplicatescolumn = .Cells(.Rows.Count, "A").End(xlUp).row for r = lncheckduplicatescolumn to 2 step -1 dup = application.match(.cells(r, "A").value, .columns(1), 0) if dup < r then .rows(dup).delete next r end with 

数据:

  Column A Column B A 1 B 2 C 3 A 3 

结果应该是:

  B 2 C 3 A 3 

A列中的数据顺序只要是唯一的,就保持不变,并保留较高行号的信息。 虽然我共享的代码有效,但对于大型数据集来说太慢了。

另一个快速的方法,就是使用Dictionary对象。 您可以检查Dictionary A中是否存在任何值。 如果他们这样做(意味着它是重复的),那么不要每次都删除它们,这会增加代码的运行时间。 相反,您可以使用DelRng对象,该对象是使用Union来合并多个重复行的Range

稍后,可以使用DelRng.Delete一次删除整个ducplicates范围。

 Option Explicit Sub RemoveDuplicatesUsingDict() Dim wb_DST As Workbook Dim sWs_DST As String ' Dictionary variables Dim Dict As Object Dim DictIndex As Long, ExistIndex As Long Dim DelRng As Range Dim LastRow As Long, i As Long ' --- parameters for my internal testing --- Set wb_DST = ThisWorkbook sWs_DST = "Sheet1" Application.ScreenUpdating = False Set Dict = CreateObject("Scripting.Dictionary") With wb_DST.Sheets(sWs_DST) LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row ' get last row with data in column "A" For i = LastRow To 2 Step -1 If Not Dict.exists(.Range("A" & i).Value) Then ' value doesn't exists yet in Dictionary >> add this Key Dict.Add .Range("A" & i).Value, .Range("A" & i).Value Else ' value already exists in Dictionary >> add it to DelRng (later will delete the entire range) If Not DelRng Is Nothing Then Set DelRng = Application.Union(DelRng, .Rows(i)) ' add current row to existing DelRng Else Set DelRng = .Rows(i) End If End If Next i End With ' delete the entire range at 1-shot If Not DelRng Is Nothing Then DelRng.Delete Application.ScreenUpdating = True End Sub 

快速使用数据字段数组

循环范围并不是那么快 – 如果使用search数据创build数据字段数组(在列“A”中请参见1),并在其中进行循环, 则可以大大加快速度 。 如果数据集增长,与上面所示的字典方法相比,这个速度会更快,尽pipe它是一个很好的可靠的方法。

search方法

  • 任何数组值都是针对已经find唯一值的串联searchstring进行检查的,如果还没有包含,则添加 – 参见2)
  • 完成的string被转换为数组并写回到给定的目标列(例如“H”) – 见3)和4)

我甚至还添加了相应行号的第二列,所以你应该能够使用它们来做进一步的行动。 您也可以将结果写入其他工作表。

代码 – 方法演示

 Sub RemoveDuplicates() Dim t As Double: t = Timer ' stop watch Dim ws As Worksheet ' source sheet (object) Dim i As Long ' row index Dim a, arr, arr2 ' variant Dim s As String, si As String Const SEP = "|" ' delimiter s = SEP: si = SEP ' 0) fully qualified range reference to source sheet Set ws = ThisWorkbook.Worksheets("Sheet1") ' 1) write column A data to one based 2-dim data field array a = ws.Range("A1:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row) ' 2) loop through data and check for duplicates in concatenated string s For i = 2 To UBound(a) If InStr(s, SEP & a(i, 1) & SEP) = 0 Then If Len(a(i, 1)) > 0 Then s = s & a(i, 1) & SEP si = si & i & SEP End If End If Next i ' 3) transform unique values to zero based 1-dim array arr = Split(Mid(s, 2), SEP) ' duplicates string to array arr2 = Split(Mid(si, 2), SEP) ' found row numbers ' 4) write result to column H2:H... ' <<< change target to wanted column ws.Range("H:H").ClearContents ' ws.Range("H2:H" & (2 + UBound(arr))).Value = Application.Transpose(arr) ws.Range("I2:I" & (2 + UBound(arr2))).Value = Application.Transpose(arr2) Debug.Print UBound(arr) + 0 & " unique items found", Format(Timer - t, "0.00 seconds needed") End Sub 

================================================== ===============

编辑

版本2 – 包括用唯一值覆盖原始数据

在这里你可以find一个稍微修改过的版本,用35个列(A2:AI ..)来覆盖原始数据,并使用唯一的值。

 Sub RemoveDuplicates2() ' Edit: overwrite original data A2:AI{..} with unique values Dim t As Double: t = Timer ' stop watch Dim ws As Worksheet ' source sheet (object) Dim i As Long ' row index Dim a, arr, arr2 ' variant Dim s As String, si As String Const SEP = "|" ' delimiter Const MyLastColumn = "AI" ' letter of last column (no 35) = "AI" s = SEP: si = SEP ' fully qualified range reference to source sheet Set ws = ThisWorkbook.Worksheets("Sheet1") ' write column A data to one based 2-dim data field array a = ws.Range("A1:A" & ws.Cells(ws.Rows.Count, "A").End(xlUp).Row) ' loop through data and check for duplicates in concatenated string s For i = 2 To UBound(a) ' For i = UBound(a) To 2 Step -1 If InStr(s, SEP & a(i, 1) & SEP) = 0 Then If Len(Trim(a(i, 1))) > 0 Then s = s & a(i, 1) & SEP si = si & i & SEP End If End If Next i ' write unique values to zero based 1-dim array (starts with index 0; last delimiter removed in this version) arr2 = Split(Mid(si, 2, Len(si) - 2), SEP) ' found row numbers ' overwrite original data For i = LBound(arr2) To UBound(arr2) ' starts with index 0! s = "A" & arr2(i) & ":" & MyLastColumn & arr2(i) arr = ws.Range(s) ' create 1-based 1-line (2-dim) array s = "A" & i + 2 & ":" & MyLastColumn & i + 2 ' 0 + 2 = +2 ... start in row 2 ws.Range(s) = arr ' write back unique row values Next i s = "A" & UBound(arr2) + 3 & ":" & MyLastColumn & UBound(a) + 1 ws.Range(s).ClearContents ' clear rest of original data Debug.Print UBound(arr2) + 1 & " unique items found", Format(Timer - t, "0.00 seconds needed") ' result End Sub