Excel VBA性能 – 100万行 – 在1分钟内删除包含值的行

我试图find一种方法来过滤大量数据,并在不到一分钟的时间内删除工作表中的行

目标:

  • 在列1中查找包含特定文本的所有logging,并删除整行
  • 保持所有单元格格式(颜色,字体,边框,列宽)和公式原样

testing数据:

测试数据

代码如何工作:

  1. 它首先closures所有的Excelfunction
  2. 如果工作簿不为空,并且要删除的文本值存在于第1列中

    • 将第一列的使用范围复制到一个数组中
    • 迭代数组中的每个值
    • 当它find匹配时:

      • 将单元格地址附加到格式为"A11,A275,A3900,..."的tmpstring中
      • 如果tmp可变长度接近255个字符
      • 使用.Range("A11,A275,A3900,...").EntireRow.Delete Shift:=xlUp删除行.Range("A11,A275,A3900,...").EntireRow.Delete Shift:=xlUp
      • 将tmp重置为空,并移至下一组行
  3. 最后,它将所有Excelfunction重新打开

主要问题是删除操作 ,总持续时间应该在一分钟以内。 任何基于代码的解决scheme只要在1分钟内执行,都是可以接受的。

这将范围缩小到很less可接受的答案。 已经提供的答案也非常短,易于实施。 一个人在大约30秒内完成操作,所以至less有一个答案提供了一个可接受的解决scheme,其他人也可能会觉得它有用

我的主要初始function:

 Sub DeleteRowsWithValuesStrings() Const MAX_SZ As Byte = 240 Dim i As Long, j As Long, t As Double, ws As Worksheet Dim memArr As Variant, max As Long, tmp As String Set ws = Worksheets(1) max = GetMaxCell(ws.UsedRange).Row FastWB True: t = Timer With ws If max > 1 Then If IndexOfValInRowOrCol("Test String", , ws.UsedRange) > 0 Then memArr = .Range(.Cells(1, 1), .Cells(max, 1)).Value2 For i = max To 1 Step -1 If memArr(i, 1) = "Test String" Then tmp = tmp & "A" & i & "," If Len(tmp) > MAX_SZ Then .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp tmp = vbNullString End If End If Next If Len(tmp) > 0 Then .Range(Left(tmp, Len(tmp) - 1)).EntireRow.Delete Shift:=xlUp End If .Calculate End If End If End With FastWB False: InputBox "Duration: ", "Duration", Timer - t End Sub 

帮助器function(打开和closuresExcelfunction):

 Public Sub FastWB(Optional ByVal opt As Boolean = True) With Application .Calculation = IIf(opt, xlCalculationManual, xlCalculationAutomatic) .DisplayAlerts = Not opt .DisplayStatusBar = Not opt .EnableAnimations = Not opt .EnableEvents = Not opt .ScreenUpdating = Not opt End With FastWS , opt End Sub Public Sub FastWS(Optional ByVal ws As Worksheet = Nothing, _ Optional ByVal opt As Boolean = True) If ws Is Nothing Then For Each ws In Application.ActiveWorkbook.Sheets EnableWS ws, opt Next Else EnableWS ws, opt End If End Sub Private Sub EnableWS(ByVal ws As Worksheet, ByVal opt As Boolean) With ws .DisplayPageBreaks = False .EnableCalculation = Not opt .EnableFormatConditionsCalculation = Not opt .EnablePivotTable = Not opt End With End Sub 

用数据查找最后一个单元格(谢谢@ZygD – 现在我在几个场景中testing了它):

 Public Function GetMaxCell(Optional ByRef rng As Range = Nothing) As Range 'Returns the last cell containing a value, or A1 if Worksheet is empty Const NONEMPTY As String = "*" Dim lRow As Range, lCol As Range If rng Is Nothing Then Set rng = Application.ActiveWorkbook.ActiveSheet.UsedRange If WorksheetFunction.CountA(rng) = 0 Then Set GetMaxCell = rng.Parent.Cells(1, 1) Else With rng Set lRow = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _ After:=.Cells(1, 1), _ SearchDirection:=xlPrevious, _ SearchOrder:=xlByRows) If Not lRow Is Nothing Then Set lCol = .Cells.Find(What:=NONEMPTY, LookIn:=xlFormulas, _ After:=.Cells(1, 1), _ SearchDirection:=xlPrevious, _ SearchOrder:=xlByColumns) Set GetMaxCell = .Parent.Cells(lRow.Row, lCol.Column) End If End With End If End Function 

返回数组中匹配的索引,如果没有find匹配,则返回0:

 Public Function IndexOfValInRowOrCol( _ ByVal searchVal As String, _ Optional ByRef ws As Worksheet = Nothing, _ Optional ByRef rng As Range = Nothing, _ Optional ByRef vertical As Boolean = True, _ Optional ByRef rowOrColNum As Long = 1 _ ) As Long 'Returns position in Row or Column, or 0 if no matches found Dim usedRng As Range, result As Variant, searchRow As Long, searchCol As Long result = CVErr(9999) '- generate custom error Set usedRng = GetUsedRng(ws, rng) If Not usedRng Is Nothing Then If rowOrColNum < 1 Then rowOrColNum = 1 With Application If vertical Then result = .Match(searchVal, rng.Columns(rowOrColNum), 0) Else result = .Match(searchVal, rng.Rows(rowOrColNum), 0) End If End With End If If IsError(result) Then IndexOfValInRowOrCol = 0 Else IndexOfValInRowOrCol = result End Function 

更新:

testing6个解决scheme(每个3个testing): Excel Hero的解决scheme是迄今为止最快的 (删除公式)

这是结果,最快到最慢:

testing1.共100,000条logging,10,000条被删除:

 1. ExcelHero() - 1.5 seconds 2. DeleteRowsWithValuesNewSheet() - 2.4 seconds 3. DeleteRowsWithValuesStrings() - 2.45 minutes 4. DeleteRowsWithValuesArray() - 2.45 minutes 5. QuickAndEasy() - 3.25 minutes 6. DeleteRowsWithValuesUnion() - Stopped after 5 minutes 

testing2.总计100万条logging,100,000条被删除:

 1. ExcelHero() - 16 seconds (average) 2. DeleteRowsWithValuesNewSheet() - 33 seconds (average) 3. DeleteRowsWithValuesStrings() - 4 hrs 38 min (16701.375 sec) 4. DeleteRowsWithValuesArray() - 4 hrs 37 min (16626.3051757813 sec) 5. QuickAndEasy() - 5 hrs 40 min (20434.2104492188 sec) 6. DeleteRowsWithValuesUnion() - N/A 

笔记:

  1. ExcelHero方法:易于实现,可靠,速度极快,但删除公式
  2. NewSheet方法:易于实现,可靠,符合目标
  3. string方法:实现更加努力,可靠,但不符合要求
  4. 数组方法:类似于string,但是ReDim是一个数组(Union的更快的版本)
  5. QuickAndEasy:易于实现(短,可靠,优雅),但不符合要求
  6. 范围联盟:实现复杂性类似于2和3,但速度太慢

通过引入不寻常的价值,我也使testing数据更加真实:

  • 空单元格,范围,行和列
  • 特殊字符,如= [`〜!@#$%^&*()_- + {} [] \ |;:'“,。<> /?
  • 空格,制表符,空公式,边框,字体和其他单元格格式
  • 大小数字小数(= 12.9999999999999 + 0.00000000000000001)
  • 超链接,条件格式规则
  • 数据范围内外的空格式
  • 其他任何可能导致数据问题的内容

我提供第一个答案作为参考

如果没有其他选项可用,其他人可能会觉得有用

  • 实现结果的最快方法不是使用删除操作
  • 在100万条logging中,平均每33秒删除100,000行

 Sub DeleteRowsWithValuesNewSheet() '100K records 10K to delete 'Test 1: 2.40234375 sec 'Test 2: 2.41796875 sec 'Test 3: 2.40234375 sec '1M records 100K to delete 'Test 1: 32.9140625 sec 'Test 2: 33.1484375 sec 'Test 3: 32.90625 sec Dim oldWs As Worksheet, newWs As Worksheet, rowHeights() As Long Dim wsName As String, t As Double, oldUsedRng As Range FastWB True: t = Timer Set oldWs = Worksheets(1) wsName = oldWs.Name Set oldUsedRng = oldWs.Range("A1", GetMaxCell(oldWs.UsedRange)) If oldUsedRng.Rows.Count > 1 Then 'If sheet is not empty Set newWs = Sheets.Add(After:=oldWs) 'Add new sheet With oldUsedRng .AutoFilter Field:=1, Criteria1:="<>Test String" .Copy 'Copy visible data End With With newWs.Cells .PasteSpecial xlPasteColumnWidths .PasteSpecial xlPasteAll 'Paste data on new sheet .Cells(1, 1).Select 'Deselect paste area .Cells(1, 1).Copy 'Clear Clipboard End With oldWs.Delete 'Delete old sheet newWs.Name = wsName End If FastWB False: InputBox "Duration: ", "Duration", Timer - t End Sub 

在高层次:

  • 它创build一个新的工作表,并保持对初始工作表的引用
  • AutoFilters列1search文本: .AutoFilter Field:=1, Criteria1:="<>Test String"
  • 复制初始工作表中的所有(可见)数据
  • 将列宽,格式和数据粘贴到新工作表
  • 删除初始表单
  • 将新工作表重命名为旧工作表名称

它使用问题中发布的相同帮助函数

AutoFilter使用99%的持续时间

到目前为止我发现了一些限制,第一个可以解决:

  1. 如果在初始工作表上有任何隐藏的行,它将取消隐藏它们

    • 需要单独的function来隐藏它们
    • 根据实施情况,可能会显着延长持续时间
  2. VBA相关:

    • 它改变表单的代码名称; 其他涉及Sheet1的VBA将被打破(如果有的话)
    • 它删除所有与初始工作表相关的VBA代码(如果有的话)

关于如何使用大文件的一些注意事项:

  • 二进制格式(.xlsb)显着减小文件大小(从137 Mb到43 Mb)
  • 非托pipe条件格式规则可能会导致指数性能问题

    • 同样的评论和数据validation
  • 从networking读取文件或数据比使用本地文件慢得多

如果源数据不包含公式,或者在条件行删除期间允许(或希望)将公式转换为硬值,则可以实现速度的显着提高。

以上作为一个警告,我的解决scheme使用范围对象的AdvancedFilter。 它的速度是DeleteRowsWithValuesNewSheet()的两倍。

 Public Sub ExcelHero() Dim t#, crit As Range, data As Range, ws As Worksheet Dim r&, fc As Range, lc As Range, fr1 As Range, fr2 As Range FastWB True t = Timer Set fc = ActiveSheet.UsedRange.Item(1) Set lc = GetMaxCell Set data = ActiveSheet.Range(fc, lc) Set ws = Sheets.Add With data Set fr1 = data.Worksheet.Range(fc, fc.Offset(, lc.Column)) Set fr2 = ws.Range(ws.Cells(fc.Row, fc.Column), ws.Cells(fc.Row, lc.Column)) With fr2 fr1.Copy .PasteSpecial xlPasteColumnWidths: .PasteSpecial xlPasteAll .Item(1).Select End With Set crit = .Resize(2, 1).Offset(, lc.Column + 1) crit = [{"Column 1";"<>Test String"}] .AdvancedFilter xlFilterCopy, crit, fr2 .Worksheet.Delete End With FastWB False r = ws.UsedRange.Rows.Count Debug.Print "Rows: " & r & ", Duration: " & Timer - t & " seconds" End Sub 

在我的老戴尔Inspiron 1564(Win 7 Office 2007)上:

 Sub QuickAndEasy() Dim rng As Range Set rng = Range("AA2:AA1000001") Range("AB1") = Now Application.ScreenUpdating = False With rng .Formula = "=If(A2=""Test String"",0/0,A2)" .Cells.SpecialCells(xlCellTypeFormulas, xlErrors).EntireRow.Delete .Clear End With Application.ScreenUpdating = True Range("AC1") = Now End Sub 

花了约10秒钟跑。 我假设列AA是可用的。

编辑#1:

请注意,此代码不会将“ 计算 ”设置为“手动”。 如果计算模式设置为“手动”,允许“帮助”列计算 ,性能将会提高。

我知道我在这里的答案令人难以置信,但是未来的访问者可能会发现它非常有用。

请注意:我的方法需要行的索引列以原始顺序结束,但是如果您不介意行的顺序不同,则不需要索引列,可以删除额外的代码行。

我的方法是:我的方法是简单地select所选范围(列)中的所有行,使用Range.Sort按升序sorting,然后在选定范围(列)内收集"Test String"的第一个和最后一个索引。 然后我从第一个和最后一个索引创build一个范围,并使用Range.EntrieRow.Delete删除所有包含"Test String"的行。

优点:
– 这是快速的。
– 它不会删除格式,公式,图表,图片或类似复制到新工作表的方法。

缺点:
– 一个体面的代码大小,但它是直截了当的。

testing范围生成子:

 Sub DevelopTest() Dim index As Long FastWB True ActiveSheet.UsedRange.Clear For index = 1 To 1000000 '1 million test ActiveSheet.Cells(index, 1).Value = index If (index Mod 10) = 0 Then ActiveSheet.Cells(index, 2).Value = "Test String" Else ActiveSheet.Cells(index, 2).Value = "Blah Blah Blah" End If Next index Application.StatusBar = "" FastWB False End Sub 

过滤和删除行子:

 Sub DeleteRowFast() Dim curWorksheet As Worksheet 'Current worksheet vairable Dim rangeSelection As Range 'Selected range Dim startBadVals As Long 'Start of the unwanted values Dim endBadVals As Long 'End of the unwanted values Dim strtTime As Double 'Timer variable Dim lastRow As Long 'Last Row variable Dim lastColumn As Long 'Last column variable Dim indexCell As Range 'Index range start Dim sortRange As Range 'The range which the sort is applied to Dim currRow As Range 'Current Row index for the for loop Dim cell As Range 'Current cell for use in the for loop On Error GoTo Err Set rangeSelection = Application.InputBox("Select the (N=) range to be checked", "Get Range", Type:=8) 'Get the desired range from the user Err.Clear M1 = MsgBox("This is recommended for large files (50,000 or more entries)", vbYesNo, "Enable Fast Workbook?") 'Prompt the user with an option to enable Fast Workbook, roughly 150% performace gains... Recommended for incredibly large files Select Case M1 Case vbYes FastWB True 'Enable fast workbook Case vbNo FastWB False 'Disable fast workbook End Select strtTime = Timer 'Begin the timer Set curWorksheet = ActiveSheet lastRow = CLng(rangeSelection.SpecialCells(xlCellTypeLastCell).Row) lastColumn = curWorksheet.Cells(1, 16384).End(xlToLeft).Column Set indexCell = curWorksheet.Cells(1, 1) On Error Resume Next If rangeSelection.Rows.Count > 1 Then 'Check if there is anything to do lastVisRow = rangeSelection.Rows.Count Set sortRange = curWorksheet.Range(indexCell, curWorksheet.Cells(curWorksheet.Rows(lastRow).Row, 16384).End(xlToLeft)) 'Set the sort range sortRange.Sort Key1:=rangeSelection.Cells(1, 1), Order1:=xlAscending, Header:=xlNo 'Sort by values, lowest to highest startBadVals = rangeSelection.Find(What:="Test String", LookAt:=xlWhole, MatchCase:=False).Row endBadVals = rangeSelection.Find(What:="Test String", LookAt:=xlWhole, SearchDirection:=xlPrevious, MatchCase:=False).Row curWorksheet.Range(curWorksheet.Rows(startBadVals), curWorksheet.Rows(endBadVals)).EntireRow.Delete 'Delete uneeded rows, deleteing in continuous range blocks is quick than seperated or individual deletions. sortRange.Sort Key1:=indexCell, Order1:=xlAscending, Header:=xlNo 'Sort by index instead of values, lowest to highest End If Application.StatusBar = "" 'Reset the status bar FastWB False 'Disable fast workbook MsgBox CStr(Round(Timer - strtTime, 2)) & "s" 'Display duration of task Err: Exit Sub End Sub 

此代码使用FastWBFastWSEnableWS FastWS :Paul Bica!

在100K条目的时间(10K被删除,FastWB真):
1. 0.2秒。
2. 0.2秒。
3. 0.21秒。
平均。 0.2秒。

在100万条目的时间(100k被删除,FastWB True):
1. 2.3秒。
2. 2.32秒。
2.3秒。
平均。 2.31秒。

运行于:Windows 10,iMac i3 11,2(从2010年起)

编辑
此代码最初devise的目的是过滤数字范围之外的数字值,并且已经适用于过滤"Test String"因此某些代码可能是多余的。

在计算使用的范围和行数时使用数组可能会影响性能。 这里有另外一种方法,在testing中certificate,在整个1m +行的数据上是有效的 – 在25-30秒之间。 它不使用filter,所以即使隐藏也会删除行。 删除整行将不会影响其他剩余行的格式或列宽。

  1. 首先,检查ActiveSheet是否有“testingstring”。 既然你只对第一列感兴趣,我用这个:

     TCount = Application.WorksheetFunction.CountIf(sht.Columns(1), "Test String") If TCount > 0 Then 
  2. 我只使用Cells.SpecialCells(xlCellTypeLastCell).Row来获取最后一行,而不是使用GetMaxCell()函数:

     EndRow = sht.Cells.SpecialCells(xlCellTypeLastCell).Row 
  3. 然后遍历数据行:

     While r <= EndRow 
  4. 要testing列1中的单元格是否等于“testingstring”:

     If sht.Cells(r, 1).Text) = "Test String" Then 
  5. 删除行:

     Rows(r).Delete Shift:=xlUp 

把它放在下面的全部代码。 我已经将ActiveSheet设置为一个variablesSht,并添加了ScreenUpdating转向以提高效率。 由于这是大量的数据,所以最后我要确保清除variables。

 Sub RowDeleter() Dim sht As Worksheet Dim r As Long Dim EndRow As Long Dim TCount As Long Dim s As Date Dim e As Date Application.ScreenUpdating = True r = 2 'Initialise row number s = Now 'Start Time Set sht = ActiveSheet EndRow = sht.Cells.SpecialCells(xlCellTypeLastCell).Row 'Check if "Test String" is found in Column 1 TCount = Application.WorksheetFunction.CountIf(sht.Columns(1), "Test String") If TCount > 0 Then 'loop through to the End row While r <= EndRow If InStr(sht.Cells(r, 1).Text, "Test String") > 0 Then sht.Rows(r).Delete Shift:=xlUp r = r - 1 End If r = r + 1 Wend End If e = Now 'End Time D = (Hour(e) * 360 + Minute(e) * 60 + Second(e)) - (Hour(s) * 360 + Minute(s) * 60 + Second(s)) Application.ScreenUpdating = True DurationTime = TimeSerial(0, 0, D) MsgBox Format(DurationTime, "hh:mm:ss") End Sub