Excel高亮重复和按颜色替代过滤

我的电子表格有大约80万行，30列。客户仅对一列中的重复值感兴趣。他们需要整个行。例如

MemberId|Name|Address|CircleScore H111|John Doe|123 W Main|2.4 H222|Jane Doe|124 W Main|3.2 H333|Bob Doe|125 W Main|2.5 H444|Jake Doe|126 W Main|2.1 H555|Mike Doe|127 W Main|2.4

他们希望CircleScore中有重复的整行。所以我过滤的Excel应该只包含：

 MemberId|Name|Address|CircleScore H111|John Doe|123 W Main|2.4 H555|Mike Doe|127 W Main|2.4

我试着突出重复CircleScore和过滤，但过滤部分永远。我已经等了15分钟，但还是没有运气。重复数可能在150K左右。

有其他select吗？

我将创build一个Is_Duplicated指标列，并使用它来过滤重复的CircleScores ：

Excel图片

更新（每个评论）：

或者，您可以sort CircleScore列进行sort ，并使公式对您的系统CircleScore （ 注意： CircleScore必须事先sorting ）：

Excel选择

^{如果你是a）按小时付款而感到欠薪，b）计划在常规过程中小睡，或者c）a）和b），请忽略此提交。}

随着任何数据集接近800K行（30列），你将要进入变种数组领域。对于处理工作表值所需的时间通常为5-7％，这对于大型数据块来说是非常合适的。

任何时候，“重复”这个词都会起作用，我立即开始思考如何在其Keys上使用Scripting.Dictionary对象的唯一索引。在这个解决scheme中，我使用了一对字典来确定具有重复的Circle Score值的数据行。

二千四百万个数据单元是很多存储和传输的。批量方法每次击败单个方法，剥离数据的最大方法是将所有800K行×30列填充到变体数组中。所有的处理都在内存中，并且结果集中返回到报表工作表。

isolateDuplicateCircleScores代码

 Sub isolateDuplicateCircleScores() Dim d As Long, v As Long, csc As Long, stmp As String Dim ky As Variant, itm As Variant, vVALs As Variant, dCSs As Object, dDUPs As Object Dim w As Long, vWSs As Variant 'early binding 'dim dCSs As new scripting.dictionary, dDUPs As new scripting.dictionary appTGGL bTGGL:=False 'late binding - not necessary with Early Binding (see footnote ¹) Set dCSs = CreateObject("Scripting.Dictionary") Set dDUPs = CreateObject("Scripting.Dictionary") 'set to the defaults (not necessary) dCSs.comparemode = vbBinaryCompare dDUPs.comparemode = vbBinaryCompare 'for testing on multiple row number scenarios 'vWSs = Array("CircleScores_8K", "CircleScores_80K", "CircleScores_800K") 'for runtime vWSs = Array("CircleScores") '<~~ your source worksheet here For w = LBound(vWSs) To UBound(vWSs) 'ThisWorkbook.Save Debug.Print vWSs(w) Debug.Print Timer With Worksheets(vWSs(w)) On Error Resume Next Worksheets(vWSs(w) & "_dupes").Delete On Error GoTo 0 ReDim vVALs(0) dCSs.RemoveAll dDUPs.RemoveAll 'prep a new worksheet to receive the duplicates .Cells(1, 1).CurrentRegion.Resize(2).Copy With Worksheets.Add(after:=Worksheets(.Index)) .Name = vWSs(w) & "_dupes" With .Cells(1, 1) .PasteSpecial Paste:=xlPasteAllUsingSourceTheme, Operation:=xlNone .PasteSpecial Paste:=xlPasteColumnWidths, Operation:=xlNone .Value = .Value2 .Offset(1, 0).EntireRow.ClearContents End With End With 'finish prep with freeze row 1 and zoom to 80% With Application.Windows(1) .SplitColumn = 0 .SplitRow = 1 .FreezePanes = True .Zoom = 80 End With 'grab all of the data into a variant array ReDim vVALs(0) csc = Application.Match("CircleScore", .Rows(1), 0) 'CircleScore column number needed later vVALs = .Range(.Cells(2, 1), _ .Cells(.Cells(Rows.Count, csc).End(xlUp).Row, _ .Cells(1, Columns.Count).End(xlToLeft).Column)).Value2 'Debug.Print LBound(vVALs, 1) & ":" & UBound(vVALs, 1) '1:~800K 'Debug.Print LBound(vVALs, 2) & ":" & UBound(vVALs, 2) '1:~30 End With 'done with the original worksheet 'populate the dDUPs dictionary using the key index in dCSs For v = LBound(vVALs, 1) To UBound(vVALs, 1) If dCSs.exists(vVALs(v, csc)) Then stmp = vVALs(v, 1) For d = LBound(vVALs, 2) + 1 To UBound(vVALs, 2) stmp = Join(Array(stmp, vVALs(v, d)), ChrW(8203)) Next d dDUPs.Add Key:=v, Item:=stmp If Not dDUPs.exists(dCSs.Item(vVALs(v, csc))) Then stmp = vVALs(dCSs.Item(vVALs(v, csc)), 1) For d = LBound(vVALs, 2) + 1 To UBound(vVALs, 2) stmp = Join(Array(stmp, vVALs(dCSs.Item(vVALs(v, csc)), d)), ChrW(8203)) Next d dDUPs.Add Key:=dCSs.Item(vVALs(v, csc)), Item:=stmp End If Else dCSs.Item(vVALs(v, csc)) = v End If Next v 'split the dDUPs dictionary items back into a variant array d = 1 ReDim vVALs(1 To dDUPs.Count, 1 To UBound(vVALs, 2)) For Each ky In dDUPs.keys itm = Split(dDUPs.Item(ky), ChrW(8203)) For v = LBound(itm) To UBound(itm) vVALs(d, v + 1) = itm(v) Next v d = d + 1 Next ky 'put the values into the duplicates worksheet With Worksheets(vWSs(w) & "_dupes") .Cells(2, 1).Resize(UBound(vVALs, 1), UBound(vVALs, 2)) = vVALs With .Cells(1, 1).CurrentRegion With .Resize(.Rows.Count - 1, .Columns.Count).Offset(1, 0) .Rows(1).Copy .PasteSpecial Paste:=xlPasteFormats, Operation:=xlNone End With .Cells.Sort Key1:=.Columns(csc), Order1:=xlAscending, _ Key2:=.Columns(1), Order2:=xlAscending, _ Orientation:=xlTopToBottom, Header:=xlYes End With End With Debug.Print Timer Next w dCSs.RemoveAll: Set dCSs = Nothing dDUPs.RemoveAll: Set dDUPs = Nothing appTGGL End Sub Public Sub appTGGL(Optional bTGGL As Boolean = True) With Application .ScreenUpdating = bTGGL .EnableEvents = bTGGL .DisplayAlerts = bTGGL .AutoRecover.Enabled = bTGGL 'no interruptions with an auto-save .Calculation = IIf(bTGGL, xlCalculationAutomatic, xlCalculationManual) .CutCopyMode = False .StatusBar = vbNullString End With Debug.Print Timer End Sub

样本数据和结果

duplicateIdentification_800K
^{800K行×30列的随机样本数据}

duplicateIdentification_800K_results
^{〜123K行×30列的重复行（sorting和格式化大约一分半）}

计时结果

tbh，我从来没有得到老的笔记本电脑上的Excel的32位版本运行800K通过运行不止一次而无需重新启动Excel。一旦重新启动，结果与显示的一致。 64位的Excel不用打嗝就反复运行。

duplicateIdentification_results

^{大型工作表附录}

^{处理包含大型数据块的工作表时，可以通过一些常规改进来限制等待时间。} ^{您将Excel作为中型数据库工具使用，因此将数据工作表视为应该是的原始数据。}

^{如果您不使用64位版本的Excel，那么您正在浪费时间处理所有的事情。} ^{请参阅我正在使用哪个版本的Office？} ^{并select32位或64位版本的Office 。}
^{另存为Excel二进制工作簿（例如.XLSB）。} ^{文件大小通常是原来的25-35％。} ^{加载时间得到了改善，一些计算速度更快（对不起，后者没有经验性的时间数据）。} ^{崩溃.XLSX或.XLSM的某些操作可以正常使用.XLSB。}
^{在工作簿的选项中禁用自动保存/自动恢复。} ^{（[alt] + F，T，S，[alt] + D，[OK]）。} ^{有很多事情比等待自动保存完成，当你正在尝试做什么更恼人。} ^{当你想保存时习惯Ctrl + S。}
^{避免挥发性function¹不惜一切代价;} ^{特别是在整个数据范围内使用的公式中。} ^{COUNTIF公式中的单个TODAY()在行的范围内填充将使您经常坐在拇指上。}
^{说到公式，只要有可能，就把所有公式都还原成结果值。}
^{合并的单元格，条件格式，数据validation和使单元格看起来漂亮的格式和样式会减慢你的速度。} ^{最大限度地减less从原始数据中删除的任何东西的使用。} ^{不像任何人实际上要查看800K行数据。}
^{删除数据后首页►编辑►清空►全部清空空白单元格。} ^{点击Del只会清除内容，可能不会重置Worksheet.UsedRange属性 ;} ^{全部清除将有助于在下次保存时重置使用范围。}
^{如果您的计算机有一个或多个Excel [不响应]情况，请重新启动计算机。} ^{Excel从来没有完全恢复从这些，只需重新启动Excel重新开始更慢，更可能稍后进入相同的非响应条件。}

¹ _{如果可以将Scripting.Dictionary的后期绑定转换为早期绑定，则必须将Microsoft脚本运行时添加到VBE的工具►引用。}

² _{只要整个工作手册中的任何内容发生变化，挥发性函数就会重新计算，而不仅仅是当影响其结果的事情发生变化时。} _{易失性函数的例子是间接，偏移，今天，现在，兰德和兰德维恩。} _{CELL和INFO工作表函数的一些子函数也会使它们变得不稳定。}

截图1

试试这个VBA代码（并学习一点荷兰语）

 Sub DuplicatesInColumn() 'maakt een lijst met de aangetroffen dubbelingen Dim LaatsteRij As Long Dim MatchNr As Long Dim iRij, iKolom, iTeller, Teller As Long, ControlKolom As Long iRij = 1 iKolom = 5 'number of columns in the sheet, Chance if not correct ControlKolom = 4 'column number where to find the doubles, Chance if not correct LaatsteRij = Cells(65000, iKolom).End(xlUp).Row: iTeller = iKolom Sheet1.Activate For iRij = 1 To LaatsteRij If Cells(iRij, ControlKolom) <> "" Then MatchNr = WorksheetFunction.Match(Cells(iRij, ControlKolom), Range(Cells(1, ControlKolom), Cells(LaatsteRij, ControlKolom)), 0) If iRij <> MatchNr Then iTeller = iKolom For Teller = 1 To iTeller Cells(iRij, iKolom + Teller).Offset(0, 2).Value = Range(Cells(iRij, Teller), Cells(iRij, Teller)).Value Next Teller End If: End If Next End Sub

Excel高亮重复和按颜色替代过滤

如果单元格1包含X，那么单元格2应该等于W，如果单元格1包含Z，则查看下一个标准

Excel Web Query强制XML

Excel if语句与string比较

使用非英文语言获取Google GeoCode坐标

在filter处于活动状态时，Excel表格列中的唯一值的数量

下拉dynamic命名范围：在不使用VBA的情况下删除重复项

SUMPRODUCT与标准的date

如何使一个单元格等于另一个单元格中的expression式的值（Excel 2010）？

Excel – 需要创build一个下拉列表（数据validation）是指两列或更多列

删除excel中的重复项