使用2列删除重复

我正试图删除工作表中的重复的ID。 例如这里有几行数据

ID | Department | Sales | Update Date 1 | Sales | 100 | 2 | Marketing | 100 | 2 | Marketing | 200 | 30/06/2015 2 | Marketing | 300 | 05/07/2015 

我想删除重复的ID,但基于更新date列。 所以我想只剩下下面的东西:

 ID | Department | Sales | Update Date 1 | Sales | 100 | 2 | Marketing | 300 | 05/07/2015 

所以它会检查该ID的最新更新行并删除其他的。

任何关于使用VBA或macros来做这件事的build议都会很好,因为它会构成一个自动脚本的一部分。

实现你想要做的一种方法是读取所有的行并遍历每一个重复的行,find要保留的内容,根据查找最高的update_date来删除什么。

我已经成功地写了一个macros来做到这一点。 这是我的代码:

首先 :在VBA编辑器中创build一个空白模块并粘贴以下代码:

 Public Type Row id As String updated As Date row_number As Integer 'to know which rows to delete later is_duplicate As Boolean 'to mark if current row is duplicate to_keep As Boolean 'to decide whether to keep or to delete verified As Boolean 'needed to avoid evaluating all rows with the same ID End Type Sub RemoveDuplicates() Range("a2").Select 'go to first row Dim cnt As Integer 'keep record of how many rows cnt = 0 'begin with an empty array Dim rows() As Row 'declared without the count '== step 1: read all data and store in array =============== Do While ActiveCell.Value <> "" cnt = cnt + 1 ReDim Preserve rows(cnt) 'expand the size of the array by ONE rows(cnt - 1).row_number = ActiveCell.Row 'keep record of current row address If ActiveCell.Offset(1, 0).Value = ActiveCell.Value Or _ ActiveCell.Offset(-1, 0).Value = ActiveCell.Value Then 'if the cell above/below has the samve ID as the current cell, then it's duplicates rows(cnt - 1).is_duplicate = True Else rows(cnt - 1).is_duplicate = False End If rows(cnt - 1).id = ActiveCell.Value 'store the id rows(cnt - 1).updated = ActiveCell.Offset(0, 3).Value 'store the date ActiveCell.Offset(1, 0).Select 'move to the next row below Loop '=== step 2: iterating throw the array and deciding what to keep, what to delete ========= For i = 0 To cnt - 1 If rows(i).is_duplicate And Not rows(i).verified Then 'the current ID is duplicated, and all of the other rows with the same ID has not been verified find_to_keep rows, rows(i).id, cnt 'helper Sub to analyze each row End If Next '==== step 3: iterating throw the array to delete ones marked to delete ========== For i = cnt - 1 To 0 Step -1 'we have to reverse the order because deleted rows will contain data from other valid rows If rows(i).is_duplicate And Not rows(i).to_keep Then 'if the current row is duplicate and is not marked (to keep) then it must be deleted Dim r As Integer r = rows(i).row_number 'get the rows number (address) of the row Range(r & ":" & r).EntireRow.Delete shift:=xlShiftUp 'delete the row and shift the other rows below UP End If Next End Sub Sub find_to_keep(ByRef rows() As Row, ByVal id As String, ByVal cnt As Integer) Dim max_date As Date 'temparary variable to hold the maximum date Dim to_keep As Integer 'temporary variable to hold the location of row to keep ' -- step a: go throw the array and find all rows with id specified in the sub parameter For i = 0 To cnt - 1 If rows(i).id = id Then 'if that row has a date that is higher than our current max_date, the read its date If rows(i).updated > max_date Then max_date = rows(i).updated to_keep = i End If End If Next '-- step b: now that we know what row to keep, we need to do: ' 1- mark all other rows having the same ID as verified (to avoid looping through them again) ' 2- mark the row with the highest date to (to_keep) = true For i = 0 To cnt - 1 If rows(i).id = id Then If i = to_keep Then rows(i).to_keep = True Else rows(i).to_keep = False End If rows(i).verified = True End If Next End Sub 

这是它的样子: excel结果的截图

如果你喜欢,我已经附上整个工作手册供你参考: remove_Duplicates.xlsm