Checking for duplicate substrings in Excel

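I'm trying to find a way to compare the first significant words in each cell with the first significant words in the next cell and, if they match, delete the second entry. For example, the starting data might look like this: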

    General Electric
    General Electric Inc
    General Electric Company
    Microsoft
    Microsoft Inc
    Microsoft Server
    Nintendo
    Nintendo Enterprises

The result should look like this:

    General Electric
    Microsoft
    Nintendo

So far, I have set up code that walks down the data column:

    Sub CompanyNameConsolidate()
        Dim companyName As String
        Dim companyArray() As String
        Dim companyName2 As String
        Dim companyArray2() As String
        Dim totalArray() As String
        Dim wordCount As Integer
        Dim i As Integer
        Dim r As Range

        With Sheets("Unassigned")
            Range("B1").Select
            Do Until IsEmpty(ActiveCell)
                companyName = Range("B" & ActiveCell.Row).Text
                companyName2 = ActiveCell.Offset(1, 0).Text
                companyArray = Split(companyName, " ")
                companyArray2 = Split(companyName2, " ")
                wordCount = UBound(companyArray) - LBound(companyArray)
                For i = 0 To wordCount
                    If companyArray(i) = companyArray2(i) Then [*********HELP**********]
                Next
                ActiveCell.Offset(1, 0).Select
            Loop
        End With
    End Sub

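Basically, the code above compares the substrings in each cell with the substrings in the next cell. Unfortunately, that's as far as I've gotten.

The tricky part is that some company names are two words long (General Electric), while others are only one word (Microsoft).

You can assume the list is sorted alphabetically, so the shortest name (the one I want to keep) always comes first.

I have over 16,000 entries to go through and fix, so I absolutely need an automated way to do this!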
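Because the list is sorted with the shortest name first, one way to read the requirement is: remember the last name that was kept, and drop every following row that equals it or begins with it plus a space. The following is only a rough sketch of that idea, not a tested solution. `StartsWithName` and `RemoveLongerVariants` are made-up names, and it assumes the names sit in column B of the "Unassigned" sheet, starting at B1, as plain text.

```vba
Option Explicit

' Hypothetical helper: True when candidate equals keeper, or begins with
' keeper followed by a space (so "Micro" would not swallow "Microsoft").
Function StartsWithName(ByVal candidate As String, ByVal keeper As String) As Boolean
    If Len(keeper) = 0 Then Exit Function          ' nothing kept yet
    If StrComp(candidate, keeper, vbTextCompare) = 0 Then
        StartsWithName = True
    ElseIf Len(candidate) > Len(keeper) Then
        StartsWithName = (StrComp(Left$(candidate, Len(keeper) + 1), _
                                  keeper & " ", vbTextCompare) = 0)
    End If
End Function

' Assumption: column B of "Unassigned" holds the sorted names as text.
Sub RemoveLongerVariants()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Dim current As String, keeper As String
    Dim toDelete As Range

    Set ws = ThisWorkbook.Worksheets("Unassigned")
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row

    For r = 1 To lastRow
        current = Trim$(CStr(ws.Cells(r, "B").Value))
        If StartsWithName(current, keeper) Then
            ' Collect the row; deleting at the end keeps row numbers stable
            If toDelete Is Nothing Then
                Set toDelete = ws.Rows(r)
            Else
                Set toDelete = Union(toDelete, ws.Rows(r))
            End If
        Else
            keeper = current                        ' new shortest name
        End If
    Next r

    If Not toDelete Is Nothing Then toDelete.Delete
End Sub
```

With the sample data, "General Electric" becomes the keeper, the two longer General Electric rows are collected for deletion, then "Microsoft" takes over as keeper, and so on. Try it on a copy of the sheet first, since the deletion cannot be undone from VBA.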

First variant, which deletes the rows:

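    Sub test()
        Dim Dic As Object: Set Dic = CreateObject("Scripting.Dictionary")
        Dim x&, lastRow&, cl As Range, DataRange As Range, k1, k2
        Dic.CompareMode = vbTextCompare
        With Sheets("Unassigned")
            lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
            Set DataRange = .Range("B1").Resize(lastRow, 1)
            ' Load every non-empty name under a numeric key (its position in the list)
            x = 1
            For Each cl In DataRange
                If cl.Value <> "" Then
                    Dic.Add x, cl.Value
                    x = x + 1
                End If
            Next cl
            ' Remove every later entry that starts with an earlier one.
            ' Looping over Dic.Keys (a snapshot) makes it safe to remove items inside the loop.
            For Each k1 In Dic.Keys
                If Dic.Exists(k1) Then
                    For Each k2 In Dic.Keys
                        If k2 > k1 Then
                            If Dic(k2) Like Dic(k1) & "*" Then Dic.Remove k2
                        End If
                    Next k2
                End If
            Next k1
            ' Index the surviving names by value so rows can be looked up
            For Each k1 In Dic.Keys
                If Not Dic.Exists(Dic(k1)) Then Dic.Add Dic(k1), True
            Next k1
            ' Delete from the bottom up so row numbers stay valid
            For x = lastRow To 1 Step -1
                If Not Dic.Exists(.Cells(x, "B").Value) Then .Rows(x).Delete
            Next x
        End With
    End Sub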

Second variant, using Workbooks.Add:

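    Sub test2()
        Dim Dic As Object: Set Dic = CreateObject("Scripting.Dictionary")
        Dim x&, cl As Range, DataRange As Range, k1, k2
        Dic.CompareMode = vbTextCompare
        With Sheets("Unassigned")
            Set DataRange = .Range("B1").Resize(.Cells(.Rows.Count, "B").End(xlUp).Row, 1)
            ' Load every non-empty name under a numeric key (its position in the list)
            x = 1
            For Each cl In DataRange
                If cl.Value <> "" Then
                    Dic.Add x, cl.Value
                    x = x + 1
                End If
            Next cl
            ' Remove every later entry that starts with an earlier one
            For Each k1 In Dic.Keys
                If Dic.Exists(k1) Then
                    For Each k2 In Dic.Keys
                        If k2 > k1 Then
                            If Dic(k2) Like Dic(k1) & "*" Then Dic.Remove k2
                        End If
                    Next k2
                End If
            Next k1
        End With
        ' Write the surviving names to column B of a new workbook
        With Workbooks.Add.Worksheets(1)
            x = 1
            For Each k1 In Dic.Keys
                .Cells(x, 2).Value = Dic(k1)
                x = x + 1
            Next k1
        End With
    End Sub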

Testing both variants: [screenshots not preserved]

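Find a common denominator for your company names. Based on your example, that seems to be simply dropping the last word whenever the name is longer than one word.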

    Sub ListUniqueCompanies()   ' the Sub name is arbitrary
        Dim listOfCompanies As New Collection
        Dim companyName As String
        Dim companyArray As Variant
        Dim item As Variant
        Dim i As Long, j As Long

        ' The 2 denotes column B; i is the row.
        ' Change this outer loop to suit your needs: it starts at B1 and stops at the first blank cell.
        i = 1
        While ThisWorkbook.Worksheets("Unassigned").Cells(i, 2).Value <> ""
            companyName = ThisWorkbook.Worksheets("Unassigned").Cells(i, 2).Value
            companyArray = Split(companyName, " ")

            ' Drop the last word, but only when the name has more than one word
            If UBound(companyArray) > 0 Then
                companyName = ""
                For j = 0 To UBound(companyArray) - 1
                    companyName = companyName & companyArray(j) & " "
                Next j
                ' Trim off the trailing space
                companyName = Trim(companyName)
            End If

            ' Add the name to the Collection, keyed by itself. VBA raises an error
            ' when a duplicate key is added, which is fine: skip it and carry on.
            On Error Resume Next
            listOfCompanies.Add companyName, companyName
            On Error GoTo 0   ' Restore normal error handling for the rest of the loop

            i = i + 1
        Wend

        ' Now loop over the unique list
        For Each item In listOfCompanies
            ' Your code here
        Next item
    End Sub
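To see what the "drop the last word" rule does to individual names, it can help to pull it out into a small function and try it on the sample data. This is just an illustrative sketch: `DropLastWord` is a made-up helper, and it uses `InStrRev` instead of the `Split` loop.

```vba
Option Explicit

' Hypothetical helper: removes the final word of a name and leaves
' one-word names untouched.
Function DropLastWord(ByVal companyName As String) As String
    Dim cut As Long
    companyName = Trim$(companyName)
    cut = InStrRev(companyName, " ")        ' position of the last space, 0 if none
    If cut > 0 Then
        DropLastWord = Trim$(Left$(companyName, cut - 1))
    Else
        DropLastWord = companyName
    End If
End Function

Sub DemoDropLastWord()
    Debug.Print DropLastWord("General Electric Company")  ' General Electric
    Debug.Print DropLastWord("Microsoft Server")          ' Microsoft
    Debug.Print DropLastWord("Microsoft")                 ' Microsoft
    Debug.Print DropLastWord("General Electric")          ' General
End Sub
```

Note the last line: a two-word name that is already the short form also loses its last word, so with the sample data the unique list would contain "General" as well as "General Electric". Whether that matters depends on your real data, so check the output before relying on it.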