
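I've been handed a project in which I need to find duplicate pairings across multiple rows of a data set. The data set is much larger, but the main part revolves around training dates, training locations, and the names of the trainers. Each row of data has a date, a location, and then a comma-separated list of names: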

Date        Location        Names
1/13/2014   Seattle         A, B, D
1/16/2014   Dallas          C, D, E
1/20/2014   New York        A, D
1/23/2014   Dallas          C, E
1/27/2014   Seattle         B, D
1/30/2014   Houston         C, A, F
2/3/2014    Washington DC   D, A, F
2/6/2014    Phoenix         B, E
2/10/2014   Seattle         C, B
2/13/2014   Miami           A, B, E
2/17/2014   Miami           C, D
2/20/2014   New York        B, E, F
2/24/2014   Houston         A, B, F

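My goal is to find rows that share a pairing of names. For example, I'd like to know that A and B were together in Seattle on 1/13, in Miami on 2/13, and in Houston on 2/24, even though the full list of names is different for each event. So rather than simply finding duplicates of the whole Names string, I want to find pairings within partial segments of the Names column.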


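I could do this by hand, but it would take a lot of time that could be spent on other things. If there is a way to automate it, my task would become much simpler.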


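You can do this with VBA. The solution below assumes:

  • Your data is in columns A:C of the active worksheet.
  • Your results will be output in columns E:G.
  • The output is sorted by pair and then by date, so you can easily see the repeated pairs.
  • The routine assumes no more than three trainers at a time, but it can be modified to add more possible combinations.
  • Cities with only one trainer are ignored.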

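The routine uses a class module to gather the information and two Collections to process the data. It also relies on the fact that a Collection does not allow two items to be added with the same key.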
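As a rough, language-neutral illustration of that bookkeeping (it is not part of the VBA routine), here is a minimal Python sketch. The function name `find_repeated_pairs` and the two dictionaries are hypothetical stand-ins for the two Collections: the first records where each pair was first seen, and a pair that turns up again is copied, together with its first sighting, into the second. It uses `itertools.combinations`, so it is not limited to three trainers per row.

```python
from itertools import combinations


def find_repeated_pairs(rows):
    """rows: iterable of (date, location, names), names like "A, B, D"."""
    first_seen = {}  # pair -> (date, location) of its first occurrence
    repeated = {}    # pair -> all (date, location), only for pairs seen 2+ times
    for date, location, names in rows:
        # Sort the names so that "B, A" and "A, B" produce the same key.
        trainers = sorted(n.strip() for n in names.split(",") if n.strip())
        for pair in combinations(trainers, 2):
            if pair not in first_seen:
                first_seen[pair] = (date, location)
            else:
                # Duplicate key: record the first sighting once, then this one.
                repeated.setdefault(pair, [first_seen[pair]]).append((date, location))
    return repeated


rows = [
    ("1/13/2014", "Seattle", "A, B, D"),
    ("1/20/2014", "New York", "A, D"),
    ("2/13/2014", "Miami", "A, B, E"),
    ("2/24/2014", "Houston", "A, B, F"),
]
for pair, sessions in sorted(find_repeated_pairs(rows).items()):
    print(pair, sessions)
```

With these four sample rows, the sketch reports the pair A, B in three sessions and the pair A, D in two; rows with a single trainer produce no pairs and are skipped automatically.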


Rename the class module: cPairs

    Option Explicit

    Private pTrainer1 As String
    Private pTrainer2 As String
    Private pCity As String
    Private pDT As Date

    Public Property Get Trainer1() As String
        Trainer1 = pTrainer1
    End Property
    Public Property Let Trainer1(Value As String)
        pTrainer1 = Value
    End Property

    Public Property Get Trainer2() As String
        Trainer2 = pTrainer2
    End Property
    Public Property Let Trainer2(Value As String)
        pTrainer2 = Value
    End Property

    Public Property Get City() As String
        City = pCity
    End Property
    Public Property Let City(Value As String)
        pCity = Value
    End Property

    Public Property Get DT() As Date
        DT = pDT
    End Property
    Public Property Let DT(Value As Date)
        pDT = Value
    End Property


Regular module:

    Option Explicit
    Option Compare Text

    Public cP As cPairs, colP As Collection
    Public colCityPairs As Collection
    Public vSrc As Variant
    Public vRes() As Variant
    Public rRes As Range
    Public I As Long, J As Long
    Public V As Variant
    Public sKey As String

    Sub FindPairs()
        vSrc = Range("A1", Cells(Rows.Count, "C").End(xlUp))
        Set colP = New Collection
        Set colCityPairs = New Collection

        'Collect Pairs
        For I = 2 To UBound(vSrc)
            V = Split(Replace(vSrc(I, 3), " ", ""), ",")
            If UBound(V) >= 1 Then
                'sort the pairs
                SingleBubbleSort V
                Select Case UBound(V)
                    Case 1
                        AddPairs V(0), V(1)
                    Case 2
                        AddPairs V(0), V(1)
                        AddPairs V(0), V(2)
                        AddPairs V(1), V(2)
                End Select
            End If
        Next I

        'Build the results array, header row first
        ReDim vRes(0 To colCityPairs.Count, 1 To 3)
        vRes(0, 1) = "Date"
        vRes(0, 2) = "Location"
        vRes(0, 3) = "Pairs"

        For I = 1 To colCityPairs.Count
            With colCityPairs(I)
                vRes(I, 1) = .DT
                vRes(I, 2) = .City
                vRes(I, 3) = .Trainer1 & ", " & .Trainer2
            End With
        Next I

        'Write, sort and format the results
        Set rRes = Range("E1").Resize(UBound(vRes, 1) + 1, UBound(vRes, 2))
        With rRes
            .EntireColumn.Clear
            .Value = vRes
            With .Rows(1)
                .HorizontalAlignment = xlCenter
                .Font.Bold = True
            End With
            .Sort key1:=.Columns(3), order1:=xlAscending, key2:=.Columns(1), order2:=xlAscending, _
                Header:=xlYes
            .EntireColumn.AutoFit

            'Alternate the fill color each time the pair changes
            V = VBA.Array(vbYellow, vbGreen)
            J = 0
            For I = 2 To rRes.Rows.Count
                If rRes(I, 3) = rRes(I - 1, 3) Then
                    .Rows(I).Interior.Color = .Rows(I - 1).Interior.Color
                Else
                    J = J + 1
                    .Rows(I).Interior.Color = V(J Mod 2)
                End If
            Next I
        End With
    End Sub

    Sub AddPairs(T1, T2)
        Set cP = New cPairs
        With cP
            .Trainer1 = T1
            .Trainer2 = T2
            .City = vSrc(I, 2)
            .DT = vSrc(I, 1)
            sKey = .Trainer1 & "|" & .Trainer2

            On Error Resume Next
            colP.Add cP, sKey
            If Err.Number = 457 Then
                'Key already exists, so this pair has been seen before:
                'store the first occurrence and the current one
                Err.Clear
                colCityPairs.Add colP(sKey), sKey & "|" & colP(sKey).DT & "|" & colP(sKey).City
                colCityPairs.Add cP, sKey & "|" & .DT & "|" & .City
            ElseIf Err.Number <> 0 Then
                Stop
            End If
            On Error GoTo 0
        End With
    End Sub

    Sub SingleBubbleSort(TempArray As Variant)
        'copied directly from support.microsoft.com
        Dim Temp As Variant
        Dim I As Integer
        Dim NoExchanges As Integer

        ' Loop until no more "exchanges" are made.
        Do
            NoExchanges = True

            ' Loop through each element in the array.
            For I = LBound(TempArray) To UBound(TempArray) - 1

                ' If the element is greater than the element
                ' following it, exchange the two elements.
                If TempArray(I) > TempArray(I + 1) Then
                    NoExchanges = False
                    Temp = TempArray(I)
                    TempArray(I) = TempArray(I + 1)
                    TempArray(I + 1) = Temp
                End If
            Next I
        Loop While Not (NoExchanges)
    End Sub


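OK. I got bored and did the whole thing in Python. Whether or not you are familiar with the language, you should be able to get the code below working on any computer with Python installed.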

I made some assumptions. For example, I used your sample input as the definitive input format.


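  • Names are matched exactly as they appear in the file, so watch out for capital letters when typing them.
  • Do not include the header line "Date Location Names" in the input file. Just remove it and keep only the actual data in the file. I was lazy and didn't bother to handle this.
  • A whole bunch of other small things. Just do what the program asks you to do and don't enter funky input.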


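The program revolves around a dictionary that uses people's names as keys. Each value in the dictionary is a set of (date, location) tuples recording where that person was on which date. By comparing these sets and taking their intersection, we can find the answer.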
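To make the idea concrete, here is a tiny hand-built example of such a dictionary (the entries are copied from a few rows of the sample data, and only two people are included for brevity). The intersection of two people's sets is exactly the list of sessions they attended together.

```python
# Each person maps to the set of (date, location) sessions they attended.
travels = {
    "A": {("1/13/2014", "Seattle"), ("1/20/2014", "New York"), ("2/13/2014", "Miami")},
    "B": {("1/13/2014", "Seattle"), ("1/27/2014", "Seattle"), ("2/13/2014", "Miami")},
}

# Sessions that both A and B attended.
shared = travels["A"] & travels["B"]
print(sorted(shared))  # [('1/13/2014', 'Seattle'), ('2/13/2014', 'Miami')]
```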

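It's a bit messy, because I treated it as a Python exercise. I hadn't coded in Python for a while, and I was excited about not using objects. Just follow the "instructions", and save the input file, which stores all the information, in the same folder the code runs from.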



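    def readWord(line, stringIndex):
        # Read characters up to the next space (or the end of the line).
        word = ""
        while stringIndex < len(line) and line[stringIndex] != " ":
            word += line[stringIndex]
            stringIndex += 1
        return word, stringIndex


    def removeSpacing(line, stringIndex):
        # Skip over any run of spaces.
        while stringIndex < len(line) and line[stringIndex] == " ":
            stringIndex += 1
        return stringIndex


    def readPeople(line, stringIndex):
        # The rest of the line is a comma-separated list of names.
        return [name.strip() for name in line[stringIndex:].split(",") if name.strip()]


    def readLine(travels, line):
        line = line.replace("\t", " ").strip()
        if not line:
            return travels  # ignore blank lines
        date, stringIndex = readWord(line, 0)
        stringIndex = removeSpacing(line, stringIndex)
        # Names are assumed to contain no spaces, so the name list starts at the
        # last word before the first comma (or at the last word of the line when
        # only one person is listed). Everything between the date and that point
        # is the location, which may be several words ("New York").
        comma = line.find(",")
        head = line if comma == -1 else line[:comma]
        namesIndex = head.rfind(" ") + 1
        location = line[stringIndex:namesIndex].strip()
        for person in readPeople(line, namesIndex):
            if person not in travels:
                travels[person] = set()
            travels[person].add((date, location))
        return travels


    def main():
        filename = input("Enter filename (must be in same folder as this program code. "
                         "For instance, name could be: testDocument.txt)\n\n")
        travels = dict()
        with open(filename) as f:
            for line in f:
                travels = readLine(travels, line)

        print("\n\n\n\n PROGRAM RUNNING \n \n")
        while True:
            persons = []
            userInput = "empty"
            while userInput:
                userInput = input("Enter person name (Type Enter to finish typing names): ")
                if userInput:
                    persons.append(userInput)
            if not persons:
                break  # no names entered: quit instead of failing on persons[0]

            # An unknown name contributes an empty set, so the result is empty.
            output = travels.get(persons[0], set())
            for person in persons[1:]:
                output = output.intersection(travels.get(person, set()))

            print("")
            for hit in output:
                print(hit)
            print("\nFINISHED WITH ONE RUN. STARTING NEW ONE\n")


    if __name__ == "__main__":
        main()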
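The program above only answers queries for the names you type in. If you would rather list every pair that shares more than one session, one possible extension is to loop over all pairs of keys in the `travels` dictionary. This is only a sketch: the function name `repeated_pairs` and its `minimum` parameter are my own, and the small dictionary below is hand-built to stand in for the one the program builds from the file.

```python
from itertools import combinations


def repeated_pairs(travels, minimum=2):
    """Map each pair of people to their shared sessions, keeping only
    pairs that share at least `minimum` sessions."""
    result = {}
    for p1, p2 in combinations(sorted(travels), 2):
        shared = travels[p1] & travels[p2]
        if len(shared) >= minimum:
            result[(p1, p2)] = sorted(shared)
    return result


# Hand-built stand-in for the dictionary produced by readLine().
travels = {
    "A": {("1/13/2014", "Seattle"), ("2/13/2014", "Miami"), ("2/24/2014", "Houston")},
    "B": {("1/13/2014", "Seattle"), ("2/13/2014", "Miami"),
          ("2/24/2014", "Houston"), ("2/6/2014", "Phoenix")},
    "E": {("2/6/2014", "Phoenix"), ("2/13/2014", "Miami")},
}

for pair, sessions in repeated_pairs(travels).items():
    print(pair, sessions)
```

Here A and E share only the Miami session, so that pair is dropped, while A, B and B, E are reported. Note that the dates are plain strings, so `sorted` orders them alphabetically rather than chronologically.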