在两张不同的表中匹配部分文本string(90%)两列

我试图将(90%)部分文本string从表单列匹配到另一个表单列,并将最终结果带到主表单列中。 我发现了一个VBA解决scheme,但是我遇到了一些问题。 1)匹配确切的文本2)find一个问题,以匹配两个不同的表单列。

请帮我解决这个问题。

Sub lookup() Dim TotalRows As Long Dim rng As Range Dim i As Long 'Copy lookup values from sheet1 to sheet3 Sheets("BANK STATEMENT ENTRY").Select TotalRows = ActiveSheet.UsedRange.Rows.Count Range("F3:F" & TotalRows).Copy Destination:=Sheets("TEST").Range("A1") 'Go to the destination sheet Sheets("TEST").Select For i = 1 To TotalRows 'Search for the value on sheet2 Set rng = Sheets("INFO").UsedRange.Find(Cells(i, 1).Value) 'If it is found put its value on the destination sheet If Not rng Is Nothing Then Cells(i, 2).Value = rng.Value End If Next End Sub 

我已经做了一个文本挖掘项目,我知道你不能使用这种方法,你必须将string分解成子string,然后分析它们。 这将是一个完整的项目,但是因为我为你做了,所以你是幸运的。

让我们简化问题,并说你有两个string范围,你想find两个组之间的每个类似的string。 另外,您希望有一个容差来最小化匹配对。

假设ABCDE和12BCD00。 他们有B,C,D,BC,CD和BCD。 所以最长的公共子string是3个字符的BCD:3 / ABCDE(5)的长度与第一个string的相似度为60%,相似度为3/7 = 43%。 所以,如果你能得到两个范围内的所有string中的所有常见子串的列表,你可以拿出一个更好的列表来过滤和得到你想要的。

我写了一堆function。 为了方便使用,只需将两组string复制并粘贴到一张纸上,然后在同一张纸上生成最终报告,以了解其工作原理。

Function FuzzyFind,find所有常见的子string,并给出第一个string从Group1 / Range1,第二个string从第二/ Range2,公共子串和两个string的相似度百分比。 好的是,你可以告诉函数有多less你想要你的子串,例如在前面的例子中,如果你说iMinCommonSubLength = 3,它只会给你BCD,如果你说iMinCommonSubLength = 2它会给你BC,CD和BCD等等。

使用functionMain。 我还包括一个testing子。

function:

 Sub TestIt() Call Main(ActiveSheet.Range("A1:A10"), ActiveSheet.Range("B1:B10"), 4, ActiveSheet.Range("D1")) End Sub Sub Main(rng1 As Range, rng2 As Range, iMinCommonSubLength As Integer, Optional rngReportUpperLeftCell As Range) Dim arr() As Variant Dim rngReport As Range If rngReport Is Nothing Then Set rngReport = ActiveSheet.Range("A1") arr = FuzzyFind(rng1, rng2, iMinCommonSubLength) Set rngReport = rngReportUpperLeftCell.Resize(UBound(arr, 1), UBound(arr, 2)) rngReport.Value = arr rngReport.Columns(1).NumberFormat = "@" rngReport.Columns(2).NumberFormat = "@" rngReport.Columns(3).NumberFormat = "@" rngReport.Columns(4).NumberFormat = "0%" rngReport.Columns(5).NumberFormat = "0%" End Sub Function GetCharacters(str As String) As Variant Dim arr() As String ReDim arr(Len(str) - 1) For i = 1 To Len(str) arr(i - 1) = Mid$(UCase(str), i, 1) Next GetCharacters = arr End Function Function GetIterations(iStringLength As Integer, iSubStringLength As Integer) As Integer If iStringLength >= iSubStringLength Then GetIterations = iStringLength - iSubStringLength + 1 Else GetIterations = 0 End If End Function Function GetSubtrings(str As String, iSubLength As Integer) As Variant Dim i As Integer Dim count As Integer Dim arr() As Variant count = GetIterations(Len(str), iSubLength) ReDim arr(1 To count) For i = 1 To count arr(i) = Mid(str, i, iSubLength) Next i GetSubtrings = arr() End Function Function GetLongestCommonSubStrings(str1 As String, str2 As String, iMinCommonSubLeng As Integer) Dim i As Integer Dim iLongestPossible As Integer Dim iShortest As Integer Dim arrSubs() As Variant Dim arr1() As Variant Dim arr2() As Variant ReDim arrSubs(1 To 1) 'Longest possible common substring length is the smaller string's length iLongestPossible = IIf(Len(str1) > Len(str2), Len(str2), Len(str1)) If iLongestPossible < iMinCommonSubLeng Then 'MsgBox "Minimum common substring length is larger than the shortest string." & _ ' " You have to choose a smaller common length", , "Error" Else 'We will try to find the first match of common substrings of two given strings, exit after the first match For i = iLongestPossible To iMinCommonSubLeng Step -1 arr1 = GetSubtrings(str1, i) arr2 = GetSubtrings(str2, i) ReDim arrSubs(1 To 1) arrSubs = GetCommonElement(arr1, arr2) If arrSubs(1) <> "" Then Exit For 'if you want JUST THE LONGEST MATCH, comment out this line Next i End If GetLongestCommonSubStrings = arrSubs End Function Function GetCommonElement(arr1() As Variant, arr2() As Variant) As Variant Dim i As Integer Dim j As Integer Dim count As Integer Dim arr() As Variant count = 1 ReDim arr(1 To count) For i = 1 To UBound(arr1) For j = 1 To UBound(arr2) If arr1(i) = arr2(j) Then ReDim Preserve arr(1 To count) arr(count) = arr1(i) count = count + 1 End If Next j Next i GetCommonElement = arr End Function Function FuzzyFind(rng1 As Range, rng2 As Range, iMinCommonSubLength As Integer) As Variant Dim count As Integer Dim i As Integer Dim arrSubs As Variant Dim str1 As String Dim str2 As String Dim cell1 As Range Dim cell2 As Range Dim rngReport As Range Dim arr() As Variant 'array of all cells that are partially matching, str1, str2, common string, percentage count = 1 ReDim arr(1 To 5, 1 To count) For Each cell1 In rng1 str1 = UCase(CStr(cell1.Value)) If str1 <> "" Then For Each cell2 In rng2 str2 = UCase(CStr(cell2.Value)) If str2 <> "" Then ReDim arrSubs(1 To 1) arrSubs = GetLongestCommonSubStrings(str1, str2, iMinCommonSubLength) If arrSubs(1) <> "" Then For i = 1 To UBound(arrSubs) arr(1, count) = cell1.Value arr(2, count) = cell2.Value arr(3, count) = arrSubs(i) arr(4, count) = Len(arrSubs(i)) / Len(str1) arr(5, count) = Len(arrSubs(i)) / Len(str2) count = count + 1 ReDim Preserve arr(1 To 5, 1 To count) Next i End If End If Next cell2 End If Next cell1 FuzzyFind = TransposeArray(arr) End Function Function TransposeArray(arr As Variant) As Variant Dim arrTemp() As Variant ReDim arrTemp(LBound(arr, 2) To UBound(arr, 2), LBound(arr, 1) To UBound(arr, 1)) For a = LBound(arr, 2) To UBound(arr, 2) For b = LBound(arr, 1) To UBound(arr, 1) arrTemp(a, b) = arr(b, a) Next b Next a TransposeArray = arrTemp End Function 

在生成新的报告之前,不要忘记清除表格。 插入一个表,并使用其自动filter来轻松过滤你的东西。

最后但并非最不重要的,不要忘记点击复选标记来宣布这个问题的答案。