Fuzzy string matching optimization (ignoring certain words) - Excel VBA function

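There is a function for Excel that computes the Levenshtein distance between two strings (the number of insertions, deletions, and/or substitutions needed to transform one string into the other). I am using it as part of a project that involves "fuzzy string matching".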

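Below you will find the code for the LevenshteinDistance function and a valuePhrase function. The latter exists so that I can call the distance function from my spreadsheet. I took this code from another thread.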

    'Calculate the Levenshtein Distance between two strings (the number of insertions,
    'deletions, and substitutions needed to transform the first string into the second)
    Public Function LevenshteinDistance(ByRef S1 As String, ByVal S2 As String) As Long
        Dim L1 As Long, L2 As Long, D() As Long 'Length of input strings and distance matrix
        Dim i As Long, j As Long, cost As Long 'loop counters and cost of substitution for current letter
        Dim cI As Long, cD As Long, cS As Long 'cost of next Insertion, Deletion and Substitution
        L1 = Len(S1): L2 = Len(S2)
        ReDim D(0 To L1, 0 To L2)
        For i = 0 To L1: D(i, 0) = i: Next i
        For j = 0 To L2: D(0, j) = j: Next j

        For j = 1 To L2
            For i = 1 To L1
                cost = Abs(StrComp(Mid$(S1, i, 1), Mid$(S2, j, 1), vbTextCompare))
                cI = D(i - 1, j) + 1
                cD = D(i, j - 1) + 1
                cS = D(i - 1, j - 1) + cost
                If cI <= cD Then 'Insertion or Substitution
                    If cI <= cS Then D(i, j) = cI Else D(i, j) = cS
                Else 'Deletion or Substitution
                    If cD <= cS Then D(i, j) = cD Else D(i, j) = cS
                End If
            Next i
        Next j
        LevenshteinDistance = D(L1, L2)
    End Function

    Public Function valuePhrase#(ByRef S1$, ByRef S2$)
        valuePhrase = LevenshteinDistance(S1, S2)
    End Function

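I run this valuePhrase function in one of my sheets, where both the column headers and the row headers are insurance company names. Ideally, the smallest number in any given row (the shortest Levenshtein distance) should sit under the column header whose company name best matches the name in that row's header.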
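Reading the best match off such a grid can be automated. The following is a minimal sketch, assuming the distances for one row sit in a contiguous range of numeric cells and the column headers sit in a range of the same size. The function name ClosestHeader and its two parameters are hypothetical and not part of the original code.

```vba
' Sketch: return the header that belongs to the smallest distance in a row.
' ClosestHeader, distances and headers are illustrative names (assumptions).
Public Function ClosestHeader(distances As Range, headers As Range) As String
    Dim i As Long, best As Long

    best = 1 ' position of the smallest distance seen so far
    For i = 2 To distances.Cells.Count
        ' Strict "<" keeps the first header when two distances tie
        If distances.Cells(i).Value < distances.Cells(best).Value Then best = i
    Next i

    ClosestHeader = headers.Cells(best).Value
End Function
```

On a sheet this could be called as =ClosestHeader(B2:Z2, $B$1:$Z$1), where the ranges are only an example layout.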

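My problem is that the strings I am comparing are insurance company names. The code above computes the Levenshtein distance strictly and is not tailored to that situation. A simple example: the distance between two company names can be quite small when both contain the words "insurance" and "company" (which, as you would expect, is common), even if the companies have completely different names as far as their distinctive words are concerned. So I would like the function to ignore those words when comparing two strings.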

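I am new to VBA. Is there a way to implement this fix in the code? As a secondary question, are there other problems specific to insurance company names that might come up? Thanks for your help!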

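Your whole question can be replaced with "How do I use the Replace function in VBA?". In general, the algorithm in the question looks interesting, so I have done this for you. Just add whatever you need to the Array() in the function and it will work (the values in the array only need to be written in lowercase):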

    Public Function removeSpecificWords(s As String) As String
        Dim arr As Variant
        Dim cnt As Long

        arr = Array("insurance", "company", "firma", "firm", "holding")
        removeSpecificWords = s

        For cnt = LBound(arr) To UBound(arr)
            removeSpecificWords = Replace(LCase(removeSpecificWords), LCase(arr(cnt)), vbNullString)
        Next cnt
    End Function

    Public Sub TestMe()
        Debug.Print removeSpecificWords("InsHolding")
        Debug.Print removeSpecificWords("InsuranceInsHoldingStar")
    End Sub

In your case:

    S1 = removeSpecificWords(S1)
    S2 = removeSpecificWords(S2)
    valuePhrase = LevenshteinDistance(S1, S2)
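One caveat with a plain substring Replace: it also strips the listed letters from inside longer words, so "Affirm Insurance Company" is reduced to "af" plus two spaces, because "firm" occurs inside "Affirm". If that matters for your data, a variant that drops whole words only may be safer. This is a sketch, assuming the names are separated by single or repeated spaces; the function name RemoveWholeWords and the word list are illustrative.

```vba
' Sketch: remove stop words only when they appear as whole, space-separated words.
' RemoveWholeWords and the word list are illustrative assumptions.
Public Function RemoveWholeWords(ByVal s As String) As String
    Dim stopWords As Variant, parts() As String
    Dim i As Long, j As Long, isStop As Boolean
    Dim result As String

    stopWords = Array("insurance", "company", "firm", "holding")
    ' Lowercase so the comparison is case-insensitive, then split on spaces
    parts = Split(LCase$(Trim$(s)), " ")

    For i = LBound(parts) To UBound(parts)
        isStop = False
        For j = LBound(stopWords) To UBound(stopWords)
            If parts(i) = stopWords(j) Then
                isStop = True
                Exit For
            End If
        Next j
        ' Skip stop words and the empty strings produced by repeated spaces
        If Not isStop And Len(parts(i)) > 0 Then
            result = result & " " & parts(i)
        End If
    Next i

    RemoveWholeWords = Trim$(result)
End Function
```

With this version RemoveWholeWords("Affirm Insurance Company") returns "affirm". Like the original, it returns the name in lowercase, which is harmless here because LevenshteinDistance compares with vbTextCompare.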

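I ran into a similar problem when I was trying to remove duplicate addresses. I approached it the other way around and used the longest common subsequence (LCS).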

    Function DetermineLCS(source As String, target As String) As Double
        Dim results() As Long
        Dim sourceLen As Long
        Dim targetLen As Long
        Dim counter1 As Long
        Dim counter2 As Long

        sourceLen = Len(source)
        targetLen = Len(target)

        ReDim results(0 To sourceLen, 0 To targetLen)

        For counter1 = 1 To sourceLen
            For counter2 = 1 To targetLen
                If Mid$(source, counter1, 1) = Mid$(target, counter2, 1) Then
                    results(counter1, counter2) = results(counter1 - 1, counter2 - 1) + 1
                Else
                    results(counter1, counter2) = WorksheetFunction.Max(results(counter1, _
                        counter2 - 1), results(counter1 - 1, counter2))
                End If
            Next counter2
        Next counter1

        'return the percentage of the LCS to the length of the source string
        DetermineLCS = results(sourceLen, targetLen) / sourceLen
    End Function

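For addresses, I found that a threshold of about 80% caught close to a hundred percent of the true matches. I used to work in this industry, so I know the problem you are facing: I would probably suggest a target of 90%, or even a mix of the Levenshtein distance and the LCS, minimizing the former while maximizing the latter.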
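One way such a mix could look is to turn the distance into a similarity between 0 and 1 and average it with the LCS ratio. The sketch below is only an illustration: the function name CombinedScore, the equal weighting, and the normalisation by the longer string length are all assumptions, not something taken from either answer.

```vba
' Sketch: blend a normalised Levenshtein similarity with an LCS ratio.
' CombinedScore, the equal weights and the normalisation are assumptions.
Public Function CombinedScore(ByVal dist As Long, ByVal len1 As Long, _
                              ByVal len2 As Long, ByVal lcsRatio As Double) As Double
    Dim maxLen As Long, levSim As Double

    If len1 > len2 Then maxLen = len1 Else maxLen = len2
    If maxLen = 0 Then
        CombinedScore = 1 ' two empty strings are treated as identical
        Exit Function
    End If

    ' The distance never exceeds the longer length, so this lies between 0 and 1
    levSim = 1 - dist / maxLen

    ' Equal weights; a higher score means a closer match
    CombinedScore = 0.5 * levSim + 0.5 * lcsRatio
End Function
```

For two non-empty names a and b it could be called as CombinedScore(LevenshteinDistance(a, b), Len(a), Len(b), DetermineLCS(a, b)). Note that DetermineLCS divides by the length of its first argument, so an empty first name has to be filtered out beforehand.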
