有什么办法可以加快这个VBAalgorithm吗？

我期望实现一个能够在相对较短的时间内（不到15-20秒）处理大量英语词典（〜50,000字）的VBA 索引构buildalgorithm。由于我是一名C ++程序员，在实践中（这是我第一次做大量的VBA工作），所以我build立了一个快速的概念validation程序，能够在大约半秒内完成计算机上的任务。但是，到了testingVBA端口的时候，花了差不多两分钟的时间来做同样的事情 – 这对我来说是一个无法接受的长时间。 VBA代码如下：

节点类模块：

Public letter As String Public next_nodes As New Collection Public is_word As Boolean

主要模块：

 Dim tree As Node Sub build_trie() Set tree = New Node Dim file, a, b, c As Integer Dim current As Node Dim wordlist As Collection Set wordlist = New Collection file = FreeFile Open "C:\corncob_caps.txt" For Input As file Do While Not EOF(file) Dim line As String Line Input #file, line wordlist.add line Loop For a = 1 To wordlist.Count Set current = tree For b = 1 To Len(wordlist.Item(a)) Dim match As Boolean match = False Dim char As String char = Mid(wordlist.Item(a), b, 1) For c = 1 To current.next_nodes.Count If char = current.next_nodes.Item(c).letter Then Set current = current.next_nodes.Item(c) match = True Exit For End If Next c If Not match Then Dim new_node As Node Set new_node = New Node new_node.letter = char current.next_nodes.add new_node Set current = new_node End If Next b current.is_word = True Next a End Sub

那么我的问题就是，这个algorithm可以加速吗？我从一些资料中看到，VBA Collection并不像Dictionary那样高效，所以我试图用一个基于Dictionary的实现来代替，但是花费了相当长的时间来完成更糟的内存使用（Excel使用了500多MB的RAM在我的电脑上）。正如我所说，我对VBA来说是非常新的，所以我对它的语法和整体特征/局限性的了解是非常有限的 – 这就是为什么我不相信这个algorithm是有效的，任何提示/build议将不胜感激。

提前致谢

注意：代码“corncob_caps.txt”引用的词典文件在这里可用（下载“所有CAPS”文件）

这里有一些小问题和一些更大的机会。你确实说过这是你的第一个vba工作，所以如果我告诉你你已经知道的事情，请原谅我

小事第一：
Dim file, a, b, c As Integer声明文件，a和b作为变体。 Integer是16位的符号，所以可能会有溢出的风险，用Long代替。

DIM内循环是相反的：与C ++不同，它们不是循环范围的。

真正的机会是：

使用For Each可以迭代集合：它比索引更快。

在我的硬件上，您的原始代码在160年代左右运行。这个代码在2.5s左右（加上joinword文件的时间都加进去了，大约4s）

 Sub build_trie() Dim t1 As Long Dim wd As Variant Dim nd As Node Set tree = New Node ' Dim file, a, b, c As Integer : declares file, a, b as variant Dim file As Integer, a As Long, b As Long, c As Long ' Integer is 16 bit signed Dim current As Node Dim wordlist As Collection Set wordlist = New Collection file = FreeFile Open "C:\corncob_caps.txt" For Input As file ' no point in doing inside loop, they are not scoped to the loop Dim line As String Dim match As Boolean Dim char As String Dim new_node As Node Do While Not EOF(file) 'Dim line As String Line Input #file, line wordlist.Add line Loop t1 = GetTickCount For Each wd In wordlist ' for each is faster 'For a = 1 To wordlist.Count Set current = tree For b = 1 To Len(wd) 'Dim match As Boolean match = False 'Dim char As String char = Mid$(wd, b, 1) For Each nd In current.next_nodes 'For c = 1 To current.next_nodes.Count If char = nd.letter Then 'If char = current.next_nodes.Item(c).letter Then Set current = nd 'Set current = current.next_nodes.Item(c) match = True Exit For End If Next nd If Not match Then 'Dim new_node As Node Set new_node = New Node new_node.letter = char current.next_nodes.Add new_node Set current = new_node End If Next b current.is_word = True Next wd Debug.Print "Time = " & GetTickCount - t1 & " ms" End Sub

编辑：

将单词列表加载到dynamic数组中会将加载时间减less到秒以下。请注意，Redim Preserve价格昂贵，所以要分块

  Dim i As Long, sz As Long sz = 10000 Dim wordlist() As String ReDim wordlist(0 To sz) file = FreeFile Open "C:\corncob_caps.txt" For Input As file i = 0 Do While Not EOF(file) 'Dim line As String Line Input #file, line wordlist(i) = line i = i + 1 If i > sz Then sz = sz + 10000 ReDim Preserve wordlist(0 To sz) End If 'wordlist.Add line Loop ReDim Preserve wordlist(0 To i - 1)

然后像这样循环

  For i = 0 To UBound(wordlist) wd = wordlist(i)

我用VBA练习了，但是IIRC，用For Each迭代集合应该比用数字来快一点：

 Dim i As Variant For Each i In current.next_nodes If i.letter = char Then Set current = i match = True Exit For End If Next node

你也没有使用Collection的全部function。这是一个键值映射，而不仅仅是一个可resize的数组。如果将字母用作关键字，您可能会获得更好的性能，但查找不存在的关键字会引发错误，因此您必须使用丑陋的错误解决方法来检查每个节点。 b循环的内部看起来像：

 Dim char As String char = Mid(wordlist.Item(a), b, 1) Dim node As Node On Error Resume Next Set node = Nothing Set node = current.next_nodes.Item(char) On Error Goto 0 If node Is Nothing Then Set node = New Node current.next_nodes.add node, char Endif Set current = node

这样就不需要class Node上的字母variables。

我没有testing这个。我希望没关系

编辑：修复For Each循环。

你可以做的另一件事情可能会慢一点，但会使用更less的内存是使用数组而不是集合，并调整每个添加的元素。数组不能在类上公开，所以你必须添加方法来处理它：

 Public letter As String Private next_nodes() As Node Public is_word As Boolean Public Sub addNode(new_node As Node) Dim current_size As Integer On Error Resume Next current_size = UBound(next_nodes) 'ubound throws an error if the array is not yet allocated On Error GoTo 0 ReDim next_nodes(0 To current_size) As Node Set next_nodes(current_size) = new_node End Sub Public Function getNode(letter As String) As Node Dim n As Variant On Error Resume Next For Each n In next_nodes If n.letter = letter Then Set getNode = n Exit Function End If Next End Function

编辑：最后的优化策略，用Asc函数获取Integer char值，并存储它而不是一个string。

你真的需要分析它，但是如果你认为Collections很慢，也许你可以尝试使用dynamic数组？

有什么办法可以加快这个VBAalgorithm吗？

Excel或Google文档：在线图上绘制date的实例

如果variables声明了types，Excel VBA函数会崩溃

在java中，csv到xls的数量超过了65536行

不连续求解器约束解决方法

excel vba计数器和他们的一生（ish）

根据date更改工作簿中工作表上的选项卡的颜色

查看最常见的行

将XML文档中的数据parsing到Excel工作表

如何使用Excel.Range.set_Value（）指定单个单元格的格式

在Excel中插入标准曲线