什么是最快的方法来转换arrays字母数字的每个成员？

最后的最终结果：

我想知道下面的结果是否改变，如果string更长。我在同一台计算机上运行完全相同的testing，除了每个单元有一个34个字符的随机string，而不是四个。这是结果：

Comintern (Regexp): 136.1 ms brettdj (Regexp): 139.9 ms Slai (Regexp): 158.4 ms *Original Regex: 161.0 ms* Comintern (AN): 170.1 ms Comintern (Hash): 183.6 ms ThunderFrame: 232.9 ms *Original replace: 372.9 ms* *Original InStr: 478.1 ms* CallumDA33: 1218.1 ms

这真正显示了正则expression式的速度 – 使用Regex.replace的所有解决scheme都显着加快，最好的是实现了共产国际。

总之，如果string很长，使用数组，如果它们很短，则使用剪贴板。如果不确定，最好的结果是使用数组，但是这可能会牺牲短string的一些性能。

最终结果：

非常感谢您的build议，显然我还有很多东西要学。我昨天一直在想这个，所以我决定在家里重新运行一切。这是最后的结果，基于将其中的每一个应用到三万四个字符的string。

我家里的电脑是英特尔i7 @ 3.6 GHz，8GB内存，64位Windows 10和Excel 2016.与之前类似的条件，我有进程在后台运行，但我没有主动做任何事情在整个testing。

 Original replace: 97.67 ms Original InStr: 106.54 ms Original Regex: 113.46 ms ThunderFrame: 82.21 ms Comintern (AN): 96.98 ms Comintern (OR): 81.87 ms Comintern (Hash): 101.18 ms brettdj: 81.66 ms CallumDA33: 201.64 ms Slai: 68.38 ms

因此，我接受了Slai的答案，因为它显然是最快的一般实现方式，但是我将重新运行它们以对付实际数据，以便检查它是否仍然有效。

原文：

我在Excel中有一个数组是一个零件号列表。例如，我需要将数组的每个成员变成字母数字

 ABC123-001 -> ABC123001 ABC123/001 -> ABC123001 ABC123001 -> ABC123001

这样做的最快方法是什么？

对于上下文，我们的零件号码可以有不同的forms，所以我正在写一个函数，在给定的范围内find最好的匹配。目前，使字母数字化的function部分需要大约50ms的运行时间，而function的其余部分总共需要大约30ms。我也无法避免使用Excel。

我自己做了一些工作（见下面的答案），但主要的问题是我必须循环遍历数组中的每个元素 – 能有更好的方法吗？我也从来没有运行testing之前，所以任何改进他们的反馈将不胜感激。

这是我到目前为止所尝试的。

我正在使用MicroTimer ，我的电脑有一个Intel i5 @ 2.5GHz，4GB的内存，64位的Windows 7.我有进程在后台运行，但是我没有主动做任何事情，而这些都运行。

我用这个代码创build了30,000行随机符号：

 =CHAR(RANDBETWEEN(1,60))&CHAR(RANDBETWEEN(48,57))&CHAR(RANDBETWEEN(37,140))&CHAR(RANDBETWEEN(37,140))

（请注意，我们如何在60处停止第一个字符，因为'='是char(61) ，我们希望避免Excel将其解释为公式。另外我们强制第二个字符是一个数字，这样我们可以保证至less一个字母数字字符在那里。）

1.基于案例使用循环。 平均时间：175ms

使用这篇文章中的函数，我们将范围加载到数组中，将该函数应用到数组的每个元素并将其粘贴回来。码：

 Function AlphaNumericOnly(strSource As Variant) As String Dim i As Integer Dim strResult As String For i = 1 To Len(strSource) Select Case Asc(Mid(strSource, i, 1)) Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space strResult = strResult & Mid(strSource, i, 1) End Select Next AlphaNumericOnly = strResult End Function Sub Replace() Dim inputSh As Worksheet Dim inputRng As Range Set inputSh = Sheets("Data") Set inputRng = inputSh.Range("A1:A30000") Dim outputSh As Worksheet Dim outputRng As Range Set outputSh = Sheets("Replace") Set outputRng = outputSh.Range("A1:A30000") Dim time1 As Double, time2 As Double time1 = MicroTimer Dim arr As Variant arr = inputRng Dim i As Integer For i = LBound(arr) To UBound(arr) arr(i, 1) = AlphaNumericOnly(arr(i, 1)) Next i outputRng = arr time2 = MicroTimer Debug.Print (time2 - time1) * 1000 End Sub

2.使用InStr（）检查每个字符。 平均时间：201ms

定义一个有效值的string。如果有效值出现在数组元素中，请逐个检查：

 Sub InStr() Dim inputSh As Worksheet Dim inputRng As Range Set inputSh = Sheets("Data") Set inputRng = inputSh.Range("A1:A30000") Dim outputSh As Worksheet Dim outputRng As Range Set outputSh = Sheets("InStr") Set outputRng = outputSh.Range("A1:A30000") Dim time1 As Double, time2 As Double time1 = MicroTimer Dim arr As Variant arr = inputRng Dim validValues As String validValues = "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" 'put numbers and capitals at the start as they are more likely' Dim i As Integer, j As Integer Dim result As String For i = LBound(arr) To UBound(arr) result = vbNullString For j = 1 To Len(arr(i, 1)) If InStr(validValues, Mid(arr(i, 1), j, 1)) <> 0 Then result = result & Mid(arr(i, 1), j, 1) End If Next j arr(i, 1) = result Next i outputRng = arr time2 = MicroTimer Debug.Print (time2 - time1) * 1000 End Sub

3.在数组上使用regex.Replace。 时间：171ms

定义一个正则expression式，并用它来replace数组中的每个元素。

 Sub Regex() Dim inputSh As Worksheet Dim inputRng As Range Set inputSh = Sheets("Data") Set inputRng = inputSh.Range("A1:A30000") Dim outputSh As Worksheet Dim outputRng As Range Set outputSh = Sheets("Regex") Set outputRng = outputSh.Range("A1:A30000") Dim time1 As Double, time2 As Double time1 = MicroTimer Dim arr As Variant arr = inputRng Dim objRegex As Object Set objRegex = CreateObject("vbscript.regexp") With objRegex .Global = True .ignorecase = True .Pattern = "[^\w]" End With Dim i As Integer For i = LBound(arr) To UBound(arr) arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString) Next i outputRng = arr time2 = MicroTimer Debug.Print (time2 - time1) * 1000 End Sub

编辑：

@ThunderFrame – 我们的零件编号通常采用以下格式：

所有的数字（例如32523452）
字母和数字的组合（例如AB324K234或123H45645）
字母和数字的混合，每个都由非字母数字字符（例如ABC001-001，ABC001 / 001,123 / 4557-121）

我想在每个string上使用regex.test之前启动到replace，但我不知道这是否只是复制string，然后testing它，在这种情况下，我可能只是做replace开始。

@Slai – 感谢您的链接 – 我会更详细地研究

不知道这是否会更快，因为它取决于太多因素，但可能值得testing。而不是Regex。分别replace每个值，您可以从剪贴板中获取复制的Range文本，并一次replace所有值。请注意， \w匹配下划线和Unicode字母，所以在正则expression式中更具体可以使其更快。

 '[a1:b30000] = [{"ABC123-009",""}]: Dim t As Double: t = Timer ' used for testing Dim r As Range, s As String Set r = ThisWorkbook.Worksheets("Data").UsedRange.Resize(, 1) ' Data!A1:A30000 With New MSForms.DataObject ' needs reference to "Microsoft Forms 2.0 Object Library" or use a bit slower late binding - With CreateObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}") r.Copy .GetFromClipboard Application.CutCopyMode = False s = .GetText .Clear ' optional - clear the clipboard if using Range.PasteSpecial instead of Worksheet.PasteSpecial "Text" With New RegExp ' needs reference to "Microsoft VBScript Regular Expressions 5.5" or use a bit slower late binding - With CreateObject("VBScript.RegExp") .Global = True '.IgnoreCase = False ' .IgnoreCase is False by default .Pattern = "[^0-9A-Za-z\r\n]+" ' because "[^\w\r\n]+" also matches _ and Unicode letters s = .Replace(s, vbNullString) End With .SetText s .PutInClipboard End With ' about 70% of the time is spent here in pasting the data r(, 2).PasteSpecial 'xlPasteValues ' paste the text from clipboard in B1 'Debug.Print Timer - t

由于剪贴板的开销，我预计这会因为较less的值而变慢，并且由于需要内存，可能会因为更多的值而变慢。

禁用事件似乎没有在我的testing中有所作为，但可能值得尝试。

请注意，在macros使用它的时候，另一个应用程序使用剪贴板的可能性很小。

如果早期绑定导致在不同机器上运行相同的编译macros的问题，您可以searchmacros反编译器或删除引用，并切换到后期绑定。

tl; dr – 正则expression式销毁VBA实现。如果这是一个代码挑战，@布赖特或@Slai应该赢得它。

有一些技巧使AlphaNumericOnly更快。

首先，你可以把大部分的函数调用当作一个字节数组而不是一个string。这将删除所有对Mid$和Asc的调用。尽pipe这些function非常快，但是它们仍然会增加开销，从而跳出调用堆栈。这加起来超过了几十万次迭代。

如果可以避免，则第二个优化是不使用Case x To y语法。原因在于它如何编译 – 它不会编译成像Case = Condition >= x And Condition <= y ，它实际上会创build一个循环，提前退出，如下所示：

 Case = False For i = x To y If Condition = i Then Case = True End If Next

再次，不是一个巨大的性能打击，但它加起来。第三个优化是以一种方式对你的testing进行sorting，使他们按照你的数据集中最可能的命中sorting。我在下面主要用字母来定制我的例子，其中大部分都是大写。你可以用不同的顺序做得更好。把它放在一起，你得到了这样的东西：

 Public Function ByteAlphaNumeric(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) 'Load the array up. Dim bound As Long bound = UBound(chars) 'Size the outbound array. ReDim outVal(bound) Dim i As Long, pos As Long For i = 0 To bound Step 2 'Wide characters, only care about the ASCII range. Dim temp As Byte temp = chars(i) 'Pointer math isn't free. Cache it. Select Case True 'Order is important here. Case temp > 64 And temp < 91 outVal(pos) = temp pos = pos + 2 'Advance the output pointer. Case temp < 48 Case temp > 122 Case temp > 96 outVal(pos) = temp pos = pos + 2 Case temp < 58 outVal(pos) = temp pos = pos + 2 End Select Next 'This is likely the most expensive operation. ReDim Preserve outVal(pos) 'Trim the output array. ByteAlphaNumeric = outVal End Function

它是如何做的？很不错：

 Public Sub Benchmark() Dim starting As Single, i As Long, dummy As String, sample As Variant sample = GetRandomString starting = Timer For i = 1 To 1000000 dummy = AlphaNumericOnlyOP(sample) Next i Debug.Print "OP's AlphaNumericOnly: ", Timer - starting starting = Timer For i = 1 To 1000000 dummy = AlphaNumericOnlyThunderframe(sample) Next i Debug.Print "ThunderFrame's AlphaNumericOnly: ", Timer - starting starting = Timer For i = 1 To 1000000 dummy = AlphaNumeric(sample) Next i Debug.Print "CallumDA33's AlphaNumeric: ", Timer - starting starting = Timer For i = 1 To 1000000 dummy = ByteAlphaNumeric(sample) Next i Debug.Print "ByteAlphaNumeric: ", Timer - starting Dim cast As String cast = CStr(sample) starting = Timer For i = 1 To 1000000 dummy = ByteAlphaNumericString(cast) Next i Debug.Print "ByteAlphaNumericString: ", Timer - starting Set stripper = Nothing starting = Timer For i = 1 To 1000000 dummy = OptimizedRegex(sample) Next i Debug.Print "OptimizedRegex: ", Timer - starting End Sub Private Function GetRandomString() As Variant Dim chars(30) As Byte, i As Long Randomize For i = 0 To 30 Step 2 chars(i) = Int(96 * Rnd + 32) Next i Dim temp As String temp = chars GetRandomString = CVar(temp) End Function

结果用15个字符随机String ：

 OP`s AlphaNumericOnly: 6.565918 ThunderFrame`s AlphaNumericOnly: 3.617188 CallumDA33`s AlphaNumeric: 23.518070 ByteAlphaNumeric: 2.354980

请注意，我省略了不太可能转换为函数的提交。您可能会注意到2个额外的testing – ByteAlphaNumericString与ByteAlphaNumeric函数完全相同，但它将一个String作为input，而不是一个Variant并且摆脱了转换。这不是微不足道的：

 ByteAlphaNumericString: 2.226074

最后，难以捉摸的OptimizedRegex函数（基本上是用于比较时序的函数forms的@ brettj的代码）：

 Private stripper As RegExp 'Module level Function OptimizedRegex(strSource As Variant) As String If stripper Is Nothing Then Set stripper = New RegExp With stripper .Global = True .Pattern = "[^0-9A-Za-z]" End With End If OptimizedRegex = stripper.Replace(strSource, vbNullString) End Function

 OptimizedRegex: 1.094727

编辑：奖金实施！

在我看来，哈希表查找可能比Select Case结构快，所以我使用Scripting.Dictionary创build了一个：

 Private hash As Scripting.Dictionary 'Module level Function HashLookups(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) Dim bound As Long bound = UBound(chars) ReDim outVal(bound) Dim i As Long, pos As Long With hash For i = 0 To bound Step 2 Dim temp As Byte temp = chars(i) If .Exists(temp) Then outVal(pos) = temp pos = pos + 2 End If Next End With ReDim Preserve outVal(pos) HashLookups = outVal End Function Private Sub LoadHashTable() Set hash = New Scripting.Dictionary Dim i As Long For i = 48 To 57 hash.Add i, vbNull Next For i = 65 To 90 hash.Add i, vbNull Next For i = 97 To 122 hash.Add i, vbNull Next End Sub 'Test code: starting = Timer LoadHashTable For i = 1 To 1000000 dummy = HashLookups(sample) Next i Debug.Print "HashLookups: ", Timer - starting

事实certificate，这不是太破旧：

 HashLookups: 1.655273

最终版本

醒来，我想我会尝试一个vector查找，而不是一个哈希查找（只需填写一个值的字节数组保持和使用的testing）。这似乎是合理的，因为它只是一个256个元素的数组 – 基本上是一个真值表：

 Private lookup(255) As Boolean 'Module level Function VectorLookup(source As Variant) As String Dim chars() As Byte Dim outVal() As Byte chars = CStr(source) Dim bound As Long bound = UBound(chars) ReDim outVal(bound) Dim i As Long, pos As Long For i = 0 To bound Step 2 Dim temp As Byte temp = chars(i) If lookup(temp) Then outVal(pos) = temp pos = pos + 2 End If Next ReDim Preserve outVal(pos) VectorLookup = outVal End Function Private Sub GenerateTable() Dim i As Long For i = 48 To 57 lookup(i) = True Next For i = 65 To 90 lookup(i) = True Next For i = 97 To 122 lookup(i) = True Next End Sub

假设查找表只生成一次，它的计时速度比上面任何一种纯VBA方法快10-15％。

感谢ThunderFrame（我是LHS Mid$的吸血鬼），但是我从早期的RegExp获得了更好的性能，并进行了额外的小调整：

使用Value2而不是Value
声明你的循环长不是整数
.ignorecase = True是多余的

码

  Sub Replace2() Dim inputSh As Worksheet Dim inputRng As Range Set inputSh = Sheets("Data") Set inputRng = inputSh.Range("A1:A30000") Dim outputSh As Worksheet Dim outputRng As Range Set outputSh = Sheets("Replace") Set outputRng = outputSh.Range("A1:A30000") Dim time1 As Double, time2 As Double time1 = MicroTimer Dim arr As Variant Dim objRegex As VBScript_RegExp_55.RegExp Dim i As Long Set objRegex = CreateObject("vbscript.regexp") With objRegex .Global = True .Pattern = "[^\w]" End With arr = inputRng.Value2 For i = LBound(arr) To UBound(arr) arr(i, 1) = objRegex.Replace(arr(i, 1), vbNullString) Next i outputRng.Value2 = arr time2 = MicroTimer Debug.Print (time2 - time1) * 1000 End Sub

如果您将第一个function（目前最佳的function）更改为以下function，那么根据您的数据，您至less可以获得40-50％的性能提升：

 Function AlphaNumericOnly(strSource As Variant) As String Dim i As Long Dim charCount As Long Dim strResult As String Dim char As String strResult = Space$(Len(strSource)) For i = 1 To Len(strSource) char = Mid$(strSource, i, 1) Select Case Asc(char) Case 48 To 57, 65 To 90, 97 To 122: 'include 32 if you want to include space charCount = charCount + 1 Mid$(strResult, charCount, 1) = char End Select Next AlphaNumericOnly = Left$(strResult, charCount) End Function

我使用了一些优化，但主要是在一个循环中多次重新分配strResult ，而这个循环非常昂贵，而当您的string更长（并且循环运行次数更多）时，则更为昂贵。更好地使用Mid$ 。

而且，使用$ -suffixed函数对string进行了优化，所以你也可以获得更好的性能

优化RegEx版本

您的正则expression式方法具有合理的性能，但是您使用的是后期绑定的CreateObject ，这会比早期绑定的强types引用快得多。

而且，你的正则expression式模式和选项每次都是一样的，你可以将正则expression式对象声明为variables，只有当它不存在时才创build它，然后每次重新使用现有的正则expression式。

我会把它扔到那里，如果没有别的，看看它是如何执行的。我相信它也可以被整理一下。

我希望testing一个字符是一个字母的方法会变得更快。我敢肯定，testing一个数字可以做得更快一点。

 Function AlphaNumeric(s As String) As String Dim char As String, tempStr As String Dim i As Integer Dim t As Variant For i = 1 To Len(s) char = Mid(s, i, 1) If IsLetter(char) Or IsNumber(char) Then tempStr = tempStr & char End If Next i AlphaNumeric = tempStr End Function Private Function IsLetter(s As String) As Boolean If UCase(s) = s And LCase(s) = s Then IsLetter = False Else: IsLetter = True End If End Function Private Function IsNumber(s As String) On Error GoTo 1 s = s * 1 IsNumber = True Exit Function 1: IsNumber = False End Function

什么是最快的方法来转换arrays字母数字的每个成员？

编辑：奖金实施！

最终版本

在Excel中的数组相乘

如果D列是直接的，Sumproduct（E，F）

使用ARRAY中的variables声明单元格（行，列）

在基于VBA中另一个工作表中的值使用数组的数组时，无法过滤出一个工作表上的行

如何打印循环导致R中的上三angularmatrix

Redim可变数量的维度在VBA中

通过数组遍历 – 返回错误“试图访问A（：，3）; 指数越界，因为大小（A）= “

将文本分配给数组

停止n数组1转换为1维

总和值使用数组和INDEX