How to write VBA code to remove and replace UTF-8 characters

I have this code, but I can't seem to replace the non-English characters in my data, such as Vietnamese or Thai, with a simple "placeholder".

    Sub NonLatin()
        Dim cell As Range
        For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
            s = cell.Value
            For i = 1 To Len(s)
                If Mid(s, i, 1) Like "[!A-Za-z0-9@#$%^&* * ]" Then cell.Value = "placeholder"
            Next
        Next
    End Sub

Thanks for your help.

For details on using regular expressions in VBA code, see this question.


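Then use a regular expression in a function like this one to process the string. Here I assume you want to replace each invalid character with a placeholder, rather than replacing the whole string. If you do want to replace the whole string, you don't need the per-character check: use the + or * quantifier in the regular expression pattern to match multiple characters and test the whole string in one go.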

    Function LatinString(ByVal str As String) As String
        ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
        ' Set up the regular expression object
        Dim regEx As New RegExp
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            ' This is the pattern of ALLOWED characters.
            ' Note that special characters should be escaped with a backslash, e.g. \$ not $
            .Pattern = "[A-Za-z0-9]"
        End With

        ' Loop through the characters in the string. Replace disallowed characters with "?"
        Dim i As Long
        For i = 1 To Len(str)
            If Not regEx.Test(Mid(str, i, 1)) Then
                str = Left(str, i - 1) & "?" & Mid(str, i + 1)
            End If
        Next i

        ' Return output
        LatinString = str
    End Function

You can use it in your code like this:

    Dim cell As Range
    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
        cell.Value = LatinString(cell.Value)
    Next
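For the whole-string case mentioned above, a minimal sketch might look like the following. The names IsLatinOnly and ReplaceNonLatinCells are hypothetical, and the sketch assumes the cells hold text or numbers rather than error values. It uses late binding through CreateObject, so no library reference is needed.

```vba
' Sketch only: IsLatinOnly is a hypothetical helper, not part of the answer above.
Function IsLatinOnly(ByVal s As String) As Boolean
    With CreateObject("VBScript.RegExp")
        ' ^ and $ anchor the match to the whole string;
        ' * allows zero or more of the allowed characters.
        .Pattern = "^[A-Za-z0-9]*$"
        IsLatinOnly = .Test(s)
    End With
End Function

Sub ReplaceNonLatinCells()
    Dim cell As Range
    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
        ' Assumption: cell values can be converted with CStr (no error values).
        If Not IsLatinOnly(CStr(cell.Value)) Then cell.Value = "placeholder"
    Next cell
End Sub
```

Because the pattern uses *, an empty cell counts as valid and is left untouched; switching to + would treat empty cells as invalid.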

For a byte-level approach that converts a Unicode string to a UTF-8 string without using regular expressions, see this article.

You can use the code below to replace any character outside the ASCII range (the first 128 characters) with a placeholder:

    Option Explicit

    Sub Test()
        Dim oCell As Range
        With CreateObject("VBScript.RegExp")
            .Global = True
            ' Match any character outside the ASCII range U+0000 to U+007F
            .Pattern = "[^\u0000-\u007F]"
            For Each oCell In [A1:C4]
                oCell.Value = .Replace(oCell.Value, "*")
            Next
        End With
    End Sub
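The same ASCII-range check can probably also be written without a regular expression by inspecting each character code. This is only a sketch: ReplaceNonAscii and its placeholder parameter are hypothetical names, not something taken from the answers above.

```vba
' Sketch only: ReplaceNonAscii is a hypothetical helper.
Function ReplaceNonAscii(ByVal s As String, _
                         Optional ByVal placeholder As String = "*") As String
    Dim i As Long
    Dim code As Long
    Dim result As String

    For i = 1 To Len(s)
        ' AscW returns a signed Integer, so code units above &H7FFF
        ' come back negative and must be treated as non-ASCII too.
        code = AscW(Mid$(s, i, 1))
        If code < 0 Or code > 127 Then
            result = result & placeholder
        Else
            result = result & Mid$(s, i, 1)
        End If
    Next i

    ReplaceNonAscii = result
End Function
```

It could be called as oCell.Value = ReplaceNonAscii(CStr(oCell.Value)). Note that VBA strings are UTF-16, so a character stored as a surrogate pair is replaced by two placeholders.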