RegEx in VBA: break a complex string into multiple tokens?

I am using Excel 2000/2003 to parse a line from an mmCIF protein file into separate tokens. In the worst case it might look like this:

token1 token2 "token's 1a',1b'" 'token4"5"' 12 23.2 ? . 'token' tok'en to"ken 

which should become the following tokens:

    token1
    token2
    token's 1a',1b' (note: the double quotes have disappeared)
    token4"5" (note: the single quotes have disappeared)
    12
    23.2
    ?
    .
    token (note: the single quotes have disappeared)
    tok'en
    to"ken

Is there a regular expression that can break a line of this kind into tokens?

Nice puzzle. Thanks.

This pattern (aPatt below) separates the tokens, but I can't figure out how to remove the outer quotes.

tallpaul() produces:

    token1
    token2
    "token's 1a',1b'"
    'token4"5"'
    12
    23.2
    ?
    .
    'token'
    tok'en
    to"ken

If you can figure out how to lose the outer quotes, please let us know. This needs a reference to "Microsoft VBScript Regular Expressions" to work.

    Option Explicit

    '' Returns a list of matches
    Function RegExpTest(patrn, strng)
        Dim regEx                               ' Create variable.
        Set regEx = New RegExp                  ' Create a regular expression.
        regEx.Pattern = patrn                   ' Set pattern.
        regEx.IgnoreCase = True                 ' Set case insensitivity.
        regEx.Global = True                     ' Set global applicability.
        Set RegExpTest = regEx.Execute(strng)   ' Execute search.
    End Function

    Function tallpaul() As Boolean
        Dim aString As String
        Dim aPatt As String
        Dim aMatch, aMatches

        '' Need to pad the string with leading and trailing spaces.
        aString = " token1 token2 ""token's 1a',1b'"" 'token4""5""' 12 23.2 ? . 'token' tok'en to""ken "
        aPatt = "(\s'[^']+'(?=\s))|(\s""[^""]+""(?=\s))|(\s[\w\?\.]+(?=\s))|(\s\S+(?=\s))"
        Set aMatches = RegExpTest(aPatt, aString)

        For Each aMatch In aMatches
            Debug.Print aMatch.Value
        Next
        tallpaul = True
    End Function
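One possible way to lose the outer quotes is to put capture groups inside the quotes and read SubMatches instead of Value. What follows is only a sketch and not part of the answer above: the function name SplitCifLine and the pattern are my own, and it assumes that a quoted token ends at a quote followed by whitespace or the end of the line, which is how I read the example in the question. It uses late binding (CreateObject), so it should not need the project reference.

    Option Explicit

    ' Hypothetical helper: splits one line into tokens without the outer quotes.
    Function SplitCifLine(ByVal lineText As String) As Collection
        Dim re As Object
        Dim m As Object
        Dim patt As String
        Dim tokens As New Collection

        ' Alternative 1: '...' closed by a quote followed by whitespace or end of line.
        ' Alternative 2: the same for "...".
        ' Alternative 3: any other run of non-whitespace characters.
        patt = "'((?:[^']|'(?=\S))*)'(?=\s|$)" & _
               "|""((?:[^""]|""(?=\S))*)""(?=\s|$)" & _
               "|(\S+)"

        Set re = CreateObject("VBScript.RegExp")
        re.Global = True
        re.Pattern = patt

        For Each m In re.Execute(lineText)
            ' Only one group takes part in each match; the other two
            ' concatenate as empty strings, so this yields the bare token.
            tokens.Add m.SubMatches(0) & m.SubMatches(1) & m.SubMatches(2)
        Next m

        Set SplitCifLine = tokens
    End Function

    Sub DemoSplitCifLine()
        Dim tok As Variant
        For Each tok In SplitCifLine("token1 token2 ""token's 1a',1b'"" 'token4""5""' 12 23.2 ? . 'token' tok'en to""ken")
            Debug.Print tok
        Next tok
    End Sub

For the sample line this should give the eleven tokens listed in the question. Because the third alternative always consumes a whole run of non-whitespace, a quote in the middle of a token (tok'en, to"ken) never starts a quoted match, and the string does not need to be padded with spaces.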

It can be done:

You need to reference "Microsoft VBScript Regular Expressions 5.5" in your VBA project, then...

    Private Sub REFinder(PatternString As String, StringToTest As String)
        Dim RE As RegExp
        Dim Matches As MatchCollection
        Dim Match As Object
        Dim Item As Variant

        Set RE = New RegExp
        With RE
            .Global = True
            .MultiLine = False
            .IgnoreCase = False
            .Pattern = PatternString
        End With

        Set Matches = RE.Execute(StringToTest)

        For Each Match In Matches
            Debug.Print Match.Value & " ~~~ " & Match.FirstIndex & " - " & Match.Length & " = " & Mid(StringToTest, Match.FirstIndex + 1, Match.Length)

            ' You get a submatch for each of the other possible conditions (if using ORs)
            For Each Item In Match.SubMatches
                Debug.Print "Submatch:" & Item
            Next Item
            Debug.Print
        Next Match

        Set RE = Nothing
        Set Matches = Nothing
        Set Match = Nothing
    End Sub

    Sub DoIt()
        ' This simply splits by space...
        REFinder "([.^\w]+\s)|(.+$)", "Token1 Token2 65.56"
    End Sub

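This is obviously only a very simple example, since I don't know RegExp very well; it is more to show you how it can be done in VBA (you will probably also want to do something more useful with the tokens than Debug.Print!). I'm afraid I will have to leave writing the actual regular expression to someone else!

Simon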