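Comparing two lists in Excel with VBA regular expressions
I want to use regular expressions in VBA to compare two lists (columns) in Excel and find matches. Since this is a fairly involved operation, I have previously used several different (non-VBA) functions in Excel, but that proved awkward at best, so I would like to try an all-in-one VBA solution, if possible.
The first column contains names with irregularities (e.g. quoted nicknames, suffixes such as "jr" or "sr", and parentheses around "preferred" versions of the first name). In addition, when a middle name is present, it may be either a full name or an initial.
The order in the first column is: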
`<first name or initial> <space> <any parenthetical 'preferred' names - if they exist> <space> <middle name or initial - if it exists> <space> <quoted nickname or initial - if it exists> <space> <last name> <comma - if necessary><space - if necessary><suffix - if it exists>`
The order in the second column is:
`<lastname><space><suffix>,<firstname><space><middle name, if it exists>`
with none of the "irregularities" found in the first column.
My main goal is to "clean up" the first column into the following order:
`lastname-space-suffix,firstname-space-preferred name-space-middle name-space-nickname`
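Although I am keeping the "irregularities" here, I may use some kind of "flag" in my comparison code to alert me to them case by case.
I have been trying several patterns, and this is my most recent:

    ["]?([A-Za-z]?)[.]?["]?[.]?[\s]?[,]?[\s]?

However, I want to allow for the last name and the suffix (if it exists). I have tested it with "global" turned on, but I cannot figure out how to separate the last name and the suffix through backreferences.
I then want to compare last name, first name and middle initial between the two lists (since most middle names in the first list are only initials).
An example would be:
- (1st list) `John (Johnny) B. "Abe" Smith, Jr.` turned into `Smith Jr,John (Johnny) B "Abe"` or `Smith Jr,John B`
- (2nd list) `Smith Jr,John Bertrand` turned into `Smith Jr,John B`
Then run a comparison between the two columns.
Would this list comparison be a good starting or continuation point?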
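Addendum, 10 April 2012:
As a side note, I will need to eliminate the quotes from the nicknames and the parentheses from the preferred names. Can I break the grouping references down further into subgroups (as in the example below)?

    (?: ([ ] \( [^)]* \)))?        # (2) parenthetical 'preferred' name (optional)
    (?: ([ ] (["'] ) .*?) \6 )?    # (5,6) quoted nickname or initial (optional)

Can I group them like this?

    (?:(([ ])(\()([^)]*)(\))))?    # (2) parenthetical 'preferred' name (optional)
    not sure how to do this one -  # (5,6) quoted nickname or initial (optional)

I tried these in "Regex Coach" and "RegExr" and they worked fine, but in VBA, when I ask for the backreferences, all that comes back is the first name, the digit 1 and a comma (e.g. "Carl1"). I will go back and check for typos. Thanks for your help.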
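Addendum, 17 April 2012:
I overlooked one name "situation": last names made up of two or more words, such as "St Cyr" or "Von Wilhelm".
Would adding the following
`((St|Von)[ ])?`
to the regex you provided work, like this?
`((St|Von)[ ])?([^\,()"']+)`
My tests in Regex Coach and RegExr have not worked out yet, because the replacement returns "St" with a leading space.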
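Redo
This is a different approach. It should probably work in your VBA; it is just an example. I tested it in Perl and it worked fine. I won't show the Perl code, though,
only the regexes and some explanation.
It is a two-step process:
- Normalize the column text
- Do the main parsing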
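Normalization process
- Get the column value
- Strip out all dots `.`: global search for `\.`, replace with nothing
- Collapse whitespace into a single space: global search for `\s+`, replace with a single space
(Note that if the text cannot be normalized, I don't see much chance of success, no matter what is tried.)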
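The two normalization steps above can be sketched as one small VBA function. This is only an illustration: the function name `NormalizeName` and the demo sub are made up here, and late binding through `CreateObject("VBScript.RegExp")` is assumed so that no library reference has to be set.

```vba
' Hypothetical helper: collapse whitespace runs, then strip all dots.
Function NormalizeName(ByVal rawText As String) As String
    Dim rxDot As Object, rxWSp As Object

    Set rxDot = CreateObject("VBScript.RegExp")
    rxDot.Global = True            ' replace every dot, not just the first
    rxDot.Pattern = "\."

    Set rxWSp = CreateObject("VBScript.RegExp")
    rxWSp.Global = True
    rxWSp.Pattern = "\s+"

    NormalizeName = rxDot.Replace(rxWSp.Replace(rawText, " "), "")
End Function

Sub DemoNormalizeName()
    ' Brackets make the surviving trailing space visible.
    Debug.Print "[" & NormalizeName("Smith   Jr.,John  Bertrand ") & "]"
    ' prints: [Smith Jr,John Bertrand ]
End Sub
```

A single leading or trailing space is left in place on purpose; the parsing regexes tolerate it.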
Main parsing process
After normalizing a column value (do this for both columns), run it through these regexes.
Column 1 regex

    ^
    [ ]?
    ([^\ ,()"']+)                   # (1) first name or initial (required)
    (?: ([ ] \( [^)]* \)) )?        # (2) parenthetical 'preferred' name (optional)
    (?:
         ([ ] [^\ ,()"'] )          # (3,4) middle initial OR name (optional)
         ([^\ ,()"']*)              #       name and initial are both captured
    )?
    (?: ([ ] (["'] ) .*?) \6 )?     # (5,6) quoted nickname or initial (optional)
    [ ]
    ([^\ ,()"']+)                   # (7) last name (required)
    (?:
         [, ]* ([ ].+?) [ ]?        # (8) suffix (optional)
       | .*?
    )?
    $

The replacement depends on what you want.
Three types are defined (replace `$` with `\` as necessary):
- type 1a, full middle: `$7$8,$1$2$3$4$5$6`
- type 1b, middle initial: `$7$8,$1$2$3$5$6`
- type 2, middle initial: `$7$8,$1$3`
Conversion example:

    Input (raw)                = 'John (Johnny) Bertrand "Abe" Smith, Jr. '
    Out type 1 full middle     = 'Smith Jr,John (Johnny) Bertrand "Abe"'
    Out type 1 middle initial  = 'Smith Jr,John (Johnny) B "Abe"'
    Out type 2 middle initial  = 'Smith Jr,John B'

Column 2 regex

    ^
    [ ]?
    ([^\ ,()"']+)                # (1) last name (required)
    (?: ([ ] [^\ ,()"']+) )?     # (2) suffix (optional)
    ,
    ([^\ ,()"']+)                # (3) first name or initial (required)
    (?:
         ([ ] [^\ ,()"'])        # (4,5) middle initial OR name (optional)
         ([^\ ,()"']*)
    )?
    .*
    $

The replacement depends on what you want.
Two types are defined (replace `$` with `\` as necessary):
- type 1a, full middle: `$1$2,$3$4$5`
- type 1b, middle initial: `$1$2,$3$4` (this is the form compared against column 1's type 2, and is stored in `strC2_2` in the VBA example)
Conversion example:

    Input                      = 'Smith Jr.,John Bertrand '
    Out type 1 full middle     = 'Smith Jr,John Bertrand'
    Out type 1 middle initial  = 'Smith Jr,John B'

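VBA replacement help
This works in a very old copy of Excel, set up as a VBA project.
The two modules were created to show an example.
They both do the same thing.
The first is a detailed example of all the possible replacement types.
The second is a trimmed-down version that uses only the type 2 comparison.
I haven't done VB before, but it should be simple enough
for you to glean how the replacement works and how to tie it in to the Excel
columns.
If you are just doing a flat comparison, you probably want to take one column 1 value,
check it against every value in column 2, then move on to the next value
in column 1 and repeat.
For the fastest way to do this, create 2 extra columns, convert the respective
column values to type 2 (variables `strC1_2` and `strC2_2`, see the example), then copy them
to the new columns.
After that you no longer need the regexes: just compare the columns, find the matching rows,
and then delete the type 2 columns.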
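The helper-column idea can be sketched as follows. The sheet layout is an assumption made only for this illustration: list 1 in column A, list 2 in column B, data starting in row 2, with columns C, D and E free for the type 2 keys and a match flag. The sub name is made up, and a late-bound `Scripting.Dictionary` is used for the lookup instead of a nested loop; the two patterns are copied from the VBA example in this answer.

```vba
Sub CompareListsViaType2Keys()
    Dim ws As Worksheet
    Dim rxDot As Object, rxWSp As Object, rxC1 As Object, rxC2 As Object
    Dim keys2 As Object
    Dim r As Long, lastRow As Long
    Dim normal As String, key As String

    Set ws = ActiveSheet    ' assumption: both lists are on the active sheet

    Set rxDot = CreateObject("VBScript.RegExp")
    rxDot.Global = True
    rxDot.Pattern = "\."
    Set rxWSp = CreateObject("VBScript.RegExp")
    rxWSp.Global = True
    rxWSp.Pattern = "\s+"

    Set rxC1 = CreateObject("VBScript.RegExp")
    rxC1.Pattern = "^[ ]?([^ ,()""']+)(?:([ ]\([^)]*\)))?(?:([ ][^ ,()""'])([^ ,()""']*))?" & _
                   "(?:([ ]([""']).*?)\6)?[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"
    Set rxC2 = CreateObject("VBScript.RegExp")
    rxC2.Pattern = "^[ ]?([^ ,()""']+)(?:([ ][^ ,()""']+))?,([^ ,()""']+)" & _
                   "(?:([ ][^ ,()""'])([^ ,()""']*))?.*$"

    Set keys2 = CreateObject("Scripting.Dictionary")

    ' Pass 1: column B -> type 2 key in helper column D, remembered in the dictionary.
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    For r = 2 To lastRow
        normal = rxDot.Replace(rxWSp.Replace(CStr(ws.Cells(r, "B").Value), " "), "")
        key = rxC2.Replace(normal, "$1$2,$3$4")
        ws.Cells(r, "D").Value = key
        keys2(key) = True
    Next r

    ' Pass 2: column A -> type 2 key in helper column C, match flag in column E.
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For r = 2 To lastRow
        normal = rxDot.Replace(rxWSp.Replace(CStr(ws.Cells(r, "A").Value), " "), "")
        key = rxC1.Replace(normal, "$7$8,$1$3")
        ws.Cells(r, "C").Value = key
        ws.Cells(r, "E").Value = keys2.Exists(key)
    Next r
    ' Columns C and D can be deleted once the flags in E have been reviewed.
End Sub
```

Note that `Replace` returns its input unchanged when the pattern does not match, so a name that fits neither form simply keeps its normalized text as its key and will not produce a false match unless the other list contains the same text.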
Detailed version:

    Sub RegexColumnValueComparison()
        Dim rxDot As Object, rxWSp As Object, rxC1 As Object, rxC2 As Object
        Dim strC1 As String, strC2 As String
        Dim strC1_Normal As String, strC2_Normal As String
        Dim strC1_1a As String, strC1_1b As String, strC1_2 As String
        Dim strC2_1a As String, strC2_2 As String
        Dim c1_t1a As String, c1_t1b As String, c1_t2 As String
        Dim c2_t1a As String, c2_t2 As String

        ' Column 1 and 2, sample values
        ' These should probably be passed-in values
        ' ============================================
        strC1 = "John (Johnny)   Bertrand ""Abe"" Smith, Jr. "
        strC2 = "Smith Jr.,John Bertrand  "

        ' Normalization regexes for whitespace and periods
        ' (use for both column values)
        ' =============================================
        Set rxDot = CreateObject("vbscript.regexp")
        rxDot.Global = True
        rxDot.Pattern = "\."
        Set rxWSp = CreateObject("vbscript.regexp")
        rxWSp.Global = True
        rxWSp.Pattern = "\s+"

        ' Column 1 regex
        ' ==================
        Set rxC1 = CreateObject("vbscript.regexp")
        rxC1.Global = False
        rxC1.Pattern = "^[ ]?([^ ,()""']+)(?:([ ]\([^)]*\)))?(?:([ ][^ ,()""'])([^ ,()""']*))?(?:([ ]([""']).*?)\6)?[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"

        ' Column 2 regex
        ' ==================
        Set rxC2 = CreateObject("vbscript.regexp")
        rxC2.Global = False
        rxC2.Pattern = "^[ ]?([^ ,()""']+)(?:([ ][^ ,()""']+))?,([^ ,()""']+)(?:([ ][^ ,()""'])([^ ,()""']*))?.*$"

        ' Normalize column 1 and 2, copy to new variables
        ' ============================================
        strC1_Normal = rxDot.Replace(rxWSp.Replace(strC1, " "), "")
        strC2_Normal = rxDot.Replace(rxWSp.Replace(strC2, " "), "")

        ' ------------------------------------------------------
        ' This section is informational.
        ' It shows some sample replacements before comparison.
        ' Just pick 1 replacement from each column, discard the rest.
        ' ------------------------------------------------------

        ' Create some replacement types for column 1
        ' =====================================================
        strC1_1a = rxC1.Replace(strC1_Normal, "$7$8,$1$2$3$4$5$6")
        strC1_1b = rxC1.Replace(strC1_Normal, "$7$8,$1$2$3$5$6")
        strC1_2 = rxC1.Replace(strC1_Normal, "$7$8,$1$3")

        ' Create some replacement types for column 2
        ' ($3$4$5 keeps the full middle name, $3$4 only the initial)
        ' =====================================================
        strC2_1a = rxC2.Replace(strC2_Normal, "$1$2,$3$4$5")
        strC2_2 = rxC2.Replace(strC2_Normal, "$1$2,$3$4")

        ' Show types in a message box
        ' =====================================================
        c1_t1a = "Column1 Types:" & vbCrLf & "type 1a full middle    - " & strC1_1a
        c1_t1b = "type 1b middle initial - " & strC1_1b
        c1_t2 = "type 2 middle initial  - " & strC1_2
        c2_t1a = "Column2 Types:" & vbCrLf & "type 1a full middle    - " & strC2_1a
        c2_t2 = "type 2 middle initial  - " & strC2_2

        MsgBox c1_t1a & vbCrLf & c1_t1b & vbCrLf & c1_t2 & vbCrLf & vbCrLf & c2_t1a & vbCrLf & c2_t2

        ' ------------------------------------------------------
        ' Compare a value from column 1 vs column 2.
        ' For this we will compare the type 2 values.
        ' ------------------------------------------------------
        If strC1_2 = strC2_2 Then
            MsgBox "Type 2 values are EQUAL: " & vbCrLf & strC1_2
        Else
            MsgBox "Type 2 values are NOT Equal:" & vbCrLf & strC1_2 & " != " & strC2_2
        End If

        ' ------------------------------------------------------
        ' Same comparison (type 2) of the normalized column 1, 2 values.
        ' In essence, this is all you need.
        ' ------------------------------------------------------
        If rxC1.Replace(strC1_Normal, "$7$8,$1$3") = rxC2.Replace(strC2_Normal, "$1$2,$3$4") Then
            MsgBox "Type 2 values are EQUAL"
        Else
            MsgBox "Type 2 values are NOT Equal"
        End If
    End Sub

Type 2 only:

    Sub RegexColumnValueComparison()
        Dim rxDot As Object, rxWSp As Object, rxC1 As Object, rxC2 As Object
        Dim strC1 As String, strC2 As String
        Dim strC1_Normal As String, strC2_Normal As String
        Dim strC1_2 As String, strC2_2 As String

        ' Column 1 and 2, sample values
        ' These should probably be passed-in values
        ' ============================================
        strC1 = "John (Johnny)   Bertrand ""Abe"" Smith, Jr. "
        strC2 = "Smith Jr.,John Bertrand  "

        ' Normalization regexes for whitespace and periods
        ' (use for both column values)
        ' =============================================
        Set rxDot = CreateObject("vbscript.regexp")
        rxDot.Global = True
        rxDot.Pattern = "\."
        Set rxWSp = CreateObject("vbscript.regexp")
        rxWSp.Global = True
        rxWSp.Pattern = "\s+"

        ' Column 1 regex
        ' ==================
        Set rxC1 = CreateObject("vbscript.regexp")
        rxC1.Global = False
        rxC1.Pattern = "^[ ]?([^ ,()""']+)(?:([ ]\([^)]*\)))?(?:([ ][^ ,()""'])([^ ,()""']*))?(?:([ ]([""']).*?)\6)?[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"

        ' Column 2 regex
        ' ==================
        Set rxC2 = CreateObject("vbscript.regexp")
        rxC2.Global = False
        rxC2.Pattern = "^[ ]?([^ ,()""']+)(?:([ ][^ ,()""']+))?,([^ ,()""']+)(?:([ ][^ ,()""'])([^ ,()""']*))?.*$"

        ' Normalize column 1 and 2, copy to new variables
        ' ============================================
        strC1_Normal = rxDot.Replace(rxWSp.Replace(strC1, " "), "")
        strC2_Normal = rxDot.Replace(rxWSp.Replace(strC2, " "), "")

        ' Comparison (type 2) of the normalized column 1, 2 values
        ' ============================================
        strC1_2 = rxC1.Replace(strC1_Normal, "$7$8,$1$3")
        strC2_2 = rxC2.Replace(strC2_Normal, "$1$2,$3$4")

        If strC1_2 = strC2_2 Then
            MsgBox "Type 2 values are EQUAL"
        Else
            MsgBox "Type 2 values are NOT Equal"
        End If
    End Sub

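Paren/quote response
> As a side note, I will need to eliminate the quotes from the nicknames and the parentheses from the preferred names.
If I understand this correctly:
yes, you can capture the content inside the quotes and the parentheses separately.
It just takes some modification. The regex below makes it possible
to formulate replacements with or without the quotes and/or parentheses,
or in other forms.
The samples below show ways to formulate the replacements.
Very important note here
If you are talking about removing the quotes "" and parentheses ()
from the matching regex itself, that can be done too. It requires a new regex.
The only problem is that all distinction between preferred/middle/nick
is thrown out the window, because these fields are positional as well
(i.e. (preferred) middle "nick").
Dropping that distinction would require a regex like this:

    (?:[ ]([^ ,]+))?    # optional preferred
    (?:[ ]([^ ,]+))?    # optional middle
    (?:[ ]([^ ,]+))?    # optional nick

And, since they are all optional, they lose all positional reference, which renders the middle-initial
expression invalid.
End note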
Regex template (for formulating the replacement strings)

    ^
    [ ]?

    # (required)
    # First
    #   $1  name
    # -----------------------------------------
    ([^\ ,()"']+)                   # (1) name

    # (optional)
    # Parenthetical 'preferred'
    #   $2    all
    #   $3$4  name
    # -----------------------------------------
    (?:
        (                           # (2) all
           ([ ]) \( ([^)]*) \)      # (3,4) space and name
        )
    )?

    # (optional)
    # Middle
    #   $5    initial
    #   $5$6  name
    # -----------------------------------------
    (?:
        ([ ] [^\ ,()"'] )           # (5) first character
        ([^\ ,()"']*)               # (6) remaining characters
    )?

    # (optional)
    # Quoted nick
    #   $7$8$9$8  all
    #   $7$9      name
    # -----------------------------------------
    (?:
        ([ ])                       # (7) space
        (["'])                      # (8) quote
        (.*?)                       # (9) name
        \8
    )?

    # (required)
    # Last
    #   $10  name
    # -----------------------------------------
    [ ]
    ([^\ ,()"']+)                   # (10) name

    # (optional)
    # Suffix
    #   $11  suffix
    # -----------------------------------------
    (?:
        [, ]* ([ ].+?) [ ]?         # (11) suffix
      | .*?
    )?
    $

VBA regex (2nd version, tested from the VBA project above)

    rxC1.Pattern = "^[ ]?([^ ,()""']+)(?:(([ ])\(([^)]*)\)))?(?:([ ][^ ,()""'])([^ ,()""']*))?(?:([ ])([""'])(.*?)\8)?[ ]([^ ,()""']+)(?:[, ]*([ ].+?)[ ]?|.*?)?$"

    strC1_1a  = rxC1.Replace(strC1_Normal, "$10$11,$1$2$5$6$7$8$9$8")
    strC1_1aa = rxC1.Replace(strC1_Normal, "$10$11,$1$3$4$5$6$7$9")
    strC1_1b  = rxC1.Replace(strC1_Normal, "$10$11,$1$2$5$7$8$9$8")
    strC1_1bb = rxC1.Replace(strC1_Normal, "$10$11,$1$3$4$5$7$9")
    strC1_2   = rxC1.Replace(strC1_Normal, "$10$11,$1$5")

Sample input/output possibilities

    Input (raw)                 = 'John (Johnny) Bertrand "Abe" Smith, Jr. '
    Out type 1a  full middle    = 'Smith Jr,John (Johnny) Bertrand "Abe"'
    Out type 1aa full middle    = 'Smith Jr,John Johnny Bertrand Abe'
    Out type 1b  middle initial = 'Smith Jr,John (Johnny) B "Abe"'
    Out type 1bb middle initial = 'Smith Jr,John Johnny B Abe'
    Out type 2   middle initial = 'Smith Jr,John B'

    Input (raw)                 = 'John (Johnny) Smith, Jr.'
    Out type 1a  full middle    = 'Smith Jr,John (Johnny)'
    Out type 1aa full middle    = 'Smith Jr,John Johnny'
    Out type 1b  middle initial = 'Smith Jr,John (Johnny)'
    Out type 1bb middle initial = 'Smith Jr,John Johnny'
    Out type 2   middle initial = 'Smith Jr,John'

    Input (raw)                 = 'John (Johnny) "Abe" Smith, Jr.'
    Out type 1a  full middle    = 'Smith Jr,John (Johnny) "Abe"'
    Out type 1aa full middle    = 'Smith Jr,John Johnny Abe'
    Out type 1b  middle initial = 'Smith Jr,John (Johnny) "Abe"'
    Out type 1bb middle initial = 'Smith Jr,John Johnny Abe'
    Out type 2   middle initial = 'Smith Jr,John'

    Input (raw)                 = 'John "Abe" Smith, Jr.'
    Out type 1a  full middle    = 'Smith Jr,John "Abe"'
    Out type 1aa full middle    = 'Smith Jr,John Abe'
    Out type 1b  middle initial = 'Smith Jr,John "Abe"'
    Out type 1bb middle initial = 'Smith Jr,John Abe'
    Out type 2   middle initial = 'Smith Jr,John'

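Re: the 4/17 concern
> last names that have 2 or more words. Would the allowance for certain literal names, rather than generic word patterns, be the solution?
Actually, no, it would not. In this case, for your form, allowing multi-word last names
injects the space field delimiter into the last name field.
However, for your particular form it can be done, because the only obstacle arises when the
"nick"
field is missing. When it is missing, and there is only a single word in the
middle name position, there are 2 possible permutations.
Hopefully you can glean a solution from the 3 regexes and the test case output below. The space delimiters have been removed from the captures in these regexes. So you can either compose
replacements with the Replace method, or just store the capture buffers and compare them
with the results of the other column's capture scheme.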
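The "store the capture buffers" option can be sketched in VBA by reading the groups directly through `Execute` and `SubMatches` rather than going through a replacement string. This is a minimal illustration only: the sub name, the deliberately short two-group pattern and the sample text are made up here and are not one of the three regexes of this answer.

```vba
Sub DemoSubMatches()
    Dim rx As Object, matches As Object, m As Object
    Dim i As Long

    Set rx = CreateObject("VBScript.RegExp")
    rx.Global = False
    ' (1) first name, (2) text inside the optional parentheses
    rx.Pattern = "^([^ ,()""']+)(?:[ ]\(([^)]*)\))?"

    Set matches = rx.Execute("John (Johnny) B Smith")
    If matches.Count > 0 Then
        Set m = matches(0)
        ' SubMatches is zero-based: SubMatches(0) holds capture group 1.
        For i = 0 To m.SubMatches.Count - 1
            Debug.Print "group " & (i + 1) & " = [" & m.SubMatches(i) & "]"
        Next i
    End If
End Sub
```

For the sample text this prints `group 1 = [John]` and `group 2 = [Johnny]`. Reading groups by index also sidesteps any question of how a two-digit reference such as `$10` is parsed inside a replacement string.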
Nick_rx.Pattern (template)
This pattern allows a multi-word last name; NICK is required.

    ^
    [ ]?
    # First (req'd)
    ([^\ ,()"']+)                    # (1) first name

    # Preferred first
    (?: [ ]
       (                             # (2) (preferred), -or-
          \( ([^)]*?) \)             # (3) preferred
       )
    )?

    # Middle
    (?: [ ]
       (                             # (4) full middle, -or-
          ([^\ ,()"'])               # (5) initial
          [^\ ,()"']*
       )
    )?

    # Quoted nick (req'd)
    [ ]
    (                                # (6) "nick",
       (["'])                        # (7) n/a     -or-
       (.*?)                         # (8) nick
       \7
    )

    # Single/Multi Last (req'd)
    [ ]
    (                                # (9) multi/single word last name
       [^\ ,()"']+
       (?:[ ][^\ ,()"']+)*
    )

    # Suffix
    (?: [ ]? , [ ]? (.*?) )?         # (10) suffix
    [ ]?
    $

FLs_rx.Pattern (template)
This pattern has no MIDDLE/NICK, a single-word last name, and no permutations.

    ^
    [ ]?
    # First (req'd)
    ([^\ ,()"']+)                    # (1) first name

    # Preferred first
    (?: [ ]
       (                             # (2) (preferred), -or-
          \( ([^)]*?) \)             # (3) preferred
       )
    )?

    # Single Last (req'd)
    [ ]
    ([^\ ,()"']+)                    # (4) single word last name

    # Suffix
    (?: [ ]? , [ ]? (.*?) )?         # (5) suffix
    [ ]?
    $

FLm_rx.Pattern (template)
This pattern has no NICK, a multi-word last name, and 2 permutations:
1. Middle is part of the last name.
2. Middle is separate from the last name.

    ^
    [ ]?
    # First (req'd)
    ([^\ ,()"']+)                    # (1) first name

    # Preferred first
    (?: [ ]
       (                             # (2) (preferred), -or-
          \( ([^)]*?) \)             # (3) preferred
       )
    )?

    # Multi Last (req'd)
    [ ]
    (                                # (4) Multi, as Middle + Last,
                                     #     -or-
       (?:                           # Middle
          (                          # (5) full middle, -or-
             ([^\ ,()"'])            # (6) initial
             [^\ ,()"']*
          )
          [ ]
       )
       # Last (req'd)
       (                             # (7) multi/single word last name
          [^\ ,()"']+
          (?:[ ][^\ ,()"']+)*
       )
    )

    # Suffix
    (?: [ ]? , [ ]? (.*?) )?         # (8) suffix
    [ ]?
    $

These regexes are mutually exclusive and should be checked in an if-then-else chain like this (pseudo code):

    str_Normal = rxDot.Replace(rxWSp.Replace(str, " "), "")

    If Nick_rx.Test(str_Normal) Then
        N_1a  = rxWSp.Replace(Nick_rx.Replace(str_Normal, "$9 $10 , $1 $2 $4 $6 "), " ")
        N_1aa = rxWSp.Replace(Nick_rx.Replace(str_Normal, "$9 $10 , $1 $3 $4 $8 "), " ")
        N_1b  = rxWSp.Replace(Nick_rx.Replace(str_Normal, "$9 $10 , $1 $2 $5 $6 "), " ")
        N_1bb = rxWSp.Replace(Nick_rx.Replace(str_Normal, "$9 $10 , $1 $3 $5 $8 "), " ")
        N_2   = rxWSp.Replace(Nick_rx.Replace(str_Normal, "$9 $10 , $1 $5 "), " ")
        ' see test case results in output below

    ElseIf FLs_rx.Test(str_Normal) Then
        FLs_1a  = rxWSp.Replace(FLs_rx.Replace(str_Normal, "$4 $5 , $1 $2 "), " ")
        FLs_1aa = rxWSp.Replace(FLs_rx.Replace(str_Normal, "$4 $5 , $1 $3 "), " ")
        FLs_2   = rxWSp.Replace(FLs_rx.Replace(str_Normal, "$4 $5 , $1 "), " ")

    ElseIf FLm_rx.Test(str_Normal) Then
        ' Permutation 1:
        FLm1_1a  = rxWSp.Replace(FLm_rx.Replace(str_Normal, "$4 $8 , $1 $2 "), " ")
        FLm1_1aa = rxWSp.Replace(FLm_rx.Replace(str_Normal, "$4 $8 , $1 $3 "), " ")
        FLm1_2   = rxWSp.Replace(FLm_rx.Replace(str_Normal, "$4 $8 , $1 "), " ")
        ' Permutation 2:
        FLm2_1a  = rxWSp.Replace(FLm_rx.Replace(str_Normal, "$7 $8 , $1 $2 $5 "), " ")
        FLm2_1aa = rxWSp.Replace(FLm_rx.Replace(str_Normal, "$7 $8 , $1 $3 $5 "), " ")
        FLm2_1b  = rxWSp.Replace(FLm_rx.Replace(str_Normal, "$7 $8 , $1 $2 $6 "), " ")
        FLm2_1bb = rxWSp.Replace(FLm_rx.Replace(str_Normal, "$7 $8 , $1 $3 $6 "), " ")
        FLm2_2   = rxWSp.Replace(FLm_rx.Replace(str_Normal, "$7 $8 , $1 $6 "), " ")
        ' At this point, the odds are that only one of these permutations
        ' will match a value in the other column.

    Else
        ' The data could not be matched against a valid form
    End If

Test cases

    Found form 'Nick'
    Input (raw)                 = 'John1 (JJ) Bert "nick" St Van Helsing ,Jr '
    Normal                      = 'John1 (JJ) Bert "nick" St Van Helsing ,Jr '
    Out type 1a  full middle    = 'St Van Helsing Jr , John1 (JJ) Bert "nick" '
    Out type 1aa full middle    = 'St Van Helsing Jr , John1 JJ Bert nick '
    Out type 1b  middle initial = 'St Van Helsing Jr , John1 (JJ) B "nick" '
    Out type 1bb middle initial = 'St Van Helsing Jr , John1 JJ B nick '
    Out type 2   middle initial = 'St Van Helsing Jr , John1 B '
    =======================================================
    Found form 'Nick'
    Input (raw)                 = 'John2 Bert "nick" Helsing ,Jr '
    Normal                      = 'John2 Bert "nick" Helsing ,Jr '
    Out type 1a  full middle    = 'Helsing Jr , John2 Bert "nick" '
    Out type 1aa full middle    = 'Helsing Jr , John2 Bert nick '
    Out type 1b  middle initial = 'Helsing Jr , John2 B "nick" '
    Out type 1bb middle initial = 'Helsing Jr , John2 B nick '
    Out type 2   middle initial = 'Helsing Jr , John2 B '
    =======================================================
    Found form 'Nick'
    Input (raw)                 = 'John3 Bert "nick" St Van Helsing ,Jr '
    Normal                      = 'John3 Bert "nick" St Van Helsing ,Jr '
    Out type 1a  full middle    = 'St Van Helsing Jr , John3 Bert "nick" '
    Out type 1aa full middle    = 'St Van Helsing Jr , John3 Bert nick '
    Out type 1b  middle initial = 'St Van Helsing Jr , John3 B "nick" '
    Out type 1bb middle initial = 'St Van Helsing Jr , John3 B nick '
    Out type 2   middle initial = 'St Van Helsing Jr , John3 B '
    =======================================================
    Found form 'First-Last (single)'
    Input (raw)                 = 'John4 Helsing '
    Normal                      = 'John4 Helsing '
    Out type 1a  no middle      = 'Helsing , John4 '
    Out type 1aa no middle      = 'Helsing , John4 '
    Out type 2                  = 'Helsing , John4 '
    =======================================================
    Found form 'First-Last (single)'
    Input (raw)                 = 'John5 (JJ) Helsing '
    Normal                      = 'John5 (JJ) Helsing '
    Out type 1a  no middle      = 'Helsing , John5 (JJ) '
    Out type 1aa no middle      = 'Helsing , John5 JJ '
    Out type 2                  = 'Helsing , John5 '
    =======================================================
    Found form 'First-Last (multi)'
    Input (raw)                 = 'John6 (JJ) Bert St Van Helsing ,Jr '
    Normal                      = 'John6 (JJ) Bert St Van Helsing ,Jr '
    Permutation 1:
    Out type 1a  no middle      = 'Bert St Van Helsing Jr , John6 (JJ) '
    Out type 1aa no middle      = 'Bert St Van Helsing Jr , John6 JJ '
    Out type 2                  = 'Bert St Van Helsing Jr , John6 '
    Permutation 2:
    Out type 1a  full middle    = 'St Van Helsing Jr , John6 (JJ) Bert '
    Out type 1aa full middle    = 'St Van Helsing Jr , John6 JJ Bert '
    Out type 1b  middle initial = 'St Van Helsing Jr , John6 (JJ) B '
    Out type 1bb middle initial = 'St Van Helsing Jr , John6 JJ B '
    Out type 2   middle initial = 'St Van Helsing Jr , John6 B '
    =======================================================
    Found form 'First-Last (multi)'
    Input (raw)                 = 'John7 Bert St Van Helsing ,Jr '
    Normal                      = 'John7 Bert St Van Helsing ,Jr '
    Permutation 1:
    Out type 1a  no middle      = 'Bert St Van Helsing Jr , John7 '
    Out type 1aa no middle      = 'Bert St Van Helsing Jr , John7 '
    Out type 2                  = 'Bert St Van Helsing Jr , John7 '
    Permutation 2:
    Out type 1a  full middle    = 'St Van Helsing Jr , John7 Bert '
    Out type 1aa full middle    = 'St Van Helsing Jr , John7 Bert '
    Out type 1b  middle initial = 'St Van Helsing Jr , John7 B '
    Out type 1bb middle initial = 'St Van Helsing Jr , John7 B '
    Out type 2   middle initial = 'St Van Helsing Jr , John7 B '
    =======================================================
    Form *** (unknown)
    Input (raw)                 = ' do(e)s not. match ,'
    Normal                      = ' do(e)s not match ,'
    =======================================================

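The templates above are written in free-spacing form, which `VBScript.RegExp` does not accept, so they have to be compacted before use in VBA. The sketch below is one hand-compacted version, assembled from shared pieces to keep the strings short. It is an assumption-laden illustration rather than tested project code: the function name `ClassifyNameForm`, the constants `W`, `PREF` and `SUFFIX`, and the demo sub are all made up here. The group numbering of each pattern is kept the same as in the corresponding template.

```vba
' Hypothetical helper: reports which of the three forms a normalized name fits.
Function ClassifyNameForm(ByVal normalName As String) As String
    ' Building blocks, hand-compacted from the free-spacing templates (assumption)
    Const W As String = "[^ ,()""']"                     ' one name character
    Const PREF As String = "(?:[ ](\(([^)]*?)\)))?"      ' optional (preferred)
    Const SUFFIX As String = "(?:[ ]?,[ ]?(.*?))?[ ]?$"  ' optional suffix, then end
    Dim firstName As String, lastMulti As String
    Dim rxNick As Object, rxFLs As Object, rxFLm As Object

    firstName = "^[ ]?(" & W & "+)"
    lastMulti = "(" & W & "+(?:[ ]" & W & "+)*)"

    ' Nick form: first, (preferred)?, middle?, quoted nick, multi-word last, suffix?
    Set rxNick = CreateObject("VBScript.RegExp")
    rxNick.Pattern = firstName & PREF & _
                     "(?:[ ]((" & W & ")" & W & "*))?" & _
                     "[ ]((['""])(.*?)\7)" & _
                     "[ ]" & lastMulti & SUFFIX

    ' First-Last (single): first, (preferred)?, single-word last, suffix?
    Set rxFLs = CreateObject("VBScript.RegExp")
    rxFLs.Pattern = firstName & PREF & "[ ](" & W & "+)" & SUFFIX

    ' First-Last (multi): first, (preferred)?, middle-or-last word, last, suffix?
    Set rxFLm = CreateObject("VBScript.RegExp")
    rxFLm.Pattern = firstName & PREF & _
                    "[ ]((?:((" & W & ")" & W & "*)[ ])" & lastMulti & ")" & SUFFIX

    ' Same order as the if-then-else chain in the pseudo code
    If rxNick.Test(normalName) Then
        ClassifyNameForm = "Nick"
    ElseIf rxFLs.Test(normalName) Then
        ClassifyNameForm = "First-Last (single)"
    ElseIf rxFLm.Test(normalName) Then
        ClassifyNameForm = "First-Last (multi)"
    Else
        ClassifyNameForm = "unknown"
    End If
End Function

Sub DemoClassifyNameForm()
    ' Inputs are the normalized strings from the test cases above.
    Debug.Print ClassifyNameForm("John2 Bert ""nick"" Helsing ,Jr ")   ' Nick
    Debug.Print ClassifyNameForm("John4 Helsing ")                     ' First-Last (single)
    Debug.Print ClassifyNameForm("John7 Bert St Van Helsing ,Jr ")     ' First-Last (multi)
    Debug.Print ClassifyNameForm(" do(e)s not match ,")                ' unknown
End Sub
```

Once the form is known, the matching replacement strings from the pseudo code (for example `"$9 $10 , $1 $5 "` for the Nick form) can be applied with the same pattern.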
Here is a regex that may be useful. It gives you 6 capture groups in the following order: first name, preferred name, middle name, nickname, last name, suffix.

    ([a-z]+)\.?\s(?:(\([a-z]+\))\s)?(?:([a-z]+)\.?\s)?(?:("[a-z]+")\s)?([a-z]+)(?:,\s([a-z]+))?

Here is an explanation:

    ([a-z]+)\.?\s          # First name, followed by optional '.' (required)
    (?:(\([a-z]+\))\s)?    # Preferred name, optional
    (?:([a-z]+)\.?\s)?     # Middle name, optional
    (?:("[a-z]+")\s)?      # Nickname, optional
    ([a-z]+)               # Last name, required
    (?:,\s([a-z]+))?       # Suffix, optional

For example, you can turn `John (Johnny) B. "Abe" Smith, Jr.`
into `Smith Jr,John (Johnny) B "Abe"`
by replacing with `\5 \6,\1 \2 \3 \4`,
or you can use `\5 \6,\1 \3`
to turn it into `Smith Jr,John B`.
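As a rough VBA illustration of this six-group pattern, the sketch below reads the groups through `SubMatches` and assembles the short form by hand. Several things here are assumptions rather than part of the answer: `IgnoreCase = True` is set so that `[a-z]` also matches capital letters, the sub name is made up, and `Trim` is used to drop the stray space that a missing suffix or middle name would otherwise leave behind. In a VBScript replacement string the groups would be written `$5 $6,$1 $3` rather than with backslashes.

```vba
Sub DemoSixGroupPattern()
    Dim rx As Object, matches As Object, m As Object
    Dim cleaned As String

    Set rx = CreateObject("VBScript.RegExp")
    rx.IgnoreCase = True     ' assumption: lets [a-z] match capital letters too
    rx.Global = False
    rx.Pattern = "([a-z]+)\.?\s(?:(\([a-z]+\))\s)?(?:([a-z]+)\.?\s)?" & _
                 "(?:(""[a-z]+"")\s)?([a-z]+)(?:,\s([a-z]+))?"

    Set matches = rx.Execute("John (Johnny) B. ""Abe"" Smith, Jr.")
    If matches.Count = 0 Then Exit Sub
    Set m = matches(0)

    ' SubMatches is zero-based: 0 first, 1 preferred, 2 middle,
    ' 3 nickname, 4 last, 5 suffix.
    cleaned = Trim(m.SubMatches(4) & " " & m.SubMatches(5)) & "," & _
              Trim(m.SubMatches(0) & " " & m.SubMatches(2))
    Debug.Print cleaned      ' prints: Smith Jr,John B
End Sub
```

Because the pattern is not anchored, a plain `Replace` would leave any unmatched text (such as the final period of `Jr.`) in the result, which is one reason this sketch builds the output from the captures instead.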