从列中提取模式
我挣扎着一个巨大的Excel工作表(200K行),在那里我需要从列(B)的string中存在的所有电子邮件地址列表中提取。
我想实现的是:
- 从string中提取电子邮件
- 将
(at)
转换为@
和(dot)
.
- 将名称和电子邮件保存在不同的列中
B列的例子:
Shubhomoy Biswas <biswas_shubhomoy777(at)yahoo(dot)com> Puneet Arora <ar.puneetarora(at)gmail(dot)com> Anand Upadhyay <001.anand(at)gmail(dot)com> Rajat Gupta <rajatgupta0889(at)gmail(dot)com> Sarvesh Sonawane <sarvesh.s(at)suruninfocoresystems.
尽pipe我希望能够在Excel上执行任何其他基于Windows的实用程序build议将会有所帮助。
这可以假设他们都是在相同的格式,每个单元格只有1个电子邮件添加
= SUBSTITUTE(SUBSTITUTE(MID(B1,FIND( “<”,B1)+ 1,LEN(B1) – 查找( “<”,B1)-1), “(在)”, “@”),“(点)”,”。”)
试试这个:
Sub splitter() Dim r As Range, v As String For Each r In Intersect(Range("B:B"), ActiveSheet.UsedRange) v = r.Text If v <> "" Then ary = Split(v, " <") r.Offset(0, 1).Value = ary(0) r.Offset(0, 2).Value = Replace(Replace(Replace(ary(1), ">", ""), "(at)", "@"), "(dot)", ".") End If Next r End Sub
本小节使用列C和D作为输出。 修改代码以满足您的需求。
要提取名称,请尝试= TRIM(LEFT(B1,FIND(“<”,B1)-1))。 user3005775的答案适用于电子邮件。
您也可以轻松地使用正则expression式(您需要添加对Microsoft VBScript正则expression式的引用):
Private Sub ExtractEmailInfo(value As String) Dim expr As New RegExp Dim result As Object Dim user As String Dim addr As String expr.Pattern = "(.+)(<.+>)" Set result = expr.Execute(value) If result.Count > 0 Then user = result(0).SubMatches(0) addr = result(0).SubMatches(1) 'Strip the < and > addr = Mid$(addr, 2, Len(addr) - 2) addr = Replace$(addr, "(at)", "@") addr = Replace$(addr, "(dot)", ".") End If Debug.Print user Debug.Print addr End Sub
将Debug.Print
调用replace为将其放置在单元格中所需的任何操作。
这在15秒内可以达到200 K行:
Option Explicit Sub extractPattern() Dim ws As Worksheet, ur As Range, rng As Range, t As Double Dim fr As Long, fc As Long, lr As Long, lc As Long Set ws = Application.ThisWorkbook.Worksheets("Sheet1") Set ur = ws.UsedRange fr = 1 fc = 1 lr = ws.Cells(ur.Row + ur.Rows.Count + 1, fc).End(xlUp).Row lc = ws.Cells(fr, ur.Column + ur.Columns.Count + 1).End(xlToLeft).Column Set rng = ws.Range(ws.Cells(fr, fc), ws.Cells(lr, fc)) enableXL False t = Timer rng.TextToColumns Destination:=ws.Cells(fr, lc + 1), DataType:=xlDelimited, _ TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, _ Space:=True With ws.Columns(lc + 3) .Replace What:="(at)", Replacement:="@", LookAt:=xlPart .Replace What:="(dot)", Replacement:=".", LookAt:=xlPart .Replace What:="<", Replacement:=vbNullString, LookAt:=xlPart .Replace What:=">", Replacement:=vbNullString, LookAt:=xlPart End With ws.Range(ws.Cells(fr, lc + 1), ws.Cells(fr, lc + 3)).EntireColumn.AutoFit Debug.Print "Total rows: " & lr & ", Duration: " & Timer - t & " seconds" enableXL 'Total rows: 200,000, Duration: 14.4296875 seconds End Sub Private Sub enableXL(Optional ByVal opt As Boolean = True) Application.ScreenUpdating = opt Application.EnableEvents = opt Application.Calculation = IIf(opt, xlCalculationAutomatic, xlCalculationManual) End Sub
它将新数据放在最后第一个未使用的列中(也将名称分开)