根据条件parsingExcel单元数据

我有一种情况,其中我需要parsing包含多行文本的单元格的数据与预期的期望结果。 我相信我们可以通过使用regx创build这样的parsing,但是我太笨了,无法弄清楚我的代码出了什么问题。

Sub AddDetails(c As Range) Dim x As Variant Dim y As Variant Dim a() As Variant Dim r As Long Dim i As Long Dim j As Long ActiveSheet.Cells(1, col + 1).Value = "Student Name" ActiveSheet.Cells(1, col + 2).Value = "ART" ActiveSheet.Cells(1, col + 3).Value = "Non-Final Result" ActiveSheet.Cells(1, col + 4).Value = "Final Result" For r = 2 To Cells(rowS.Count, 1).End(xlUp).Row y = "Student Name=" & SplitMe(Range(col & r).Value, "Student Name")(1) x = Split(y, vbLf) For i = LBound(x) To UBound(x) If InStr(x(i), "=") Then ReDim Preserve a(j) a(UBound(a)) = Split(x(i), "=")(1) j = j + 1 End If Next i Range("C" & r).Resize(, UBound(a) + 1).Value = a Erase x: Erase a: j = 0 Next r End Sub Function FindColumn(searchFor As String) As Integer Dim i As Integer 'Search row 1 for searchFor FindColumn = 0 For i = 1 To ActiveSheet.Cells.SpecialCells(xlCellTypeLastCell).Column If ActiveSheet.Cells(1, i).Value = searchFor Then FindColumn = i Exit For End If Next i End Function Function SplitMe(s As String, delimiter As String) Dim arr, i As Long If Len(s) = 0 Or Len(delimiter) = 0 Then SplitByLastOccurrence = CVErr(2001) Else i = InStrRev(s, delimiter) If i = 0 Then SplitByLastOccurrence = Array(s) Else ReDim arr(0 To 1) arr(0) = Trim(Left$(s, i - 1)) arr(1) = Trim(Mid$(s, i + Len(delimiter) + 1)) SplitByLastOccurrence = arr End If End If End Function 

我试图实现的是从单元格parsing数据,然后关键词学生名称和艺术

最重要的是我有一些date可能只有一个或多个目前的关键行'非最终结果'如果事件date之后出现在这个短语,需要在各自的列给出,如果其多个然后需要堆栈他们在同一列的单元格。

更新date解释问题

 Event date=2016-09-02 Event code=UU/CZXCD Event type=Examination events Event type=AS Event type=ASED Non-Final Result Event date=2017-08-05 Event code=UU/CZXSA Event type=Examination events Event type=AS Event type=ASED Final Result Event date=2017-09-08 Event code=UU/CZXCD Event type=Examination events Event type=AS Event type=ASED Non-Final Result 

这是使用正则expression式来提取数据的每个部分的一种方法。 它再现了你在样本表中的内容

  • 你所有的数据被假定为从A2开始的A列
  • 假定要提取的不同分段的模式与您所显示的完全相同。 如果它们不是,那么写成的正则expression式将不起作用。
  • 我使用一个Dictionary对象来存储结果。 使我把事情放在一起更容易,以便在完成时将结果写入工作表
  • 我用早期的绑定(设置那些引用); 但是如果需要,你可以重写使用后期绑定

编辑 :代码已被编辑,以说明确定Finalnon-Final结果date以及RCE的不同方法。

编辑2:根据海报请求编辑的某些术语

我留下了相同的逻辑来确定NameART (除了将学生姓名更改为xxxxx。

我用来返回date的逻辑是:

  • 查找以Event Publication Date=开头的行
  • 提取该行末尾的date
    • 当且仅当后面跟着适当的文本string( Final ResultNon-Final Resultxxxxx ),其间没有date。

 Option Explicit 'set reference to Microsoft Scripting Runtime ' Microsoft VBScript Regular Expressions 5.5 Private RE As RegExp Private MC As MatchCollection Sub StudentDetail() Dim dS As Dictionary Dim WS As Worksheet Dim vSrc As Variant, vRes As Variant, rRes As Range Dim V As Variant, I As Long, J As Long Dim S As String 'Read data into vSrc Set WS = Worksheets("sheet1") With WS vSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)) End With 'First cell of results array Set rRes = WS.Cells(1, 2) 'Initialize Regex Set RE = New RegExp With RE .Global = True .MultiLine = True .IgnoreCase = True End With 'Collect the data into a dictionary Set dS = New Dictionary dS.CompareMode = TextCompare For I = 2 To UBound(vSrc, 1) ReDim V(4) S = vSrc(I, 1) 'Name V(0) = reExtract(S, "Primary xxxxx:\s+(.*)") 'ART V(1) = reExtract(S, "ART=(.*)") 'NonFinal V(2) = reExtract(S, "Event publication date=(\d{4}-\d{2}-\d{2})(?:(?!\d{4}-\d{2}-\d{2})[\s\S])*Non-Final") 'Final V(3) = reExtract(S, "Event publication date=(\d{4}-\d{2}-\d{2})(?:(?!\d{4}-\d{2}-\d{2})[\s\S])*^Final Rejection") 'RCE V(4) = reExtract(S, "Event publication date=(\d{4}-\d{2}-\d{2})(?:(?!\d{4}-\d{2}-\d{2})[\s\S])*xxxxxx") If Not dS.Exists(V(0)) Then dS.Add Key:=V(0), Item:=V Else MsgBox "duplicate name" Stop 'You need to decide what to do End If Next I 'Output the results to array ReDim vRes(0 To dS.Count, 1 To 5) vRes(0, 1) = "xxxxx" vRes(0, 2) = "ART" vRes(0, 3) = "Non-Final Result" vRes(0, 4) = "Final Result" vRes(0, 5) = "RCE" For I = 0 To dS.Count - 1 V = dS(dS.Keys(I)) For J = 0 To 4 If IsArray(V(J)) Then vRes(I + 1, J + 1) = Join(V(J), vbLf) Else vRes(I + 1, J + 1) = V(J) End If Next J Next I 'write array to worksheet Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2)) With rRes .EntireColumn.Clear .Value = vRes With .Rows(1) .Font.Bold = True .HorizontalAlignment = xlCenter End With .EntireColumn.AutoFit End With End Sub Function reExtract(S As String, sPat As String) Dim V As Variant Dim I As Long With RE .Pattern = sPat If .Test(S) = True Then Set MC = .Execute(S) With MC If .Count = 1 Then reExtract = MC(0).SubMatches(0) Else ReDim V(0 To .Count - 1) I = 0 For I = 0 To .Count - 1 V(I) = MC(I).SubMatches(0) Next I reExtract = V End If End With End If End With End Function 

那么,您RegEx VBA的一个关键部分不能正常工作,可能是因为您没有包含任何与RegEx相关的代码。 :)但是,我看到你试图把一些代码片段放在一个工作的例子,所以你得到努力点!

看看你的数据样本,我不禁要问,这个数据是从哪里来的,以及是否有更好的方法来parsing它,而不是使用VBA手动进行分析。 例如,如果这是从数据库或网站链接或导入的代码,可能是大量的,也许Excel的内置“ 获取外部数据 ”function将更适合,或者您应该倾向于访问而不是。

尽pipe如此,我还是很好奇为什么需要parsing您的规范,并且我提出了一个可行的解决scheme,至less在您提供的单个示例中(没有RegEx)。 您的样本的更新副本上传到这里 (虽然我不确定它是否会正确下载,或如果VBA将被阻止 )。

 Option Explicit 'These functions can be used two ways: ' 1. Call sub [populateStudentData] and then programmatically put the extracted data where it needs to go (like sub "sTest" does below) ' 2. Call function [studentData] in a worksheet cell to return the desired field ' - Syntax: =studentData ( rawData_In, fieldName_Out ) ' - example: =studentData ( A2, "Final Result" ) : returns all the "Final Result" dates found in cell A2 ' * NOTE that WRAP TEXT in "Cell Format > Alignment" must be on for multi lines (via vbLf) to display properly with this method ' WRAP TEXT can also be set programmatically with cell.WrapText (see: https://stackoverflow.com/a/9902628/8112776) ' - less efficient since it parses all the data for each cell & each field, on every calculation, ' but shouldn't be a problem unless the function is used in a LOT of cells (in which case it start getting slow to re-calculate) 'Slapped together by ashleedawg@outlook.com for 'SO' Question: https://stackoverflow.com/questions/46996095/parsing-excel-single-cell-data-based-on-condition 'This code contain *zero error checking* and limited documentation. Google any terms with which you are unfamiliar. 'Created with limited information on the actual application, this is FAR-from the most efficient way to parse data! "Provided for educational purposes only!" 'Perhaps using Collections or Objects would be more efficient. Please feel free to improve & re-post this code, comments, suggestions, etc. 'declare an array to temporarily store: ' "sField" = text before the delimiter, in: "arrStudentData(__,0)" ' "sValue" = text after the delimiter (if there is one), in: "arrStudentData(__,1)" (Multiple values will be concatenated, delimited with vbLf's) ' "sDelimiter" = character that separates the sField from the SValue: either a Colon or an Equal Sign (: or =) Public arrStudentData(1 To 99, 0 To 1) As String Public Function studentData(rawData_In As String, fieldName_Out As String) As String 'parse [rawData_In] and return concatenated string of "sValue" data for "sField" specified in [fieldName_out] 'the concatenated output of the function can be SPLIT (or values Text-To-Column'd) based on [newDelimiter] (vbLf by default] 'call this function on a workdheet or programmatically populateStudentData (rawData_In) studentData = getField(fieldName_Out) End Function Sub sTest() 'as a test/debugging, let's parse data from cell A2 populateStudentData (Range("$A$2").Value) Stop 'hit F5 or click "play" to print results to the Immediate Window (Hit CTRL+G here to view) Call print_Results Stop 'hit F5 or click "play" to print data for field "Final Result" to Immediate Window Debug.Print getField("Final Result") Stop 'hit F5 or click "play" to populate cell A3 with data for field "Final Result" Range("$A$3") = getField("Final Result") End Sub Sub populateStudentData(str_Input As String) 'populate array [arrStudentData] by parsing [str_Input] (the raw string we need to decode) Dim new_Delimiter new_Delimiter = Chr(10) Dim arr_Input() As String '[str_Input] split into an array & cleaned up Dim sFieldCount As Integer 'the number of "sField"'s found in [str_Input] Dim sLineNumber As Integer 'the input "line" we are processor (counter) Dim sFieldExists As Boolean 'TRUE if the "sField" has already been found at least once Dim x As Integer 'counter 'start with an empty array Erase arrStudentData 'remove "Event Date" to combine date with previous line for {"Final Result" or "Non-Final Result"} str_Input = Replace(str_Input, "Result" & vbLf & "Event Date", "Result", , , vbTextCompare) '(vbTextCompare makes the search non-case-sensitive) 'split [str_Input] into array [arr_Input] with vbLf's separating each value arr_Input = Split(str_Input, vbLf) 'enumerate [arr_Input] to create a list of "sField's" in arrStudentData(x, 0) sFieldCount = 0 For sLineNumber = 0 To UBound(arr_Input) If extract_sValue(arr_Input(sLineNumber)) <> "" Then 'ignore lines that don't have an "sValue" 'does this field already exist? sFieldExists = False 'enumerate [arrStudentData(x, 0)] to see if this field already exists For x = 1 To sFieldCount If arrStudentData(x, 0) = extract_sField(arr_Input(sLineNumber)) Then sFieldExists = True 'field already exists in list Next x If Not sFieldExists Then 'field doesn't exist, add it to list sFieldCount = sFieldCount + 1 arrStudentData(sFieldCount, 0) = extract_sField(arr_Input(sLineNumber)) End If End If Next sLineNumber 'we now have an array of field names : arrStudentData(1 to [sFieldCount],0) 'next, enumerate [arr_Input] again, this time putting the "sValue's" into arrStudentData(x, 1) For sLineNumber = 0 To UBound(arr_Input) For x = 1 To sFieldCount 'add the VALUE to the arrStudentData(x, 1) If extract_sField(arr_Input(sLineNumber)) = arrStudentData(x, 0) Then 'this field is arrStudentData(x, 0) so concatenate the value after the "sDelimiter" to arrStudentData(x, 1) If Len(arrStudentData(x, 1)) > 0 Then 'this isn't the first value so add [new_Delimiter] before "sValue" (default: vbLf) arrStudentData(x, 1) = arrStudentData(x, 1) & new_Delimiter End If arrStudentData(x, 1) = arrStudentData(x, 1) & extract_sValue(arr_Input(sLineNumber)) End If Next x Next sLineNumber End Sub Function getField(sField As String) As String 'return "sValue" for the specified "sField" Dim x As Integer 'counter 'enumerate the array to find a match For x = LBound(arrStudentData) To UBound(arrStudentData) If LCase(arrStudentData(x, 0)) = LCase(sField) Then 'compare lowercase (so not case sensitive) 'found a match getField = arrStudentData(x, 1) Exit Function End If Next x End Function Function extract_sField(str_In As String) As String 'return text found BEFORE one of the "sDelimiter's" If str_In <> "" Then extract_sField = Split(Split(str_In, ":")(0), "=")(0) End Function Function extract_sValue(str_In As String) As String 'return text found AFTER one of the "sDelimiter's" If InStr(str_In, "=") > 0 Then extract_sValue = Trim(Split(str_In, "=")(1)) 'text after "sDelimiter" = Else If InStr(str_In, ":") > 0 Then extract_sValue = Trim(Split(str_In, ":")(1)) 'text after "sDelimiter" : Else extract_sValue = "" 'no "sDelimiter's" found so return no value End If End If End Function Sub print_Results() 'for testing/debugging purposes: print values of array [arrStudentData] in the Immediate Window (Hit CTRL+G here to view) Dim x As Integer 'counter Debug.Print "----------" For x = LBound(arrStudentData) To UBound(arrStudentData) If arrStudentData(x, 0) <> "" Then Debug.Print "arrStudentData(" & x & ",0) = """ & arrStudentData(x, 0) & """" Debug.Print "arrStudentData(" & x & ",1) = """ & arrStudentData(x, 1) & """" Debug.Print "----------" End If Next x End Sub 

正如你可能知道的那样,SO不应该是一个“代码写作服务”,但是我把它作为一个练习“挑战”来看看我是否可以一起打一些东西(所以不要给我废话,mods!)没有error handling和有限的评论,但看看,也许你可以适应它的需要。 确实有更有效的方法来解决它,而不是反复枚举每个单元格的同一个数组的方法,所以它不适用于“巨大的”规模。