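How to extract values from multiple XML files into Excel?

I have a large number of loosely structured XML files of this form (some have a different number of fields, but all of them contain certain information I want):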

    <message>
      <stdHeader> don't want </stdHeader>
      <formdata>
        <field1>
          <subfield1>
            <type> don't want </type>
            <name> want </name>
          </subfield1>
          <subfield2> want </subfield2>
          <subfield3> don't want </subfield3>
        </field1>
        <field2> don't want </field2>
        <field3>
          <subfield1>
            <givenName> want </givenName>
            <familyName> want </familyName>
          </subfield1>
        </field3>
      </formdata>
      <aaaa>don't want </aaaa>
      <bbbb>don't want</bbbb>
      <cccc>don't want</cccc>
      <dddd>don't want</dddd>
      <eeee>don't want</eeee>
      <ffff>don't want</ffff>
    </message>

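I want an Excel sheet with the column headers 'name', 'subfield3', 'givenName', 'familyName' (from the above), where each row holds those values from one XML file. I am a programming beginner, so I don't know how to 1. extract the values I want from a single XML file, and 2. write some code that does 1. for every XML file in a folder. Can anyone help me?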

Edit:

Sample of the actual XML:

  <?xml version="1.0" encoding="UTF-8"?> <message submitted="y" xmlns="u"> <s><m>4</m><me>0</me> <oc>I</oc> <os>E</os><dr>21</dr> <tr>1</tr><dc>20/dc> <tc>1</tc><ds>2</ds> <ts>1</ts></sh><formData><c><identifier edgeitem="ZCO01b"> <type edgeitem="ZCO01b">C</type><value edgeitem="ZCO01b">172</value> </identifier><name edgeitem="ZCO01a">JMTGN</name></c> <pb><ch><of><ef edgeitem="ZRP04b">20</ef><ad edgeitem="ZRG03c"> <adL edgeitem="ZRP04d">2MR1</adL> <co edgeitem="ZRP04d">A</co><ov>true</ov> </ad></of></ch></pb><of><ch edgeitem="ZSD06a"> <of><pe><ne edgeitem="ZSD06c"><gi edgeitem="ZSD06c">k</gi> <fa edgeitem="ZSD06c">o</fa> </ne><bi edgeitem="ZSD06d"><da edgeitem="ZSD06d">196</da> <ci edgeitem="ZSD06d">MNE</ci><st edgeitem="ZSD06d">VC</st> <co edgeitem="ZSD06d">Aua</co></bi> </pe><ad edgeitem="ZSD06h"><ad edgeitem="ZSD06h">24IC86</adL><co edgeitem="ZSD06h">uia</co><ov edgeitem="ZSD06i">true</ov><not><daC edgeitem="ZSD06b">29</daC> </not></ad></of><of><pe><name edgeitem="ZSD06c"><gs edgeitem="ZSD06c">jane</gs> <fae edgeitem="ZSD06c">ci</fae></name><bi edgeitem="ZSD06d"><da edgeitem="ZSD06d">198</da><ci edgeitem="ZSD06d">MLB</ci><st edgeitem="ZSD06d">VC</st> <co edgeitem="ZSD06d">Aul</co></bi></pe><ad edgeitem="ZSD06h"><adL edgeitem="ZSD06h">24IC</adL><co edgeitem="ZSD06h">uia</co><ov edgeitem="ZSD06i">true</ov> <not><daC edgeitem="ZSD06b">209</daC> </not></ad></of></ch></of><si><name edgeitem="ZDC00a"><givenNames edgeitem="ZDC00a">John </givenNames> <familyName edgeitem="ZDC00a">Citizen</familyName></name><ca edgeitem="ZDC00b">DI</ca><daS edgeitem="ZDC00c">200</daS><dec edgeitem="ZDC00d">true</dec></si></formData> <mes><asi><ebu><re>746</re> </ebu><asc><doc>181</doc></asc></asi> <cus><edg><re><type>RE</type> <qu>42</qu></re><ac>A08</ac> <tra>60</tra> <seq>1</se><tr>7046</tr> <mailbox>PR</mailbox><mode>PROCESS</mode></edge></customer></messageIdentifier> <asc><lo><ag>442</ag></loy></asc> <asco><re><dod> <dete>true</dete><fe> <lod>258</lod><lod>213</lod> 
<tot>0.00</tot></fe></dod></re> <prs><m>PRS</m><wa>false</wa><deb>false</deb> <maid>DP2</maid> <re>false</re></pro></asco> <wo><aga><ag>2</ag><agn>ATD</agn><co>LNY</co><pos> <adL>PO60</adL><adL>C3145</adL><co>AUA</co><asd>15055</asd> </pos><pe><te><nr>077</nr> </te></ph><fx><te><nr>057</nr> </te></fx></aga></wa></message> 

This sub iterates through all the files in a folder and, if it finds any XML file, calls the second sub:


    Option Explicit     'in code editor: Tools > References > checkbox in Microsoft Scripting Runtime

    Public Sub ProcessXMLs()
        Const FOLDER_NAME As String = "C:\Tmp"  '<- update this path

        Dim tags As Variant, hdrs As Variant, rowID As Long
        Dim fso As FileSystemObject, f As File

        Set fso = New Scripting.FileSystemObject
        hdrs = Array("FileName", "ItemID", "Name", "GivenName", "FamilyName")
        tags = Array("FileName", "value", "name", "givenNames", "familyName")

        With Sheet1
            .Range(.Cells(1, 1), .Cells(1, UBound(tags) + 1)).Value2 = hdrs
            rowID = 2
            Application.ScreenUpdating = False
            For Each f In fso.GetFolder(FOLDER_NAME).Files  'iterate through files
                If LCase(fso.GetExtensionName(f)) = "xml" Then
                    .Cells(rowID, 1).Value2 = fso.GetBaseName(f) & ".xml"
                    ReadTags Sheet1, fso.OpenTextFile(f.Path, ForReading), rowID, tags
                    rowID = rowID + 1
                End If
            Next
            .UsedRange.Columns.AutoFit
            Application.ScreenUpdating = True
        End With
    End Sub

This sub extracts the values of the 4 tags, one file at a time:

    Private Sub ReadTags(ByVal ws As Worksheet, ByVal fsoFile As TextStream, _
                         ByVal rowID As Long, ByVal tags As Variant)

        Dim ln As String, val As String, i As Long, s1 As Long, s2 As Long

        With fsoFile
            Do While Not .AtEndOfStream         'file stream is open
                ln = Trim(.ReadLine)            'read each line
                If Len(ln) > 0 Then             'if text line is not empty extract tags
                    For i = 1 To UBound(tags)   'find each tag - start and closing
                        s1 = InStr(1, ln, "<" & tags(i), 0)
                        s2 = InStr(s1 + 1, ln, "</" & tags(i) & ">", 0)
                        If s1 > 0 And s2 > 0 Then
                            s1 = InStr(s1, ln, """>", 0) + 2
                            ws.Cells(rowID, i + 1).Value2 = Trim(Mid(ln, s1, s2 - s1))
                            Exit For
                        End If
                    Next
                End If
            Loop
        End With
    End Sub
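
To make the tag-matching step in ReadTags easier to follow, here is a minimal sketch that isolates it in a standalone function. The name ExtractTagValue is hypothetical and not part of the code above. As an assumption that differs from the original, it looks for the first `>` after the tag name rather than for `">`, so it should also cope with opening tags that carry no attributes. It is still plain string matching, not XML parsing: it assumes the opening and closing tags sit on the same line, and a tag name such as "name" would also match a longer opening tag that merely starts with it.

```vba
Option Explicit

'Hypothetical helper: returns the text between <tagName ...> and </tagName>
'on a single line, or an empty string if either tag is missing.
Public Function ExtractTagValue(ByVal ln As String, ByVal tagName As String) As String
    Dim s1 As Long, s2 As Long

    s1 = InStr(1, ln, "<" & tagName, vbBinaryCompare)               'start of opening tag
    If s1 = 0 Then Exit Function

    s2 = InStr(s1 + 1, ln, "</" & tagName & ">", vbBinaryCompare)   'start of closing tag
    If s2 = 0 Then Exit Function

    s1 = InStr(s1, ln, ">", vbBinaryCompare) + 1                    'first character after the opening tag
    If s1 > s2 Then Exit Function                                   'guard against a malformed line

    ExtractTagValue = Trim$(Mid$(ln, s1, s2 - s1))
End Function
```

For example, typing `? ExtractTagValue("<name edgeitem=""ZCO01a"">JMTGN</name>", "name")` in the Immediate window prints `JMTGN`.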

Result:

[image]