如何提高VBA中XMLparsing的速度?
我有一个大的XML文件,需要在VBA中parsing(Excel 2003&2007)。 xml文件中可能有11,000行“数据行”,每个“行”具有10到20列之间的数据。 这是一个巨大的任务,只是parsing和抓取数据(5 – 7分钟)。 我尝试读取XML并将每个“行”放入字典(键=行号,值=行属性),但这需要很长的时间。
遍历DOM是永恒的。 有没有更有效的方法?
Dim XMLDict Sub ParseXML(ByRef RootNode As IXMLDOMNode) Dim Counter As Long Dim RowList As IXMLDOMNodeList Dim ColumnList As IXMLDOMNodeList Dim RowNode As IXMLDOMNode Dim ColumnNode As IXMLDOMNode Counter = 1 Set RowList = RootNode.SelectNodes("Row") For Each RowNode In RowList Set ColumnList = RowNode.SelectNodes("Col") Dim NodeValues As String For Each ColumnNode In ColumnList NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text Next ColumnNode XMLDICT.Add Counter, NodeValues Counter = Counter + 1 Next RowNode End Sub
你可以尝试使用SAX而不是DOM。 当你所做的只是parsing文件而文件大小不重要时,SAX应该更快。 这里是MSXML中SAX2实现的参考
我通常在Excel中直接获取大多数XMLparsing的DOM,但在某些情况下,SAX似乎具有优势。 这里的简短比较可能有助于解释它们之间的差异。
下面是一个黑客入侵的例子(部分基于这个 ),使用Debug.Print
输出:
通过工具>参考添加对“Microsoft XML,v6.0”的引用
在普通模块中添加此代码
Option Explicit Sub main() Dim saxReader As SAXXMLReader60 Dim saxhandler As ContentHandlerImpl Set saxReader = New SAXXMLReader60 Set saxhandler = New ContentHandlerImpl Set saxReader.contentHandler = saxhandler saxReader.parseURL "file://C:\Users\foo\Desktop\bar.xml" Set saxReader = Nothing End Sub
添加一个类模块,将其称为ContentHandlerImpl
并添加下面的代码
Option Explicit Implements IVBSAXContentHandler Private lCounter As Long Private sNodeValues As String Private bGetChars As Boolean
使用模块顶部的左侧下拉菜单select“IVBSAXContentHandler”,然后使用右侧下拉菜单依次为每个事件添加存根(从characters
到startPrefixMapping
)
将代码添加到一些存根,如下所示
显式地设置计数器和标志来显示我们是否要在这个时候读取文本数据
Private Sub IVBSAXContentHandler_startDocument() lCounter = 0 bGetChars = False End Sub
每次启动新元素时,请检查元素的名称并采取适当的措施
Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes) Select Case strLocalName Case "Row" sNodeValues = "" Case "Col" sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":" bGetChars = True Case Else ' do nothing End Select End Sub
检查我们是否对文本数据感兴趣,如果是的话,删除任何无关的空格,并删除所有换行符(根据您要parsing的文档,这可能会也可能不需要)
Private Sub IVBSAXContentHandler_characters(strChars As String) If (bGetChars) Then sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "") End If End Sub
如果我们已经到达Col
最后,停止阅读文本值; 如果我们已经达到了Row
的末尾,那么打印出节点值的string
Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String) Select Case strLocalName Case "Col" bGetChars = False Case "Row" lCounter = lCounter + 1 Debug.Print lCounter & " " & sNodeValues Case Else ' do nothing End Select End Sub
为了使事情更清楚,下面是完整版本的ContentHandlerImpl
,其中包含存根方法的al:
Option Explicit Implements IVBSAXContentHandler Private lCounter As Long Private sNodeValues As String Private bGetChars As Boolean Private Sub IVBSAXContentHandler_characters(strChars As String) If (bGetChars) Then sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "") End If End Sub Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator) End Property Private Sub IVBSAXContentHandler_endDocument() End Sub Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String) Select Case strLocalName Case "Col" bGetChars = False Case "Row" lCounter = lCounter + 1 Debug.Print lCounter & " " & sNodeValues Case Else ' do nothing End Select End Sub Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String) End Sub Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String) End Sub Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String) End Sub Private Sub IVBSAXContentHandler_skippedEntity(strName As String) End Sub Private Sub IVBSAXContentHandler_startDocument() lCounter = 0 bGetChars = False End Sub Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes) Select Case strLocalName Case "Row" sNodeValues = "" Case "Col" sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":" bGetChars = True Case Else ' do nothing End Select End Sub Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String) End Sub
使用SelectSingleNode
函数。 这将使您可以根据模式匹配search节点。
例如,我创build了以下function:
Private Function getXMLNodeValue(ByRef xmlDoc As MSXML2.DOMDocument, ByVal xmlPath As String) Dim node As IXMLDOMNode Set node = xmlDoc.SelectSingleNode(xmlPath) If node Is Nothing Then getXMLNodeValue = vbNullString Else getXMLNodeValue = node.Text End Function
现在,如果我有以下XML文件:
我可以简单地调用:
myValue = getXMLNodeValue(xmlResult, "//ErrorStatus/Source")
它会跳到第一个任意深度的“错误状态”键,并拉出“源”节点中的文本 – 返回“INTEGRATION”