Ignoring elements inside certain tags when getting elements by id with VBA

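I have a VBA module that extracts all the links from a page. However, I want to ignore every link inside certain tags, such as <header> and <footer> (and all of their child tags). Can anyone tell me how this can be done?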

    Sub Fetch_click()
        Dim LinkArr As Variant
        Set IE = CreateObject("InternetExplorer.Application")
        IE.Visible = True
        IE.Navigate Cells(1, 1).Text
        While IE.Busy
            DoEvents
        Wend
        Dim i As Integer
        i = 3
        Set LinkArr = IE.Document.getElementsByTagName("a")
        For Each LinkObj In LinkArr
            Cells(i, 1).Value = LinkObj.href
            i = i + 1
        Next
    End Sub

Thanks

I prefer to use objects from the Microsoft HTML Object Library and the Microsoft Internet Controls library (add references to both!), e.g.

    Sub StartTest()
        Dim Browser As SHDocVw.InternetExplorer
        Dim HTMLDoc As MSHTML.HTMLDocument

        ' start browser
        Set Browser = New SHDocVw.InternetExplorer
        Browser.Visible = True
        Browser.navigate "www.dauda.at"

        ' wait until the page has finished loading before touching the document
        Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        Set HTMLDoc = Browser.document

        Dim ECol As MSHTML.IHTMLElementCollection
        Dim IFld As MSHTML.IHTMLElement

        ' search all <a> tags
        Set ECol = HTMLDoc.getElementsByTagName("a")
        For Each IFld In ECol
            ' etc ...
        Next IFld

        ' clean up
        Set IFld = Nothing
        Set ECol = Nothing
        Set HTMLDoc = Nothing
        Browser.Quit
        Set Browser = Nothing
    End Sub

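Checking where an <a> tag sits can be as simple as reading the tag of its enclosing parent, e.g. IFld.parentElement.tagName (or parentNode.nodeName when the object is accessed through the IHTMLDOMNode interface).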
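As a minimal sketch of that direct-parent check: the helper name ParentTagOf is made up for illustration, and the code assumes the reference to the Microsoft HTML Object Library is set. It reads the enclosing tag through parentElement, which is Nothing at the top of the tree.

```vba
' Hypothetical helper: returns the upper-case tag name of the element
' that directly encloses Elem, or "" if Elem has no parent element.
Function ParentTagOf(ByVal Elem As MSHTML.IHTMLElement) As String
    Dim Parent As MSHTML.IHTMLElement

    Set Parent = Elem.parentElement
    If Parent Is Nothing Then
        ParentTagOf = ""
    Else
        ParentTagOf = UCase$(Parent.tagName)
    End If
End Function
```

Inside the loop this could be used as `If ParentTagOf(IFld) <> "FOOTER" Then ...`. Note that it only catches links that are direct children of the unwanted tag, not links nested more deeply.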

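If you don't know how deeply your <a> is nested, you can use a recursive function that checks the next higher parent all the way up to the document root ("#document") or the containing "HTML" element, e.g.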

    Function BadParentRec(ByVal TestFld As MSHTML.IHTMLDOMNode) As Boolean
        Dim ParentFld As MSHTML.IHTMLDOMNode
        Dim MyTag As String, MyTempResult As Boolean

        ' parentNode and nodeName are members of IHTMLDOMNode, hence the parameter type
        Set ParentFld = TestFld.parentNode
        If ParentFld Is Nothing Then
            BadParentRec = False ' node is not attached to a document
            Exit Function
        End If

        MyTag = ParentFld.nodeName
        ' Debug.Print MyTag
        If MyTag = "#document" Then
            MyTempResult = False ' reached the document root: good
        ElseIf MyTag = "XXX" Then ' your own criteria for bad tags go here
            MyTempResult = True ' send "bad" back up the recursion chain
        Else
            MyTempResult = BadParentRec(ParentFld) ' next level up
        End If
        BadParentRec = MyTempResult
    End Function

... so inside the For Each loop you would say

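    If Not BadParentRec(IFld) Then
        ' getAttribute is used because IHTMLElement itself has no href member
        Debug.Print IFld.getAttribute("href") ' check here for an empty or missing href
    End If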
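For the <header>/<footer> case from the question, the same walk up the tree can also be written as a loop instead of a recursion. The following is only a sketch under assumptions: the function name HasBadAncestor and the comma-separated BadTags parameter are invented for illustration, and it relies on tagName being reported in upper case for HTML documents.

```vba
' Hypothetical helper: True if Elem sits anywhere inside one of the tags
' listed in BadTags (comma-separated, upper case, e.g. "HEADER,FOOTER").
Function HasBadAncestor(ByVal Elem As MSHTML.IHTMLElement, _
                        ByVal BadTags As String) As Boolean
    Dim Current As MSHTML.IHTMLElement
    Dim Haystack As String

    ' Wrap the list in commas so that whole tag names are matched
    ' (otherwise "A" would be found inside "NAV").
    Haystack = "," & UCase$(BadTags) & ","

    Set Current = Elem.parentElement
    Do While Not Current Is Nothing
        If InStr(1, Haystack, "," & UCase$(Current.tagName) & ",", vbBinaryCompare) > 0 Then
            HasBadAncestor = True
            Exit Function
        End If
        ' parentElement is Nothing once the top of the tree is passed
        Set Current = Current.parentElement
    Loop

    HasBadAncestor = False
End Function
```

In the For Each loop, `If Not HasBadAncestor(IFld, "HEADER,FOOTER") Then ...` would then skip every link nested at any depth inside a header or footer. The loop avoids recursion depth concerns and keeps the list of unwanted tags in one place.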