Ignore elements inside certain tags when getting elements by id using VBA
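I have a VBA module that extracts all the links from a page. However, I want to ignore every link that sits inside certain tags, such as <header> and <footer> (and all of their child tags). Can anyone tell me how this can be done?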
    Sub Fetch_click()
        Dim LinkArr As Variant
        Set IE = CreateObject("InternetExplorer.Application")
        IE.Visible = True
        IE.Navigate Cells(1, 1).Text
        While IE.Busy
            DoEvents
        Wend
        Dim i As Integer
        i = 3
        Set LinkArr = IE.Document.getElementsByTagName("a")
        For Each LinkObj In LinkArr
            Cells(i, 1).Value = LinkObj.href
            i = i + 1
        Next
    End Sub
Thanks
I prefer to use objects from the Microsoft HTML Object Library and Microsoft Internet Controls (add references to both!), for example:
    Sub StartTest()
        Dim Browser As SHDocVw.InternetExplorer
        Dim HTMLDoc As MSHTML.HTMLDocument

        ' start browser
        Set Browser = New SHDocVw.InternetExplorer
        Browser.Visible = True
        Browser.navigate "www.dauda.at"

        ' wait until the page has loaded before touching the document
        Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        Set HTMLDoc = Browser.document

        Dim ECol As MSHTML.IHTMLElementCollection
        Dim IFld As MSHTML.IHTMLElement

        ' search all <a> tags
        Set ECol = HTMLDoc.getElementsByTagName("a")
        For Each IFld In ECol
            ' etc ...
        Next IFld

        ' clean up
        Set IFld = Nothing
        Set ECol = Nothing
        Set HTMLDoc = Nothing
        Browser.Quit
        Set Browser = Nothing
    End Sub
Checking where an <a> tag is located can be as simple as inspecting IFld.ParentNode.nodeName to get the tag of the enclosing parent.
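As a rough illustration of that one-level check, the sketch below prints only the links whose direct parent is neither <header> nor <footer>. It is late-bound (everything is declared As Object) so it does not depend on the two library references; the Sub name and the Doc parameter are made up for this example, and Doc is assumed to be an already loaded document such as IE.Document.

```vba
' Sketch: print links whose DIRECT parent is not <header> or <footer>.
' Doc is assumed to be a fully loaded HTML document, e.g. IE.Document.
Sub PrintLinksByDirectParent(ByVal Doc As Object)
    Dim Link As Object
    Dim ParentTag As String

    For Each Link In Doc.getElementsByTagName("a")
        ' Default to an empty tag name in case there is no parent element
        ParentTag = ""
        If Not Link.parentElement Is Nothing Then
            ParentTag = UCase$(Link.parentElement.tagName)
        End If

        If ParentTag <> "HEADER" And ParentTag <> "FOOTER" Then
            Debug.Print Link.href
        End If
    Next Link
End Sub
```

This only looks one level up, so a link wrapped in, say, a <div> inside the <footer> would still be printed.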
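If you don't know how deeply your <a> is nested, you can use a recursive function that checks the next higher parent, all the way up to the document root ("#document") or the enclosing "HTML" element, for example: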
    Function BadParentRec(TestFld As MSHTML.IHTMLElement) As Boolean
        Dim MyTag As String, MyTempResult As Boolean

        BadParentRec = False
        MyTag = TestFld.ParentNode.nodeName
        ' Debug.Print MyTag

        If MyTag = "#document" Then
            MyTempResult = False    ' reached the document root: no bad parent found
        ElseIf MyTag = "XXX" Then   ' your own criteria for bad tags go here
            MyTempResult = True     ' send "bad" back up the recursion chain
        Else
            MyTempResult = BadParentRec(TestFld.parentElement)  ' next level up
        End If

        BadParentRec = MyTempResult
    End Function
... so inside the For Each loop you would write:
    If Not BadParentRec(IFld) Then
        Debug.Print IFld.href   ' check here for href = ""
    End If
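For the <header>/<footer> case from the question, the same idea can also be written as a loop instead of a recursion. The following is only a sketch under a few assumptions: it is late-bound like the code in the question, HasAncestorTag and FetchLinksOutsideHeaderFooter are names invented here, BadTags is a hypothetical comma-separated parameter, and the page is assumed to be rendered in a document mode that parses <header> and <footer> as container elements.

```vba
' Returns True when Elem has an ancestor whose tag name appears in BadTags.
' BadTags is a comma-separated list such as "HEADER,FOOTER" (hypothetical parameter).
Function HasAncestorTag(ByVal Elem As Object, ByVal BadTags As String) As Boolean
    Dim Cur As Object

    Set Cur = Elem.parentElement
    ' Walk upwards until there is no parent element left
    Do While Not Cur Is Nothing
        ' Case-insensitive match of the whole tag name against the list
        If InStr(1, "," & BadTags & ",", "," & Cur.tagName & ",", vbTextCompare) > 0 Then
            HasAncestorTag = True
            Exit Function
        End If
        Set Cur = Cur.parentElement
    Loop
End Function

' Variation of the Sub from the question that skips links in <header>/<footer>.
Sub FetchLinksOutsideHeaderFooter()
    Dim IE As Object, Link As Object
    Dim i As Long

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate Cells(1, 1).Text

    ' Wait for the page to finish loading (4 = READYSTATE_COMPLETE)
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop

    i = 3
    For Each Link In IE.Document.getElementsByTagName("a")
        If Not HasAncestorTag(Link, "HEADER,FOOTER") Then
            Cells(i, 1).Value = Link.href
            i = i + 1
        End If
    Next Link

    IE.Quit
End Sub
```

Passing the tag list as a parameter keeps the walk reusable: calling HasAncestorTag(Link, "NAV") would filter out navigation links in the same way.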