Scraping innerHTML from a site with VBA

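I'm trying to declare an array of nodes (that part isn't a problem) and then scrape the innerHTML of two child nodes in each element of that array. Take SE as an example (using the IE object approach): suppose I'm trying to extract the question titles and summaries on the main page. There is an array of nodes with the class name "question-summary".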

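Each of these has two child nodes: the title (class name "question-hyperlink") and the excerpt (class name "excerpt"). The code I'm using is as follows: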

    Sub Scraper()
        Dim ie As Object
        Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object
        Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String

        Set ie = CreateObject("internetexplorer.application")
        sURL = "https://stackoverflow.com/questions/tagged/excel-formula"

        QuestionShell = "question-summary"
        QuestionTitle = "question-hyperlink"
        Question = "excerpt"

        With ie
            .Visible = False
            .Navigate sURL
        End With

        Set doc = ie.Document 'Stepping through so doc is getting assigned (READY_STATE = 4)

        Set oQuestionShells = doc.getElementsByClassName(QuestionShell)

        For Each oElement In oQuestionShells
            Set oQuestionTitle = oElement.getElementByClassName(QuestionTitle) 'Assigning this object causes an "Object doesn't support this property or method"
            Set oQuestion = oElement.getElementByClassName(Question) 'Assigning this object causes an "Object doesn't support this property or method"
            Debug.Print oQuestionTitle.innerHTML
            Debug.Print oQuestion.innerHTML
        Next
    End Sub

getElementByClassName is not a method.

You can only use getElementsByClassName (note the plural in the method name), which returns an IHTMLElementCollection.

Declaring the variable as Object instead of IHTMLElementCollection is fine, but you still need to access a specific element of the collection by giving it an index.

Assume that within each oElement there is exactly one element with the class question-hyperlink and one with the class excerpt. You can then call getElementsByClassName and append (0) to take the first element of the returned collection.

So the correction to your code is:

    Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
    Set oQuestion = oElement.getElementsByClassName(Question)(0)

Full working code (with a few updates, namely using Option Explicit and waiting for the page to finish loading):

    Option Explicit

    Sub Scraper()
        Dim ie As Object
        Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object
        Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String

        Set ie = CreateObject("internetexplorer.application")
        sURL = "https://stackoverflow.com/questions/tagged/excel-formula"

        QuestionShell = "question-summary"
        QuestionTitle = "question-hyperlink"
        Question = "excerpt"

        With ie
            .Visible = True
            .Navigate sURL
            ' Wait until the page has fully loaded before touching the DOM
            Do
                DoEvents
            Loop While .ReadyState < 4 Or .Busy
        End With

        Set doc = ie.Document

        Set oQuestionShells = doc.getElementsByClassName(QuestionShell)

        For Each oElement In oQuestionShells
            'Debug.Print TypeName(oElement)
            ' getElementsByClassName returns a collection; (0) takes its first element
            Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
            Set oQuestion = oElement.getElementsByClassName(Question)(0)
            Debug.Print oQuestionTitle.innerHTML
            Debug.Print oQuestion.innerHTML
        Next

        ie.Quit
    End Sub
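The `(0)` index assumes every question-summary contains a matching child. If one does not, the indexed lookup does not return an element, and reading `.innerHTML` on the result typically fails with a runtime error. The following is a minimal, hedged sketch of a guard for that case. The helper name `FirstInnerHTMLByClass` is hypothetical and not part of the original answer. It checks the collection's `Length` before indexing and returns an empty string when nothing matches.

```vba
' Hypothetical helper (not from the original answer): returns the innerHTML of the
' first descendant of parentEl that has the given class name, or "" when there is
' no such descendant, instead of indexing an empty collection with (0).
Function FirstInnerHTMLByClass(ByVal parentEl As Object, ByVal className As String) As String
    Dim matches As Object
    Set matches = parentEl.getElementsByClassName(className) ' always a collection, possibly empty
    If matches.Length > 0 Then
        FirstInnerHTMLByClass = matches(0).innerHTML
    Else
        FirstInnerHTMLByClass = ""
    End If
End Function
```

Inside the `For Each` loop of the full code above, the two `Set` lines and the two `Debug.Print` lines could then be replaced by `Debug.Print FirstInnerHTMLByClass(oElement, QuestionTitle)` and `Debug.Print FirstInnerHTMLByClass(oElement, Question)`.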