使用Excel VBA从网站刮文本?

我很早就使用Excel作为网页刮板,但是我发现这个非常有趣的文章解释了如何使用Excel VBA从网站刮取某些标签。 我有下面的代码工作正常,但它只是从它find的第一个<p>标签的内容:

 Sub get_title_header() Dim wb As Object Dim doc As Object Dim sURL As String Dim lastrow As Long lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row For i = 2 To lastrow Set wb = CreateObject("internetExplorer.Application") sURL = Cells(i, 1) wb.navigate sURL wb.Visible = True While wb.Busy DoEvents Wend 'HTML document Set doc = wb.document Cells(i, 2) = doc.title On Error GoTo err_clear Cells(i, 3) = doc.GetElementsByTagName("p")(0).innerText err_clear: If Err <> 0 Then Err.Clear Resume Next End If wb.Quit Range(Cells(i, 1), Cells(i, 3)).Columns.AutoFit Next i End Sub 

现在我想要做的就是调整代码,并让刮板获取网页上<p> tag内的所有内容。 所以我猜想某种foreachfunction缺失。

希望这里有人愿意帮助我扩展代码,以便收集来自多个<p>标签的内容。

更新下面的工作代码!

 Sub get_title_header() Dim wb As Object Dim doc As Object Dim sURL As String Dim lastrow As Long Dim i As Integer lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row For i = 2 To lastrow Set wb = CreateObject("internetExplorer.Application") sURL = Cells(i, 1) wb.navigate sURL wb.Visible = True While wb.Busy DoEvents Wend 'HTML document Set doc = wb.document Cells(i, 2) = doc.Title On Error GoTo err_clear Dim el As Object For Each el In doc.GetElementsByTagName("p") counter = counter + 1 Cells(i, counter + 2).Value = Cells(counter + 1).Value & el.innerText Next el counter = 0 err_clear: If Err <> 0 Then Err.Clear Resume Next End If wb.Quit Range(Cells(i, 1), Cells(i, 10)).Columns.AutoFit Next i End Sub 

你快到了! doc.GetElementsByTagName("p")返回使用doc.GetElementsByTagName("p")(0)访问第一个条目的HTMLParagraphElement对象的集合。 正如你所暗示的,一个For Each循环可以让你依次访问For Each循环:

 Sub get_title_header() Dim wb As Object Dim doc As Object Dim sURL As String Dim lastrow As Long Dim i As Integer lastrow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row For i = 2 To lastrow Set wb = CreateObject("internetExplorer.Application") sURL = Cells(i, 1) wb.navigate sURL wb.Visible = True While wb.Busy DoEvents Wend 'HTML document Set doc = wb.document Cells(i, 2) = doc.Title On Error GoTo err_clear Dim el As Object For Each el In doc.GetElementsByTagName("p") Cells(i, 3).Value = Cells(i, 3).Value & ", " & el.innerText Next el err_clear: If Err <> 0 Then Err.Clear Resume Next End If wb.Quit Range(Cells(i, 1), Cells(i, 3)).Columns.AutoFit Next i End Sub