如何parsingHTML，而无需在vba中创buildInternet Explorer的对象？

我没有在任何工作的计算机上的Internet Explorer，因此创buildInternet Explorer的对象，并使用ie.navigateparsingHTML和search标签是不可能的。我的问题是， 我怎样才能使用标签自动从框架源到我的电子表格不使用IE浏览器的某些数据？ 答案中的代码示例将非常有用:)谢谢

您可以使用XMLHTTP来检索网页的HTML源代码：

Function GetHTML(url As String) As String With CreateObject("MSXML2.XMLHTTP") .Open "GET", url, False .Send GetHTML = .ResponseText End With End Function

我不会build议将其用作工作表函数，否则每次重新计算工作表时都会重新查询站点URL。一些网站有逻辑检测通过频繁，重复的呼叫，并且您的IP可能被禁止，暂时或永久取决于站点的抓取。

一旦你有源HTMLstring（最好存储在一个variables，以避免不必要的重复调用），您可以使用基本的文本函数parsingstring来search您的标记。

这个基本的函数将返回<tag>和</tag>之间的值：

 Public Function getTag(url As String, tag As String, Optional occurNum As Integer) As String Dim html As String, pStart As Long, pEnd As Long, o As Integer html = GetHTML(url) 'remove <> if they exist so we can add our own If Left(tag, 1) = "<" And Right(tag, 1) = ">" Then tag = Left(Right(tag, Len(tag) - 1), Len(Right(tag, Len(tag) - 1)) - 1) End If ' default to Occurrence #1 If occurNum = 0 Then occurNum = 1 pEnd = 1 For o = 1 To occurNum ' find start <tag> beginning at 1 (or after previous Occurence) pStart = InStr(pEnd, html, "<" & tag & ">", vbTextCompare) If pStart = 0 Then getTag = "{Not Found}" Exit Function End If pStart = pStart + Len("<" & tag & ">") ' find first end </tag> after start <tag> pEnd = InStr(pStart, html, "</" & tag & ">", vbTextCompare) Next o 'return string between start <tag> & end </tag> getTag = Mid(html, pStart, pEnd - pStart) End Function

这只会find基本的<tag> ，但您可以添加/删除/更改文本function，以满足您的需求。

用法示例：

 Sub findTagExample() Const testURL = "https://en.wikipedia.org/wiki/Web_scraping" 'search for 2nd occurence of tag: <h2> which is "Contents" : Debug.Print getTag(testURL, "<h2>", 2) '...this returns the 8th occurence, "Navigation Menu" : Debug.Print getTag(testURL, "<h2>", 8) '...and this returns an HTML <span> containing a title for the 'Legal Issues' section: Debug.Print getTag("https://en.wikipedia.org/wiki/Web_scraping", "<h2>", 4) End Sub

任何做过networking抓取的人都会熟悉创buildInternet Explorer（IE）实例并导航到Web地址，然后在页面准备就绪后，使用“Microsoft HTML Object Library”（MSHTML）types开始导航DOM图书馆。问题是否IE不可用做什么。我在运行Windows 10的情况下处于相同的情况。

我曾怀疑有可能创build一个独立于IE的MSHTML.HTMLDocument实例，但其创build并不明显。感谢提问者提出这个问题。答案在于MSHTML.IHTMLDocument4.createDocumentFromUrl方法。一个需要一个本地文件工作（编辑：实际上可以把一个webbyurl以及！），但我们有一个很好的Windows API函数称为URLDownloadToFile下载文件。

此代码在运行Microsoft Edge的Windows 10框上运行，而不是Internet Explorer。这是一个重要的发现，并感谢提问者。

 Option Explicit '* Tools->Refernces Microsoft HTML Object Library '* MSDN - URLDownloadToFile function - https://msdn.microsoft.com/en-us/library/ms775123(v=vs.85).aspx Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" _ (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _ ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long Sub Test() Dim fso As Object Set fso = CreateObject("Scripting.FileSystemObject") Dim sLocalFilename As String sLocalFilename = Environ$("TMP") & "\urlmon.html" Dim sURL As String sURL = "https://stackoverflow.com/users/3607273/s-meaden" Dim bOk As Boolean bOk = (URLDownloadToFile(0, sURL, sLocalFilename, 0, 0) = 0) If bOk Then If fso.FileExists(sLocalFilename) Then '* Tools->Refernces Microsoft HTML Object Library Dim oHtml4 As MSHTML.IHTMLDocument4 Set oHtml4 = New MSHTML.HTMLDocument Dim oHtml As MSHTML.HTMLDocument Set oHtml = Nothing '* IHTMLDocument4.createDocumentFromUrl '* MSDN - IHTMLDocument4 createDocumentFromUrl method - https://msdn.microsoft.com/en-us/library/aa752523(v=vs.85).aspx Set oHtml = oHtml4.createDocumentFromUrl(sLocalFilename, "") '* need to wait a little whilst the document parses '* because it is multithreaded While oHtml.readyState <> "complete" DoEvents '* do not comment this out it is required to break into the code if in infinite loop Wend Debug.Assert oHtml.readyState = "complete" Dim sTest As String sTest = Left$(oHtml.body.outerHTML, 100) Debug.Assert Len(Trim(sTest)) > 50 '* just testing we got a substantial block of text, feel free to delete '* page specific logic goes here Dim htmlAnswers As Object 'MSHTML.DispHTMLElementCollection Set htmlAnswers = oHtml.getElementsByClassName("answer-hyperlink") Dim lAnswerLoop As Long For lAnswerLoop = 0 To htmlAnswers.Length - 1 Dim vAnswerLoop Set vAnswerLoop = htmlAnswers.Item(lAnswerLoop) Debug.Print vAnswerLoop.outerText Next End If End If End Sub

感谢您提出这个问题。

PS我已经使用TaskList来validation这个代码运行时不会在引擎盖下创buildIExplore.exe。

PPS如果你喜欢这个，那么在我的Excel开发平台博客上看到更多

如何parsingHTML，而无需在vba中创buildInternet Explorer的对象？

用法示例：

使用Excel VBA提交Javascript中的button

与IE标签交互一旦打开

Excel VBA从下载对话框打开文件

Excel VBA强制closuresIE

types不匹配错误13将值从Javascript复制到Excel工作表时

closures选项，从网站在Internet Explorer中打开xls文件？

Excel 15 VBA无法从Windows 8.1上的IE浏览器接收事件

使用VBA在IE中检查单选button，奇怪的行为

导航到VBA中的页面后无法重置HTML文档

vba即导航多个会话