Using XMLHTTP to scrape a website that uses dynamic keys

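I have been scraping this website from Excel with Internet Explorer, but lately IE has become inconsistent and slow. My list usually has 500 to 1000 entries, so I have to run the macro overnight, and recently the macro has started hanging. That is why I decided to try MSXML2 for the first time instead of Internet Explorer.

The site does not require authentication, but it has hidden inputs whose values change dynamically.

What I did: I used GET to fetch the page and extract the dynamic keys, then tried to use POST to send the input data to the site. I keep getting a server error (runtime error). I have tried different request header options, but I still do not get the results page. I also tried MSXML2.ServerXMLHTTP. Am I on the right track?

Thanks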

    Sub test_66()
        Dim oXML_get
        'Dim oXML_post
        Dim sendText As String, s2 As String, n1 As Integer, postUrl As String, sHTML As String, s1 As String

        ' Instantiate MSXML2
        Set oXML_get = New MSXML2.XMLHTTP
        oXML_get.Open "GET", "http://www.phila.gov/revenue/realestatetax/default.aspx", False
        oXML_get.setRequestHeader "Accept", "text/html;charset=UTF-8"
        oXML_get.setRequestHeader "Accept-Encoding", "identity"
        oXML_get.setRequestHeader "Accept-Charset", "UTF-8"
        'Connection keep -alive
        oXML_get.setRequestHeader "Connection", "keep -alive"
        oXML_get.send
        sHTML = oXML_get.responseText
        'Debug.Print sHTML

        Dim hDOC As MSHTML.HTMLDocument
        Set hDOC = New MSHTML.HTMLDocument
        hDOC.body.innerHTML = sHTML
        s1 = Replace(hDOC.getElementsByTagName("input").Item(2).Value, "/", "%2F")
        s2 = Replace(hDOC.getElementsByTagName("input").Item(3).Value, "/", "%2F")
        sendText = "__VIEWSTATE=" & s1 & "&__EVENTVALIDATION=" & s2 & "&ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24txtTaxInfo=043185500&ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24btnTaxByBRT=%20>>"
        Debug.Print sendText
        '"__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=" & s1 & "__EVENTVALIDATION=" & s2 &

        oXML_get.Open "POST", "http://www.phila.gov/revenue/realestatetax/default.aspx", False
        oXML_get.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        oXML_get.setRequestHeader "Accept", "text/html;charset=UTF-8"
        oXML_get.setRequestHeader "Accept-Encoding", "identity"
        oXML_get.setRequestHeader "Accept-Charset", "UTF-8"
        'Connection keep -alive
        'oXML_get.setRequestHeader "Connection", "keep -alive"
        oXML_get.send (sendText)

        Dim objIE As Object: Set objIE = CreateObject("InternetExplorer.Application")
        objIE.navigate "about:blank"
        objIE.Visible = True
        objIE.document.Write oXML_get.responseText
    End Sub

This is the runtime error message I get:

    Server Error in '/revenue/RealEstateTax' Application.

    <!-- Web.Config Configuration File -->
    <configuration>
        <system.web>
            <customErrors mode="Off"/>
        </system.web>
    </configuration>

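I submitted the same search request on the web page in Firefox. Then I opened the developer tools (F12), switched to the Network tab, clicked the last POST request, and opened the Params section. Here is a screenshot of the parameters that were submitted:

Form data

Raw form data: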

    __EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=%%2FK3EZTRu3h3w&__EVENTVALIDATION=%2FwEWBQKkrNCPCgLRzsWTBwLlpIbACAKV6q2KDQKIvdHyCawQaHbBYSHV%2B%2FVvyLUTUY%2BhSsmbpTvj0W4ycfOa1RCO&ctl00%24BodyContentPlaceHolder%24SearchByAddressControl%24txtLookup=by+Property+Address&ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24txtTaxInfo=043185500&ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24btnTaxByBRT=+%3E%3E

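Note that there are 7 parameters, and all of them should be URL-encoded. I modified your code slightly and added some request headers. The following code works for me: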

    Option Explicit

    Sub test_66()

        Dim s1 As String
        Dim s2 As String
        Dim sResp As String
        Dim aTmp As Variant
        Dim sBRTNumber As String
        Dim sFormData As String

        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", "http://www.phila.gov/revenue/realestatetax/default.aspx", False
            .setRequestHeader "Accept", "text/html;charset=UTF-8"
            .setRequestHeader "Accept-Encoding", "identity"
            .setRequestHeader "Accept-Charset", "UTF-8"
            .setRequestHeader "Connection", "keep-alive"
            .send
            sResp = .responseText
        End With

        aTmp = Split(sResp, "id=""__VIEWSTATE"" value=""", 2)
        s1 = aTmp(1)
        aTmp = Split(s1, """", 2)
        s1 = aTmp(0)

        aTmp = Split(sResp, "id=""__EVENTVALIDATION"" value=""", 2)
        s2 = aTmp(1)
        aTmp = Split(s2, """", 2)
        s2 = aTmp(0)

        s1 = EncodeUriComponent(s1)
        s2 = EncodeUriComponent(s2)

        sBRTNumber = "043185500"

        sFormData = Join(Array( _
            "__EVENTTARGET=", _
            "__EVENTARGUMENT=", _
            "__VIEWSTATE=" & s1, _
            "__EVENTVALIDATION=" & s2, _
            "ctl00%24BodyContentPlaceHolder%24SearchByAddressControl%24txtLookup=by+Property+Address", _
            "ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24txtTaxInfo=" & sBRTNumber, _
            "ctl00%24BodyContentPlaceHolder%24SearchByBRTControl%24btnTaxByBRT=+%3E%3E" _
            ), "&")

        With CreateObject("MSXML2.XMLHTTP")
            .Open "POST", "http://www.phila.gov/revenue/realestatetax/default.aspx", False
            .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
            .setRequestHeader "Accept", "text/html;charset=UTF-8"
            .setRequestHeader "Accept-Encoding", "identity"
            .setRequestHeader "Accept-Charset", "UTF-8"
            .setRequestHeader "Connection", "keep-alive"
            .setRequestHeader "Host", "www.phila.gov"
            .setRequestHeader "Origin", "http://www.phila.gov"
            .setRequestHeader "Referer", "http://www.phila.gov/revenue/realestatetax/default.aspx"
            .send (sFormData)
            sResp = .responseText
        End With

        With CreateObject("InternetExplorer.Application")
            .navigate "about:blank"
            .Visible = True
            .document.write sResp
        End With

    End Sub

    Function EncodeUriComponent(strText As String) As String
        Static objHtmlfile As Object
        If objHtmlfile Is Nothing Then
            Set objHtmlfile = CreateObject("htmlfile")
            objHtmlfile.parentWindow.execScript "function encode(s) {return encodeURIComponent(s)}", "jscript"
        End If
        EncodeUriComponent = objHtmlfile.parentWindow.encode(strText)
    End Function

Here is the IE window output:

Output
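The code in the question replaced only "/" in the hidden values, while Base64 strings such as `__VIEWSTATE` can also contain "+" and "=", which would plausibly corrupt the POST body. As a hedged alternative to the `htmlfile`/JScript helper, here is a minimal pure-VBA sketch of the two string steps. The names `UrlEncodeAscii` and `ExtractHiddenValue` are hypothetical and not part of the code above, and the sketch assumes ASCII-only input and the `id="..." value="..."` attribute order that the `Split` calls rely on.

```vba
Option Explicit

' Percent-encode an ASCII string for an
' application/x-www-form-urlencoded request body.
' Assumption: the input is ASCII only (true for Base64 values such as
' __VIEWSTATE); non-ASCII text would need UTF-8 handling first.
Function UrlEncodeAscii(ByVal sText As String) As String
    Dim i As Long
    Dim code As Long
    Dim sOut As String

    For i = 1 To Len(sText)
        code = Asc(Mid$(sText, i, 1))
        Select Case code
            ' 0-9, A-Z, a-z, "-", ".", "_", "~" are left unchanged
            Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95, 126
                sOut = sOut & Chr$(code)
            Case Else
                ' Everything else becomes %XX with two hex digits
                sOut = sOut & "%" & Right$("0" & Hex$(code), 2)
        End Select
    Next i

    UrlEncodeAscii = sOut
End Function

' Return the value of a hidden input located by its id, or an empty
' string when the marker is missing (instead of raising
' "Subscript out of range" as an unchecked Split would).
' Assumption: the page renders the field as id="<name>" value="<value>".
Function ExtractHiddenValue(ByVal sHtml As String, ByVal sId As String) As String
    Dim sMarker As String
    Dim lStart As Long
    Dim lEnd As Long

    sMarker = "id=""" & sId & """ value="""
    lStart = InStr(1, sHtml, sMarker, vbBinaryCompare)
    If lStart = 0 Then Exit Function

    lStart = lStart + Len(sMarker)
    lEnd = InStr(lStart, sHtml, """", vbBinaryCompare)
    If lEnd = 0 Then Exit Function

    ExtractHiddenValue = Mid$(sHtml, lStart, lEnd - lStart)
End Function
```

With these helpers, the extraction step could be written as `s1 = UrlEncodeAscii(ExtractHiddenValue(sResp, "__VIEWSTATE"))`, and an empty result can be checked before sending the POST. When looping over 500 to 1000 numbers, that check gives a cheap way to skip or retry a failed GET rather than halting the macro.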