Website data table scraper

Before I ask my question: I am an amateur coder with essentially no meaningful experience beyond VBA in MS Office applications (I know, noob!).

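I am trying to create a web scraper that uses VBA to import data into Excel. As per my comment in the code excerpt below, the best I have been able to find is the winning answer to the question linked there.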

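Below I use investing.com as an example, but in reality my project will span multiple sites and feed into a matrix that is updated daily and prunes itself as events expire. For this reason I would rather keep the up-front coding workload as small as possible (for me).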

With that in mind, may I ask whether there is a way to do either of the following (brace yourself, this may be cringe-worthy basic knowledge to some):

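  1. Is there a way to navigate to a URL and run a loop over every table on that page (without any known IDs)? This is to speed up my code and minimise my input, as there will be quite a lot of data being updated and I intend to put a 2-minute loop trigger on the refresh.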

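  2. Instead of doing what I have done below, is it possible to reference a table rather than a row, and do something along the lines of Cells(2, 5).Value to return the value in row 1, column 4 (assuming the array index starts at 0 in both dimensions)? Further to this, my first column (in some ways my primary key) may not be in the same order across all data sources, so is there a way I could do something like Columns("A:A").Find(What:=[Primary key], After:=Cells(1, 1), LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=True, SearchFormat:=False).Row to find which row within the table relates to what I am looking for?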

Code:

    Sub Scraper()
        Dim appIE, allRowOfData As Object
        ' As per https://stackoverflow.com/questions/27066963/scraping-data-from-website-using-vba
        Set appIE = CreateObject("internetexplorer.application")

        With appIE
            .Navigate "http://uk.investing.com/rates-bonds/financial-futures" 'Sample page
            .Visible = False
        End With

        Do While appIE.Busy
            Application.Wait (Now + TimeValue("0:00:01")) 'If page not open, wait a second before trying again
        Loop

        Set allRowOfData = appIE.document.getElementById("pair_8907") 'tr id="[ID of row within table]"

        Dim myValue As String: myValue = allRowOfData.Cells(8).innerHTML
        'The 8 is the column number of the table
        '(note: column numbers start at 0 so the 9th column should have "8" entered here)

        Set appIE = Nothing

        Range("A1").Value = myValue
    End Sub

If you want to use Excel functions to navigate the tables, why not dump the tables onto a worksheet first? This code works for me:

    Option Explicit

    Sub Scraper()
        Dim appIE As Object
        ' As per http://stackoverflow.com/questions/27066963/scraping-data-from-website-using-vba
        Set appIE = CreateObject("internetexplorer.application")

        With appIE
            .Navigate "http://uk.investing.com/rates-bonds/financial-futures" 'Sample page
            .Visible = True
        End With

        Do While appIE.Busy
            DoEvents
            Application.Wait (Now + TimeValue("0:00:01")) 'If page not open, wait a second before trying again
        Loop

        'Debug.Print TypeName(appIE.document)
        Dim doc As Object 'MSHTML.HTMLDocument
        Set doc = appIE.document

        '* appIE busy is good but you need to wait for the whole document to completely load and initialise so use this
        While doc.readyState <> "complete"
            DoEvents
        Wend

        '* we can select all the tables because they share the same CSS class name
        Dim tablesSelectedByClass As Object 'MSHTML.HTMLElementCollection
        Set tablesSelectedByClass = doc.getElementsByClassName("genTbl")

        '* you can change this, it was just convenient for me to add sheets to my workbook
        Dim shNewResults As Excel.Worksheet
        Set shNewResults = ThisWorkbook.Worksheets.Add

        Dim lRowCursor As Long '* this controls pasting down the sheet
        lRowCursor = 1

        Dim lTableIndexLoop As Long
        For lTableIndexLoop = 0 To tablesSelectedByClass.Length - 1

            Dim tableLoop As Object 'MSHTML.HTMLTable
            Set tableLoop = tablesSelectedByClass.Item(lTableIndexLoop)

            If LenB(tableLoop.ID) > 0 Then '* there are some extra nonsense tables, this subselects

                Dim sParentColumn As String, objParentColumn As Object ' MSHTML.HTMLSemanticElement
                Set objParentColumn = FindMyColumn(tableLoop, sParentColumn) '* need to know whether the table is on the left-hand or right-hand side

                Dim vHeader As Variant: vHeader = Empty
                If sParentColumn = "leftColumn" Then
                    '* tables on the left have a preceding H3 element with the table's description
                    Dim objH3Headers As Object
                    Set objH3Headers = objParentColumn.getElementsByTagName("H3")
                    vHeader = objH3Headers.Item(lTableIndexLoop).innerText
                Else
                    '* tables on the right have a hidden attribute we can use
                    vHeader = tableLoop.Attributes.Item("data-gae").Value
                    If Len(vHeader) > 3 Then
                        vHeader = Mid$(vHeader, 4)
                        Mid$(vHeader, 1, 1) = Chr(Asc(Mid$(vHeader, 1, 1)) - 32)
                    End If
                End If

                '* tables on the right do not have column headers
                Dim bHasColumnHeaders As Boolean
                bHasColumnHeaders = (tableLoop.ChildNodes.Length = 2)

                Dim vTableCells() As Variant '* this will be our table data container which we will paste in one go
                Dim lRowCount As Long: lRowCount = 0
                Dim lColumnCount As Long: lColumnCount = 0
                Dim lDataHeadersSectionIdx As Long: lDataHeadersSectionIdx = 0
                Dim objColumnHeaders As Object: Set objColumnHeaders = Nothing

                If bHasColumnHeaders Then
                    Set objColumnHeaders = tableLoop.ChildNodes.Item(0).ChildNodes.Item(0)
                    lRowCount = lRowCount + 1
                    lDataHeadersSectionIdx = 1
                Else
                    lDataHeadersSectionIdx = 0
                End If

                Dim objDataRows As Object 'MSHTML.HTMLElementCollection
                Set objDataRows = tableLoop.ChildNodes.Item(lDataHeadersSectionIdx).ChildNodes

                lColumnCount = objDataRows.Item(0).ChildNodes.Length
                lRowCount = lRowCount + objDataRows.Length

                ReDim vTableCells(1 To lRowCount, 1 To lColumnCount) As Variant

                '* if we have them, get the column headers
                Dim lColLoop As Long
                If bHasColumnHeaders Then
                    For lColLoop = 1 To lColumnCount
                        vTableCells(1, lColLoop) = objColumnHeaders.ChildNodes.Item(lColLoop - 1).innerText
                    Next
                End If

                '* get the data cells
                Dim lRowLoop As Long
                For lRowLoop = 1 To lRowCount - VBA.IIf(bHasColumnHeaders, 1, 0)
                    For lColLoop = 1 To lColumnCount
                        vTableCells(lRowLoop + VBA.IIf(bHasColumnHeaders, 1, 0), lColLoop) = objDataRows.Item(lRowLoop - 1).ChildNodes.Item(lColLoop - 1).innerText
                    Next
                Next

                '* paste our table description
                shNewResults.Cells(lRowCursor, 1).Value2 = vHeader
                lRowCursor = lRowCursor + 1

                '* paste our table data
                shNewResults.Cells(lRowCursor, 1).Resize(lRowCount, lColumnCount).Value2 = vTableCells
                lRowCursor = lRowCursor + lRowCount + 1

            End If
        Next
    End Sub

    Function FindMyColumn(ByVal node As Object, ByRef psColumn As String) As Object
        '* this code ascends the DOM looking for "column" in the id of each node
        While InStr(1, node.ID, "column", vbTextCompare) = 0 And Not node.ParentNode Is Nothing
            DoEvents
            Set node = node.ParentNode
        Wend
        If InStr(1, node.ID, "column", vbTextCompare) > 0 Then
            Set FindMyColumn = node
            psColumn = CStr(node.ID)
        End If
    End Function
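
Once the tables are on a worksheet, the two remaining parts of the question (finding a row by its key, and refreshing every two minutes) can be handled on the Excel side. The following is only a sketch and has not been run against the live page. It assumes the key sits in column A of the results sheet and that the Scraper sub above lives in a standard module of the same workbook. The names FindKeyRow, StartRefresh, StopRefresh and ScrapeAndReschedule are made up for illustration.

```vba
' Module-level variable: put this at the top of a standard module.
Private mNextRun As Date   ' time of the pending OnTime call, kept so it can be cancelled

' Returns the worksheet row whose column-A value equals key, or 0 if there is none.
Public Function FindKeyRow(ByVal ws As Worksheet, ByVal key As String) As Long
    Dim hit As Range
    Set hit = ws.Columns(1).Find(What:=key, LookIn:=xlValues, _
                                 LookAt:=xlWhole, MatchCase:=True)
    If Not hit Is Nothing Then FindKeyRow = hit.Row
End Function

' Runs the scraper once, then queues the next run.
Public Sub ScrapeAndReschedule()
    Application.Run "Scraper"   ' assumption: the Scraper sub shown above
    StartRefresh
End Sub

' Queues ScrapeAndReschedule to run two minutes from now.
Public Sub StartRefresh()
    mNextRun = Now + TimeValue("00:02:00")
    Application.OnTime EarliestTime:=mNextRun, Procedure:="ScrapeAndReschedule"
End Sub

' Cancels the pending run, if any.
Public Sub StopRefresh()
    On Error Resume Next        ' OnTime raises an error when nothing is scheduled
    Application.OnTime EarliestTime:=mNextRun, Procedure:="ScrapeAndReschedule", _
                       Schedule:=False
End Sub
```

With the row number in hand, something like ws.Cells(FindKeyRow(ws, someKey), 5).Value reads the fifth column for that key (check for 0 first). Note that Scraper as written adds a new sheet on every run, so for a repeating refresh you would probably want it to clear and reuse a single sheet instead.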

By the way, if you trade often, the broker gets rich and you get poor; brokerage charges really do add up in the long run.
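
Coming back to the first two points of the question: if you would rather skip the worksheet round trip, a table element can be indexed directly through its Rows and Cells collections (both 0-based), and every table on a page can be reached with getElementsByTagName without knowing any IDs. Here is a minimal sketch, run against a small made-up HTML string loaded into a late-bound htmlfile document so that it works without opening a browser. The function names, the sample keys and the sample values are all hypothetical. On a real page you would pass appIE.document instead.

```vba
' Made-up sample markup: two tables, no ids, first column acts as the key.
Private Const SAMPLE_HTML As String = _
    "<table><tr><td>Key A</td><td>1</td><td>2</td></tr>" & _
    "<tr><td>Key B</td><td>3</td><td>4</td></tr></table>" & _
    "<table><tr><td>Key C</td><td>5</td><td>6</td></tr></table>"

' Text of the cell at 0-based (rowIdx, colIdx) in an HTML table element.
Function CellText(ByVal tbl As Object, ByVal rowIdx As Long, ByVal colIdx As Long) As String
    CellText = Trim$(tbl.Rows.Item(rowIdx).Cells.Item(colIdx).innerText)
End Function

' 0-based index of the first row whose first cell equals key, or -1 if not found.
Function FindRowByKey(ByVal tbl As Object, ByVal key As String) As Long
    Dim r As Long
    FindRowByKey = -1
    For r = 0 To tbl.Rows.Length - 1
        If tbl.Rows.Item(r).Cells.Length > 0 Then
            If CellText(tbl, r, 0) = key Then
                FindRowByKey = r
                Exit Function
            End If
        End If
    Next r
End Function

Sub DemoTableAccess()
    Dim doc As Object
    Set doc = CreateObject("htmlfile")      ' in-memory MSHTML document, no browser window
    doc.body.innerHTML = SAMPLE_HTML

    Dim tables As Object
    Set tables = doc.getElementsByTagName("table")   ' every table, no ids needed

    Dim t As Long, r As Long
    For t = 0 To tables.Length - 1
        Debug.Print "table " & t & ": " & tables.Item(t).Rows.Length & " row(s)"
        r = FindRowByKey(tables.Item(t), "Key B")
        If r >= 0 Then
            Debug.Print "  Key B found in row " & r & ", column 2 holds " & _
                        CellText(tables.Item(t), r, 2)
        End If
    Next t
End Sub
```

Selecting by tag name picks up layout tables as well, so on a real site you may still need a filter like the ID check in the code above.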