使用VBA将网页数据传输到Excel工作表

这是我的第一篇文章。 我是VBA新手,但对VB6非常熟悉。 我编写了一些代码,将来自nasdaq的文本粘贴到工作表中。 它终于起作用了。 “年度利润表”上下分散着大量的无关数据。 我想parsing出来,并把重要的数据放在一个可以自动分析的地方。 我想我可以search单元格,直到find:Annual Income Statement(年度损益表)并将其提取到其他表单中。 任何build议将非常感激。 这是我得到的:

Sub TransferWebData() Dim IE As Object Set IE = CreateObject("InternetExplorer.Application") With IE .Visible = True .Navigate "http://www.nasdaq.com/symbol/gd/financials" Do Until .ReadyState = 4: DoEvents: Loop IE.ExecWB 17, 0 'SelectAll IE.ExecWB 12, 2 'Copy selection Sheets("GD").Range("A1").Select Sheets("GD").PasteSpecial Format:="HTML", Link:=False, DisplayAsIcon:=False, NoHTMLFormatting:=True IE.Quit End With End Sub 

这应该让你忙。

设置对Microsoft HTML对象库Microsoft Internet控件的引用。

在这里输入图像说明

在谷歌浏览器中,我导航到网页并使用inspect元素打开WebKit并将xpath复制到元素。 这给了我和大纲来撰写我的function的草稿。 经过一个半小时的繁琐的debugging,我能够将数据提取到一个数组中。

// * [@ id =“financials-iframe-wrap”] / div 1 / table / tbody / tr 1 / td 2

在这里输入图像说明

 Sub TransferWebData() Dim Data Dim IE As Object Set IE = CreateObject("InternetExplorer.Application") With IE .Visible = True .Navigate "http://www.nasdaq.com/symbol/gd/financials" Do Until .ReadyState = 4: DoEvents: Loop Data = getFinancialsArray(IE.document) With Worksheets("GD") .Cells.ClearContents .Range("A1").Resize(UBound(Data, 1) + 1, UBound(Data, 2)).Value = Data .Columns.AutoFit End With IE.Quit End With End Sub ' //*[@id="financials-iframe-wrap"]/div[1]/table/tbody/tr[1]/td[2] Function getFinancialsArray(doc As HTMLDocument) Dim Data Dim x As Long, y As Long, y1 As Long Dim divfinancials As HTMLDivElement, div1 As HTMLDivElement Dim tbl As HTMLTable, allRows Set divfinancials = doc.getElementById("financials-iframe-wrap") Set div1 = divfinancials.getElementsByTagName("div").Item(0) Set tbl = div1.getElementsByTagName("table").Item(0) Set allRows = tbl.getElementsByTagName("tr") Dim s As String ReDim Data(allRows.Length, 10) For y = 0 To allRows.Length - 1 If Len(Trim(allRows.Item(y).innerText)) Then 'If the row has data For x = 0 To allRows.Item(y).Cells.Length - 1 Data(y1, x) = allRows.Item(y).Cells(x).innerText Next y1 = y1 + 1 End If Next getFinancialsArray = Data End Function 

产量

在这里输入图像说明

看看下面的例子,使用XHR和RegEx检索数据,没有IE自动化:

 Option Explicit Sub GetDataFromNasdaq() Dim sContent As String Dim l As Long Dim i As Long Dim j As Long Dim cMatches As Object Dim r() As String ' retrieve html content With CreateObject("MSXML2.XMLHTTP") .Open "GET", "http://www.nasdaq.com/symbol/gd/financials", False .Send sContent = .ResponseText End With ' parse with regex With CreateObject("VBScript.RegExp") .MultiLine = True .IgnoreCase = True ' simplification .Global = True .Pattern = "<(\w*) .*?>" sContent = .Replace(sContent, "<$1>") .Pattern = ">\s*<" sContent = .Replace(sContent, "><") .Pattern = "<thead>|<tbody>|</thead>|</tbody>" sContent = .Replace(sContent, "") .Pattern = "<(/?)th>" sContent = .Replace(sContent, "<$1td>") ' remove nested tables from target table .Global = False .Pattern = "(Annual Income Statement[\s\S]*?<table.*?>(?:(?!</table)[\s\S])*)<table.*?>(?:(?!<table|</table)[\s\S])*</table>" Do l = Len(sContent) sContent = .Replace(sContent, "$1") Loop Until l = Len(sContent) ' trim target table .Pattern = "Annual Income Statement[\s\S]*?(<table.*?>(?:(?!</table)[\s\S])*</table>)" sContent = .Execute(sContent).Item(0).SubMatches(0) ' match rows .Global = True .Pattern = "<tr><td>(.*?)</td>(?:<td>.*?</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td>)?</tr>" Set cMatches = .Execute(sContent) ' populate resulting array ReDim r(1 To cMatches.Count, 1 To 5) For i = 1 To cMatches.Count For j = 1 To 5 r(i, j) = cMatches(i - 1).SubMatches(j - 1) Next Next End With ' ouput resulting array With ThisWorkbook.Sheets(1) Cells.Delete Output .Cells(1, 1), r End With End Sub Sub Output(oDstRng As Range, aCells As Variant) With oDstRng .Parent.Select With .Resize( _ UBound(aCells, 1) - LBound(aCells, 1) + 1, _ UBound(aCells, 2) - LBound(aCells, 2) + 1 _ ) '.NumberFormat = "@" .Value = aCells .Columns.AutoFit End With End With End Sub 

完成处理大约需要2秒钟,输出如下:

产量

您也可以为多个股票代码导入损益表项目。

 Sub ImportYrlyFS() ThisSheet = ActiveSheet.Name Range("A2").Select Do Until ActiveCell.Value = "" Symbol = ActiveCell.Value Sheets(ThisSheet).Select Sheets.Add Dim QT As QueryTable Symbol = UCase(Symbol) myurl = "http://finance.yahoo.com/q/is?s=" & Symbol & "+Income+Statement&annual" Set QT = ActiveSheet.QueryTables.Add( _ Connection:="URL;" & myurl, _ Destination:=Range("A1")) With QT .WebSelectionType = xlSpecifiedTables .WebTables = "9" .Refresh BackgroundQuery:=False End With QT.Delete Sheets(ActiveSheet.Name).Name = Symbol Sheets(ThisSheet).Select ActiveCell.Offset(1, 0).Select Loop End Sub 

你的表应该看起来像这样。

在这里输入图像说明