使用VBA HTML从网页下载文件

我一直拼命几个月来自动化一个过程,一个csv文件被下载,configuration和保存在一个给定的位置。 到目前为止,我只用excel vbapipe理打开网页,点击底部下载csv文件,但代码停止,需要手动干预才能完成,我希望它是完全自动化的,如果可能的话。 看到使用的代码(我不是作者):

Sub WebDataExtraction() Dim URL As String Dim IeApp As Object Dim IeDoc As Object Dim ieForm As Object Dim ieObj As Object Dim objColl As Collection URL = "http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT" Set IeApp = CreateObject("InternetExplorer.Application") IeApp.Visible = True IeApp.Navigate URL Do Until IeApp.ReadyState = READYSTATE_COMPLETE Loop Set IeDoc = IeApp.Document For Each ele In IeApp.Document.getElementsByTagName("span") If ele.innerHTML = "CSV" Then Application.Wait (Now + TimeValue("0:00:15")) DoEvents ele.Click 'At this point you need to Save the document manually ' or figure out for yourself how to automate this interaction. Test_Save_As_Set_Filename File_Download_Click_Save End If Next IeApp.Quit End Sub" 

提前致谢

南兹奥

我正在发表第二个答案,因为我相信我的第一个答案对于许多类似的应用程序是足够的,所以在这个例子中不起作用。

为什么其他方法失败:

  • .Click方法:这提出了一个新的窗口,期望用户在运行时input,似乎不可能使用WinAPI来控制这个窗口。 或者,至less不是我能确定的任何方式。 代码执行停止在.Click行,直到用户手动介入,没有办法使用GoToWait或任何其他方法来规避此行为。
  • 使用WinAPI函数直接下载源文件不起作用,因为button的URL不包含文件,而是一个dynamic提供文件的js函数。

这是我build议的解决方法:

您可以阅读网页的.body.InnerText ,使用FileSystemObject将其写入纯文本/ csv文件,然后结合Regular Expressions和string操作,将数据parsing为正确定界的CSV文件。

 Sub WebDataExtraction() Dim url As String Dim fName As String Dim lnText As String Dim varLine() As Variant Dim vLn As Variant Dim newText As String Dim leftText As String Dim breakTime As Date '## Requires reference to Microsoft VBScript Regular Expressions 5.5 Dim REMatches As MatchCollection Dim m As Match '## Requires reference to Microsoft Internet Controls Dim IeApp As InternetExplorer '## Requires reference to Microsoft HTML object library Dim IeDoc As HTMLDocument Dim ele As HTMLFormElement '## Requires reference to Microsoft Scripting Runtime Dim fso As FileSystemObject Dim f As TextStream Dim ln As Long: ln = 1 breakTime = DateAdd("s", 60, Now) url = "http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT" Set IeApp = CreateObject("InternetExplorer.Application") With IeApp .Visible = True .Navigate url Do Until .ReadyState = 4 Loop Set IeDoc = .Document End With 'Wait for the data to display on the page Do If Now >= breakTime Then If MsgBox("The website is taking longer than usual, would you like to continue waiting?", vbYesNo) = vbNo Then GoTo EarlyExit Else: breakTime = DateAdd("s", 60, Now) End If End If Loop While Trim(IeDoc.body.innerText) = "XML CSV Please Wait Data Loading Sorting" '## Create the text file fName = ActiveWorkbook.Path & "\exported-csv.csv" Set fso = CreateObject("Scripting.FileSystemObject") Set f = fso.OpenTextFile(fName, 2, True, -1) f.Write IeDoc.body.innerText f.Close Set f = Nothing '## Read the text file Set f = fso.OpenTextFile(fName, 1, False, -1) Do lnText = f.ReadLine '## The data starts on the 4th line in the InnerText. If ln >= 4 Then '## Return a collection of matching date/timestamps to which we can parse Set REMatches = SplitLine(lnText) newText = lnText For Each m In REMatches newText = Replace(newText, m.Value, ("," & m.Value & ","), , -1, vbTextCompare) Next '## Get rid of consecutive delimiters: Do newText = Replace(newText, ",,", ",") Loop While InStr(1, newText, ",,", vbBinaryCompare) <> 0 '## Then use some string manipulation to parse out the first 2 columns which are ' not a match to the RegExp we used above. leftText = Left(newText, InStr(1, newText, ",", vbTextCompare) - 1) leftText = Left(leftText, 10) & "," & Right(leftText, Len(leftText) - 10) newText = Right(newText, Len(newText) - InStr(1, newText, ",", vbTextCompare)) newText = leftText & "," & newText '## Store these lines in an array ReDim Preserve varLine(ln - 4) varLine(ln - 4) = newText End If ln = ln + 1 Loop While Not f.AtEndOfStream f.Close '## Re-open the file for writing the delimited lines: Set f = fso.OpenTextFile(fName, 2, True, -1) '## Iterate over the array and write the data in CSV: For Each vLn In varLine 'Omit blank lines, if any. If Len(vLn) <> 0 Then f.WriteLine vLn Next f.Close EarlyExit: Set fso = Nothing Set f = Nothing IeApp.Quit Set IeApp = Nothing End Sub Function SplitLine(strLine As String) As MatchCollection 'returns a RegExp MatchCollection of Date/Timestamps found in each line '## Requires reference to Microsoft VBScript Regular Expressions 5.5 Dim RE As RegExp Dim matches As MatchCollection Set RE = CreateObject("vbscript.regexp") With RE .MultiLine = False .Global = True .IgnoreCase = True '## Use this RegEx pattern to parse the date & timestamps: .Pattern = "(19|20)\d\d[-](0[1-9]|1[012])[-](0[1-9]|[12][0-9]|3[01])[ ]\d\d?:\d\d:\d\d" End With Set matches = RE.Execute(strLine) Set SplitLine = matches End Function 

编辑

我使用URLtesting了我的原始答案代码:

http://www.bmreports.com/bsp/BMRSSystemData.php?pT=DDAD&zT=N&dT=NRT#saveasCSV

但是这个方法似乎并不适用于这个网站。 ele.Click似乎并没有启动下载,只是在网页上打开数据表格。 要下载,您需要执行右键单击/另存为。 如果你已经得到了那么多(正如我怀疑的那样,根据你调用的子程序,但是你没有提供代码),那么你可以使用Win API获得保存对话框的HWND,并可能自动执行事件。 Santosh提供了一些信息:

VBA – 进入网站并从保存提示下载文件

这也是一个很好的资源,可以帮助你解决问题:

http://social.msdn.microsoft.com/Forums/en-US/beb6fa0e-fbc8-49df-9f2e-30f85d941fad/download-file-from-ie-with-vba

原始答复

如果您能够确定CSV的URL,则可以使用此子例程打开到CSV数据的连接并将其直接导入到工作簿。 您可能需要在导入的数据上自动执行文本到列的操作,但可以轻松地使用macroslogging器进行复制。 我在下面的Test()子例程中join了一个例子。

您可以轻松地修改此操作以在新的工作簿中添加QueryTables ,然后在该工作簿上自动执行SaveAs方法以将文件另存为CSV。

本示例使用Yahoo Finance的已知URL(福特汽车公司),并将活动工作表的A1单元QueryTables中的CSV数据添加到QueryTables 。 这可以很容易地修改,把它放在另一个工作表,另一个工作簿等

 Sub Test() Dim MyURL as String MyURL = "http://ichart.finance.yahoo.com/table.csv?s=GM&a0&b=1&c2010&d=05&e=20&f=2013&g=d&ignore=.csv" OpenURL MyURL 'Explode the CSV data: Range("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _ TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _ Semicolon:=False, Comma:=True, Space:=False, Other:=False, FieldInfo _ :=Array(Array(1, 3), Array(2, 1), Array(3, 1), Array(4, 1), Array(5, 1), Array(6, 1), _ Array(7, 1)), TrailingMinusNumbers:=True End Sub Private Sub OpenURL(fullURL As String) 'This opens the CSV in querytables connection. On Error GoTo ErrOpenURL With ActiveSheet.QueryTables.Add(Connection:= _ "URL;" & fullURL, Destination:=Range("A1")) .Name = fullURL .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = True .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlEntirePage .WebFormatting = xlWebFormattingAll .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With ExitOpenURL: Exit Sub 'if all goes well, you can exit 'Error handling... ErrOpenURL: Err.Clear bCancel = True Resume ExitOpenURL End Sub