在Excel vba中操作.htm文件

我试图在Excel VBA中创build应用程序,但是我遇到了一些问题。 我需要我的Excel应用程序来下载一些zip格式的文件。 我已经完成了这部分的问题,我的应用程序可以下载和解压缩文件。 接下来,我将以.htm扩展名读取整个文件,并从中获取一些信息。 需要这样工作,当应用程序打开时,程序应该在Basil中查找“concurso”的最后一个比赛号码,然后在.htm文件中查找相同的号码并开始复制下一个数据。

我已经发现了一个模式来读取文件,并获得我想要的数据,但我不知道如何编码。 要提取的.htm文件中的模式是在标签td内,这是一个有2个斜杠的文本,因此,我是一个date,在这个时候,我要做3件事情,获取date,date我的数目是concurso,所以我也需要得到它,而且在15个数字之下,我还需要15个数字。 此模式不会更改,必须处理直到.htm文件结束。 并将这些数据传输到我的工作表中以供稍后处理。

如有疑问,我会进一步澄清。
这是我用来下载和解压缩文件的代码

Sub DownloadEUnzip() Dim FSO, oApp As Object Dim objHttp, DefPath, Arquivo As String Dim Dados() As Byte Dim Fname As Variant Dim FileNameFolder As Variant Dim iFileNumber As Long Dim diretorio As String diretorio = Dir("c:\lotofacil\D_LOTFAC.HTM") If diretorio = "D_LOTFAC.HTM" Then Kill "C:\lotofacil\*" End If Set objHttp = CreateObject("MSXML2.ServerXMLHTTP") objHttp.Open "GET", "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_lotfac.zip", False objHttp.Send DefPath = "C:\lotofacil\" '<<< Altere aqui Arquivo = DefPath & "D_lotfac.zip" If objHttp.Status = "200" Then Dados = objHttp.ResponseBody iFileNumber = FreeFile Open Arquivo For Binary Access Write As #iFileNumber Put #iFileNumber, 1, Dados Close #iFileNumber End If If Right(DefPath, 1) <> "\" Then DefPath = DefPath & "\" End If FileNameFolder = DefPath Set oApp = CreateObject("Shell.Application") oApp.Namespace(FileNameFolder).CopyHere oApp.Namespace("C:\lotofacil\D_lotfac.zip").items MsgBox "Arquivos baixados e descompactados com sucesso!" End Sub 

在这里你可以下载文件来显示问题。

ps在C:驱动器上创build一个名为lotofacil的文件夹,以便电子表格正常工作。

更新1

代码find一个date

 If Mid(dataline, 19, 1) = "/" And Mid(dataline, 22, 1) = "/" Then Debug.Print dataline End If 

更新2

所以caio,它现在真正的快,但是当我使用我注意到,该程序是一个列需要less于一个,我改变了代码,它的工作显然..如果你看看我是否没有任何混乱…我改变arrays的大小,看起来像工作:)看看。

 Sub ReadLines() Dim dataArray() As String Dim strText Dim result As String Dim regExDate As New RegExp, regExAnyContent As New RegExp Dim matches As MatchCollection Dim match As match Dim previous As String, current As String Dim currentLine As Integer ReDim dataArray(17, 1000) regExDate.Pattern = "(\d{2}/\d{2}/\d{4})" regExAnyContent.Pattern = "<td[^>]*>([^<]*)" dirPath = "c:\lotofacil\" filePath = dirPath & "D_LOTFAC.HTM" result = "" currentLine = 0 If Not Dir(filePath) = "D_LOTFAC.HTM" Then Exit Sub FileNum = FreeFile() Open filePath For Input As #FileNum previous = "" While Not EOF(FileNum) Line Input #FileNum, current ' read in data 1 line at a time If Len(current) > 0 Then Set matches = regExDate.Execute(current) If matches.Count > 0 Then dataArray(1, currentLine) = matches.Item(0) dataArray(0, currentLine) = regExAnyContent.Execute(previous).Item(0).SubMatches(0) For i = 1 To 16 Line Input #FileNum, current While current = "" Line Input #FileNum, current Wend dataArray(1 + i, currentLine) = regExAnyContent.Execute(current).Item(0).SubMatches(0) Next currentLine = currentLine + 1 If currentLine Mod 1000 = 0 Then ReDim Preserve dataArray(17, currentLine + 1000) End If End If previous = current End If ' decide what to do with dataline, ' depending on what processing you need to do for each case Wend Range(Cells(1, 1), Cells(currentLine, 17)) = Application.Transpose(dataArray) End Sub 

但仍然发生在一个非常奇怪的事情,在数据表被推出,date是错误的,我需要他们的格式dd / mm / yyyy,我知道我已经窃听你,但如果太难做这个改变你可以脱掉这个date的列? 请…

首先非常感谢你,我非常擅长excel;)

试试这个将文件读到剪贴板并将其内容粘贴到工作表中,这将创build一个正常的Excel表格,您将可以使用。

这将使用excel的自然能力来parsinghtml表格到常规excel表格。

 Sub ReadFilePasteAsTable()
 Dim objData As New MSForms.DataObject
昏暗的strText
昏暗的结果为string
 Dim numberOfLines Integer
昏暗的wsh作为对象
设置wsh = VBA.CreateObject(“WScript.Shell”)


 numberOfLines = 126
 dirPath =“c:\ lotofacil \”
 diretorio = Dir(dirPath&“D_LOTFAC.HTM”)
结果=“”

如果不是diretorio =“D_LOTFAC.HTM”,则退出子
 FileNum = FreeFile()

 filePath = dirPath&“D_LOTFAC.HTM”
 outPath = dirPath&“out.txt”
 pscommand =“Powershell -Command”“''+ $(cat”&filePath&“-Tail 126)>”&outPath&“”“”
 wsh.Run pscommand,0,True

打开outPathinput为#FileNum

虽然不是EOF(FileNum)
    行input#FileNum,DataLine'一次读入1行数据
    结果=结果&DataLine
     “决定用dataline做什么,
    取决于你需要为每个案件做什么处理
蜿蜒

     objData.SetText结果
     objData.PutInClipboard

 ActiveSheet.Paste目标:= [A1]
结束小组

不要忘记添加对Microsoft Forms 2.0的引用。 要添加参考打开VBA窗口,打开菜单Tools-> References …

如果您找不到Microsoft Forms 2.0对象库打开浏览…并且它将位于64位操作系统的C:\ Windows \ SysWOW64 \ FM20.dll或32位的C:\ Windows \ System32 \ FM20.dll中。

UPDATE

现在您需要添加对Microsoft VBScript Regular Expressions 5.5引用

 Sub ReadLines() Dim dataArray() As String Dim strText Dim result As String Dim regExDate As New RegExp, regExAnyContent As New RegExp Dim matches As MatchCollection Dim match As match Dim previous As String, current As String Dim currentLine As Integer ReDim dataArray(16, 1000) regExDate.Pattern = "(\d{2}/\d{2}/\d{4})" regExAnyContent.Pattern = "<td[^>]*>([^<]*)" dirPath = "c:\lotofacil\" filePath = dirPath & "D_LOTFAC.HTM" result = "" currentLine = 0 If Not Dir(filePath) = "D_LOTFAC.HTM" Then Exit Sub FileNum = FreeFile() Open filePath For Input As #FileNum previous = "" While Not EOF(FileNum) Line Input #FileNum, current ' read in data 1 line at a time If Len(current) > 0 Then Set matches = regExDate.Execute(current) If matches.Count > 0 Then dataArray(1, currentLine) = matches.Item(0) dataArray(0, currentLine) = regExAnyContent.Execute(previous).Item(0).SubMatches(0) For i = 1 To 15 Line Input #FileNum, current While current = "" Line Input #FileNum, current Wend dataArray(1 + i, currentLine) = regExAnyContent.Execute(current).Item(0).SubMatches(0) Next currentLine = currentLine + 1 If currentLine Mod 1000 = 0 Then ReDim Preserve dataArray(16, currentLine + 1000) End If End If previous = current End If ' decide what to do with dataline, ' depending on what processing you need to do for each case Wend Range(Cells(1, 1), Cells(currentLine, 16)) = Application.Transpose(dataArray) End Sub