Scraping text from HTML tags in a file

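I have a file that I want to extract dates from. It is an HTML source file, so it is full of code and phrases I don't need. I need to extract every instance of a date that is wrapped in a specific HTML tag: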

    <abbr title="((this is the text I need))" data-utime="

What is the easiest way to do this?

If you are using Excel VBA, set a reference (Tools > References) to the MSHTML library (listed in the References dialog as Microsoft HTML Object Library).

    Sub ScrapeDateAbbr()

        Dim hDoc As MSHTML.HTMLDocument
        Dim hElem As MSHTML.HTMLGenericElement
        Dim sFile As String, lFile As Long
        Dim sHtml As String

        'read in the file
        lFile = FreeFile
        sFile = "C:/Users/dick/Documents/My Dropbox/Excel/Testabbr.html"
        Open sFile For Input As lFile
        sHtml = Input$(LOF(lFile), lFile)
        'release the file handle once the text is in memory
        Close lFile

        'put into an htmldocument object
        Set hDoc = New MSHTML.HTMLDocument
        hDoc.body.innerHTML = sHtml

        'loop through abbr tags
        For Each hElem In hDoc.getElementsByTagName("abbr")
            'only those that have a data-utime attribute
            If Len(hElem.getAttribute("data-utime")) > 0 Then
                'get the title attribute
                Debug.Print hElem.getAttribute("title")
            End If
        Next hElem

    End Sub

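I assumed that what you call the source file is a local file. If you need to download it first, you will also need a reference to MSXML, plus this code: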

    Sub ScrapeDateAbbrDownload()

        Dim xHttp As MSXML2.XMLHTTP
        Dim hDoc As MSHTML.HTMLDocument
        Dim hElem As MSHTML.HTMLGenericElement

        Set xHttp = New MSXML2.XMLHTTP
        xHttp.Open "GET", "file:///C:/Users/dick/Documents/My%20Dropbox/Excel/Testabbr.html"
        xHttp.send

        Do
            DoEvents
        Loop Until xHttp.readyState = 4

        'put into an htmldocument object
        Set hDoc = New MSHTML.HTMLDocument
        hDoc.body.innerHTML = xHttp.responseText

        'loop through abbr tags
        For Each hElem In hDoc.getElementsByTagName("abbr")
            'only those that have a data-utime attribute
            If Len(hElem.getAttribute("data-utime")) > 0 Then
                'get the title attribute
                Debug.Print hElem.getAttribute("title")
            End If
        Next hElem

    End Sub

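If you are using Java, you can use Jsoup. The question is not very clear, though; please explain in more detail what exactly you are trying to do.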
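For the Java route, a real HTML parser such as Jsoup is the more robust choice. If adding a library is not an option, the rough sketch below does the same filtering with only the JDK's regular expressions. It is a simplified stand-in, not Jsoup's API. It assumes attribute values are double-quoted and contain no `>` character. The class name `AbbrTitleExtractor`, the method `extractTitles`, and the sample markup are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the title attribute of every <abbr> tag
// that also carries a non-empty data-utime attribute.
class AbbrTitleExtractor {

    // One opening <abbr ...> tag. Assumes no '>' inside attribute values.
    private static final Pattern ABBR_TAG =
            Pattern.compile("<abbr\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // A non-empty, double-quoted data-utime attribute.
    private static final Pattern DATA_UTIME =
            Pattern.compile("(?<![\\w-])data-utime\\s*=\\s*\"[^\"]+\"",
                    Pattern.CASE_INSENSITIVE);

    // A double-quoted title attribute; the lookbehind skips names
    // such as data-title.
    private static final Pattern TITLE =
            Pattern.compile("(?<![\\w-])title\\s*=\\s*\"([^\"]*)\"",
                    Pattern.CASE_INSENSITIVE);

    static List<String> extractTitles(String html) {
        List<String> titles = new ArrayList<>();
        Matcher tag = ABBR_TAG.matcher(html);
        while (tag.find()) {
            String attrs = tag.group();
            // Skip abbr tags that have no data-utime attribute.
            if (!DATA_UTIME.matcher(attrs).find()) {
                continue;
            }
            Matcher title = TITLE.matcher(attrs);
            if (title.find()) {
                titles.add(title.group(1));
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        // Made-up sample markup; replace with the contents of the real file.
        String sample =
                "<p><abbr title=\"Monday, January 2, 2012 at 3:04pm\" "
                + "data-utime=\"1325516640\">2 hours ago</abbr> "
                + "<abbr title=\"not a date\">n/a</abbr></p>";
        for (String t : extractTitles(sample)) {
            System.out.println(t);
        }
    }
}
```

With Jsoup on the classpath, the same filter is usually written as a CSS selector such as `abbr[data-utime]`, reading the `title` attribute of each match. That approach also copes with markup the regular expressions above would miss.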
