Extract text from HTML tags and use it as table titles

At the moment the table titles are hard-coded as:

    tblNameArr = Array("Balance Sheet", "Cash Flow", "Header 3", "Header 4")

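How can I set the names of the four table titles from the text inside the HTML tags?

For example, I want the first table to be named “重要财务指标” (key financial indicators), whose anchor has the name "a1" and the id "a1", and the following tables to be named “资产负债表” (balance sheet), “现金流量表” (cash flow statement) and “综合损益表” (statement of comprehensive income).

Please see the code and the HTML below.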

    Sub GetFinanceData()

        For x = 1 To 10

            Dim URL As String, elemCollection As Object
            Dim t As Integer, r As Integer, c As Integer

            Worksheets("Stocks").Select
            Worksheets("Stocks").Activate

            'Open IE and Go to the Website
            URL = "http://stock.finance.sina.com.cn/hkstock/finance/00001.html"
            URL = Cells(x, 1)

            Set IE = CreateObject("InternetExplorer.Application")
            With IE
                .navigate URL
                .Visible = True

                Do While .Busy = True Or .readyState <> 4
                Loop
                DoEvents

                Worksheets.Add(After:=Worksheets(Worksheets.Count)).Name = _
                    ThisWorkbook.Worksheets("Stocks").Range("B" & x).Value
                'You could even simplify it and just state the name as Cells(x,2)

                'Select the Report Type
                Set selectItems = IE.Document.getElementsByTagName("select")
                For Each i In selectItems
                    i.Value = "zero"
                    i.FireEvent ("onchange")
                    Application.Wait (Now + TimeValue("0:00:05"))
                Next i

                Do While .Busy: DoEvents: Loop

                ActiveSheet.Range("A1:K500").ClearContents
                ActiveSheet.Range("A1").Value = .Document.getElementsByTagName("h1")(0).innerText
                ActiveSheet.Range("B1").Value = .Document.getElementsByTagName("em")(0).innerText

                'Find and Get Table Data
                tblNameArr = Array("Balance Sheet", "Cash Flow", "Header 3", "Header 4")
                tblStartRow = 5
                Set elemCollection = .Document.getElementsByTagName("TABLE")
                For t = 0 To elemCollection.Length - 1
                    For r = 0 To (elemCollection(t).Rows.Length - 1)
                        For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                            ActiveSheet.Cells(r + tblStartRow, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
                        Next c
                    Next r
                    ActiveSheet.Cells(r + tblStartRow + 2, 1) = tblNameArr(t)
                    tblStartRow = tblStartRow + r + 4
                Next t
            End With

            ' cleaning up memory
            IE.Quit

        Next x

    End Sub

Here is the HTML:

    <!--重要财务指标 start-->
    <a name="a1" id="a1"></a>
    <div class="part02">
      <div class="sub01">
        <div class="sub01_tt fblue">
          <span class=" selected"><a href="#a1" target="_self">重要财务指标</a></span>
          <span class=""><a href="#a2" target="_self">资产负债表</a></span>
          <span class=""><a href="#a3" target="_self">现金流量表</a></span>
          <span class=""><a href="#a4" target="_self">综合损益表</a></span>
          <em class="rt">报表类型:<select class="fgrey" style="width:100px;" interface="getFinanceStandardForjs?symbol=$symbol&financeStanderd=" table="tableGetFinanceStandard" onchange="selectData(this);">
            <option value="all" >全部</option>
            <option value="zero" >年报</option>
            <option value="1" >中报</option>
            <option value="2" >一季报</option>
            <option value="3" >三季报</option>
          </select></em>
        </div>

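The element created by the tag <a name="a1" id="a1"></a> is empty. It is only a link anchor and contains no text, so retrieving that element will not give you the title.

One approach is to loop over all A elements and pick the ones whose href points to "#a1", "#a2", … Those are the links that carry the visible captions.

Example: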

    ...
    nameBalanceSheet = "Balance Sheet"
    nameCashFlow = "Cash Flow"
    nameHeader3 = "Header 3"
    nameHeader4 = "Header 4"

    Set elemCollection = .Document.getElementsByTagName("A")
    For i = 0 To elemCollection.Length - 1
        If Right(elemCollection(i).href, 3) = "#a1" Then
            nameBalanceSheet = elemCollection(i).innerText
        ElseIf Right(elemCollection(i).href, 3) = "#a2" Then
            nameCashFlow = elemCollection(i).innerText
        ElseIf Right(elemCollection(i).href, 3) = "#a3" Then
            nameHeader3 = elemCollection(i).innerText
        ElseIf Right(elemCollection(i).href, 3) = "#a4" Then
            nameHeader4 = elemCollection(i).innerText
        End If
    Next

    tblNameArr = Array(nameBalanceSheet, nameCashFlow, nameHeader3, nameHeader4)
    ...
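
The same href matching could also be wrapped in a small helper, so the four branches collapse into one index calculation. This is only a sketch: `GetTableNames`, `doc` and `defaults` are hypothetical names that do not appear in the question, and it assumes the default titles come from `Array(...)` with the usual zero-based indexing (no `Option Base 1`).

```vb
' Hypothetical helper (not part of the original code): returns the captions of
' the links whose href ends in "#a1" .. "#a4", keeping the default title for
' any link that is not found on the page.
Function GetTableNames(doc As Object, defaults As Variant) As Variant
    Dim names As Variant, a As Object, suffix As String

    names = defaults                      ' assigning a Variant array copies it

    For Each a In doc.getElementsByTagName("A")
        suffix = Right(a.href, 3)         ' "#a1" for href="...#a1", "" if no href
        ' In a Like pattern, [#] matches a literal "#" and [1-4] one digit 1..4
        If suffix Like "[#]a[1-4]" Then
            names(CLng(Right(suffix, 1)) - 1) = Trim(a.innerText)
        End If
    Next a

    GetTableNames = names
End Function
```

Inside the `With IE` block it would be called as `tblNameArr = GetTableNames(.Document, Array("Balance Sheet", "Cash Flow", "Header 3", "Header 4"))`. If the page contains more than four tables, `tblNameArr(t)` in the table loop would still need a bounds check.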