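Excel: Querying an attribute from an HTML heading
I want to use Excel VBA to extract an attribute value from a heading in a web page. The data I want to scrape has the following structure: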
<div class="index-detail">
  <h5><a href="/indices/equity/dow-jones-sustainability-chile-index-clp" title="DJSI Chile" contentIdentifier="2e9cb165-0cbf-4070-a5ef-dc20bf6219ba" contentType="web-page" contentTitle="Dow Jones Sustainability™ Chile Index (CLP)">DJSI Chile</a></h5>
  <span class="return-value">917.08 </span>
  <span class="daily-change down ">-0.1% ▼ </span>
</div>
Using getElementsByClassName and getElementsByTagName I have extracted the <h5> heading, but when I print the heading's innerText I get DJSI Chile. What I want instead is the text of the contentTitle attribute, Dow Jones Sustainability™ Chile Index (CLP).
How can I do this?
UPDATE
The code I am using is as follows:
Sub myConSP()
    ' Declare variables
    Dim oHtmlSP As HTMLDocument
    Dim tSPIndex As HTMLDivElement
    Dim tSPIdx As HTMLDivElement

    ' Load page inside HTMLDocument
    Set oHtmlSP = New HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.espanol.spindices.com", False
        .send
        oHtmlSP.body.innerHTML = .responseText
    End With

    ' Get indices
    Set tSPIndex = oHtmlSP.getElementById("all-indices-slider")
    Set objTitleTag = tSPIndex.getElementsByClassName("index-detail")(0).getElementsByTagName("h5")(0)
    MsgBox objTitleTag.getAttribute("contentTitle").innerText
End Sub
The attribute is attached to the <a>, not to the <h5> (sorry, that was my mistake in the comment above):
Sub TT()
    Dim html As String, d As New HTMLDocument, el

    html = "<div class='index-detail'>" & _
        "<h5><a href='/indices/equity/dow-jones-sustainability-chile-index-clp' " & _
        "title='DJSI Chile' contentIdentifier='2e9cb165-0cbf-4070-a5ef-dc20bf6219ba' " & _
        "contentType = 'web-page' " & _
        "contentTitle='Dow Jones Sustainability™ Chile Index (CLP)'>DJSI Chile</a></h5> " & _
        "<span class='return-value'>917.08 </span> " & _
        "<span class='daily-change down '>-0.1% ? </span></div>"
    d.body.innerHTML = html

    Set el = d.getElementsByClassName("index-detail")(0).getElementsByTagName("a")(0)
    Debug.Print el.getAttribute("contentTitle")
    ' >>> Dow Jones Sustainability™ Chile Index (CLP)
End Sub
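The same idea could be applied to the live page from the question. The following is only a sketch: it assumes the URL still serves markup shaped like the snippet above, that the project references the Microsoft HTML Object Library, and the name PrintContentTitles is made up for illustration. Note that getAttribute already returns the attribute's value, so there is no .innerText to call on the result.

```vba
Sub PrintContentTitles()
    ' Assumption: "Microsoft HTML Object Library" is checked under Tools > References.
    Dim doc As New HTMLDocument
    Dim divs As Object, links As Object
    Dim title As Variant
    Dim i As Long

    ' Assumption: the page from the question is still reachable and unchanged.
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "http://www.espanol.spindices.com", False
        .send
        doc.body.innerHTML = .responseText
    End With

    ' Walk every index-detail block and read the attribute from its first <a>.
    Set divs = doc.getElementsByClassName("index-detail")
    For i = 0 To divs.Length - 1
        Set links = divs.Item(i).getElementsByTagName("a")
        If links.Length > 0 Then
            title = links.Item(0).getAttribute("contentTitle")
            ' Skip links that do not carry the attribute.
            If Not IsNull(title) Then Debug.Print title
        End If
    Next i
End Sub
```

Looping over all matches rather than indexing (0) avoids a runtime error when a block has no <a> child, at the cost of a few extra lines.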