Extracting an XML attribute using VBA

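I'm not a developer and my XML knowledge is very limited, though I've learned what I could over the past three or four days. So apologies in advance for the basic level of this question. I'm trying to get this done as a one-off task.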

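I have some Excel VBA knowledge, and I'm currently trying to use VBA to extract the SIC code attribute from a given company's page on the SEC filings website. As an example, this is the page for Walmart: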

http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000104169&owner=exclude&count=40&hidefilings=0

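In the blue bar at the top you can see "SIC: 5331". That 5331 is what I'm trying to return to a VBA variable so I can populate a spreadsheet. When I right-click in IE and click View Source, the relevant part of the page reads as: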

    <div id="contentDiv">
      <!-- START FILER DIV -->
      <div style="margin: 15px 0 10px 0; padding: 3px; overflow: hidden; background-color: #BCD6F8;">
        <div class="mailer">Mailing Address
          <span class="mailerAddress">702 SOUTHWEST 8TH STREET</span>
          <span class="mailerAddress"> BENTONVILLE AR 72716 </span>
        </div>
        <div class="mailer">Business Address
          <span class="mailerAddress">702 SOUTHWEST 8TH ST</span>
          <span class="mailerAddress">BENTONVILLE AR 72716 </span>
          <span class="mailerAddress">5012734000</span>
        </div>
        <div class="companyInfo">
          <span class="companyName">WAL MART STORES INC <acronym title="Central Index Key">CIK</acronym>#: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;CIK=0000104169&amp;owner=exclude&amp;count=40">0000104169 (see all company filings)</a></span>
          <p class="identInfo"><acronym title="Standard Industrial Code">SIC</acronym>: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;SIC=5331&amp;owner=exclude&amp;count=40">5331</a> - RETAIL-VARIETY STORES<br />
            State location: <a href="/cgi-bin/browse-edgar?action=getcompany&amp;State=AR&amp;owner=exclude&amp;count=40">AR</a> | State of Inc.: <strong>DE</strong> | Fiscal Year End: 0131<br />
            (Assistant Director Office: 2)<br />
            Get <a href="/cgi-bin/own-disp?action=getissuer&amp;CIK=0000104169"><b>insider transactions</b></a> for this <b> issuer</b>. <br />
            Get <a href="/cgi-bin/own-disp?action=getowner&amp;CIK=0000104169"><b>insider transactions</b></a> for this <b>reporting owner</b>.
          </p>
        </div>
      </div>
    </div>

While trying to work out how to extract the SIC with VBA, I found the following post on your site:

Query and parse xml attribute value into XLS using VBA

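I tried to apply barrowc's answer by copying and pasting it into an Excel module and inserting the path to the Walmart filing, but when I step through it I get the Debug.Print "*****" output and nothing from n.Text.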

    Sub test4()
        Dim d As MSXML2.DOMDocument60
        Dim i As IXMLDOMNodeList
        Dim n As IXMLDOMNode

        Set d = New MSXML2.DOMDocument60
        d.async = False
        d.Load ("http://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=886475&filenum=&State=&Country=&SIC=&owner=exclude&Find=Find+Companies&action=getcompany")

        Debug.Print "*****"
        Set i = d.SelectNodes("//div[@id='contentDiv']")
        For Each n In i
            Debug.Print n.Text
        Next n
        Debug.Print "*****"

        Set d = Nothing
    End Sub

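I've tried all sorts of strings in d.SelectNodes(), but I don't know enough about the topic to see where I'm going wrong. So either comments on my syntax or pointers to resources would be a great help.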
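One way to narrow down an empty result like this is to check whether `Load` succeeded at all before running any XPath. The sketch below is only a diagnostic, not a fix. It assumes MSXML 6 is installed and that the Walmart URL from the question is reachable. `CheckLoadResult` is a hypothetical name. If the page is ordinary HTML rather than well-formed XML, the load will likely fail and `parseError` should say why.

```vba
' Diagnostic sketch: report whether the document parsed as XML.
' Late binding is used so no library reference is required.
Sub CheckLoadResult()
    Dim d As Object
    Set d = CreateObject("MSXML2.DOMDocument.6.0")
    d.async = False

    ' Load returns False when the content cannot be parsed as XML.
    If Not d.Load("http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000104169&owner=exclude&count=40&hidefilings=0") Then
        Debug.Print "Load failed: " & d.parseError.errorCode & " - " & d.parseError.reason
    Else
        Debug.Print "Loaded; matching nodes: " & d.SelectNodes("//div[@id='contentDiv']").Length
    End If

    Set d = Nothing
End Sub
```

If the load fails, no XPath string passed to `SelectNodes` can return anything, which would be consistent with the two `*****` lines printing with nothing between them.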

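If you are only interested in the SIC, it is not worth your time to parse the entire DOM structure. Instead, identify a unique sequence of characters, search for it, and extract the SIC from there.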

The following function does just that. You pass it the complete HTML source of the page and it returns the SIC:

    Function ExtractSIC(SourceHtml As String) As String
        Const PrefixChars As String = "&amp;SIC="
        Const SuffixChars As String = "&"
        Dim StartPos As Long, EndPos As Long

        StartPos = InStr(SourceHtml, PrefixChars)
        If StartPos = 0 Then Exit Function

        StartPos = StartPos + Len(PrefixChars)
        EndPos = InStr(StartPos, SourceHtml, SuffixChars) - 1
        ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos + 1)
    End Function
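As a rough usage sketch, the page source could be fetched with an HTTP request and handed to `ExtractSIC`. The names `FetchHtml` and `DemoExtractSIC` are hypothetical and not part of the answer. The sketch assumes MSXML 6 is available and that the server returns the page to a plain GET request, which may not hold if the site requires particular headers.

```vba
' Hypothetical helper: returns the raw HTML of a page, or "" on a non-200 reply.
Function FetchHtml(Url As String) As String
    Dim http As Object
    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", Url, False   ' False = synchronous request
    http.send
    If http.Status = 200 Then FetchHtml = http.responseText
End Function

' Fetches the Walmart page from the question and prints whatever ExtractSIC finds.
Sub DemoExtractSIC()
    Dim html As String
    html = FetchHtml("http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000104169&owner=exclude&count=40&hidefilings=0")
    If Len(html) = 0 Then
        Debug.Print "No HTML returned"
    Else
        Debug.Print ExtractSIC(html)
    End If
End Sub
```

Because `ExtractSIC` only searches text, it does not matter whether the HTML comes from a request like this or from a browser object's `innerHTML`, as long as the `&amp;SIC=` sequence appears in it.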

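Thanks again, mwolfe. I've posted my code below, but what you provided is more elegant. I knew the SIC was only 4 digits, so I got lazy and built that assumption into the code, which might throw errors in the future. You can see what I did in the commented-out lines.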

    Sub GetSICs()
        Application.ScreenUpdating = False

        Dim AWBN As String
        Dim ASN As String
        Dim CIK As String
        Dim NUM_FILES_TO_GET As Long
        Dim COUNTER As Long
        Dim SICTagPos As Integer
        Dim SIC As String

        Set IEbrowser = CreateObject("InternetExplorer.application")
        IEbrowser.Visible = False

        AWBN = ActiveWorkbook.Name
        ASN = ActiveSheet.Name
        Workbooks(AWBN).Sheets(ASN).Range("A1").Select
        ActiveCell.Offset(0, 11) = "SIC"
        NUM_FILES_TO_GET = Application.WorksheetFunction.CountA(Range("A:A"))

        For COUNTER = 1 To 3 'NUM_FILES_TO_GET
            Application.StatusBar = "Counter = " & COUNTER
            'SICTagPos = 0
            CIK = ActiveCell.Offset(COUNTER, 2)

            IEbrowser.Navigate URL:="http://www.sec.gov/edgar/searchedgar/companysearch.html"
            Do
                DoEvents
            Loop Until IEbrowser.readyState = 4

            Set frm = IEbrowser.Document.forms(0)
            frm("CIK").Value = CIK
            frm.submit
            While IEbrowser.Busy Or IEbrowser.readyState <> 4: DoEvents: Wend

            SIC = ExtractSIC(IEbrowser.Document.body.innerhtml)
            'SICTagPos = InStr(1, IEbrowser.Document.body.innerhtml, "SIC=")
            'SIC = Right(Left(IEbrowser.Document.body.innerhtml, SICTagPos + 7), 4)

            ActiveCell.Offset(COUNTER, 11).NumberFormat = "@"
            ActiveCell.Offset(COUNTER, 11) = SIC
        Next

        Application.StatusBar = False
        Application.ScreenUpdating = True
    End Sub

    Function ExtractSIC(SourceHtml As String) As String
        Const PrefixChars As String = "&amp;SIC="
        Const SuffixChars As String = "&"
        Dim StartPos As Long, EndPos As Long

        StartPos = InStr(SourceHtml, PrefixChars)
        If StartPos = 0 Then Exit Function

        StartPos = StartPos + Len(PrefixChars)
        EndPos = InStr(StartPos, SourceHtml, SuffixChars) - 1
        ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos + 1)
    End Function
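To make the difference between the two approaches concrete, here is a small offline sketch that runs both against one link taken from the page source in the question. `ExtractSicFixedWidth` is a hypothetical rewrite of the commented-out `Left`/`Right` lines, not the original code. It keeps the four-digit assumption but stores the position in a `Long`, since an `Integer` could overflow if "SIC=" first appeared more than 32,767 characters into the page.

```vba
' Hypothetical helper mirroring the commented-out approach: assumes a 4-digit code.
Function ExtractSicFixedWidth(SourceHtml As String) As String
    Dim TagPos As Long   ' Long, because InStr positions can exceed the Integer range
    TagPos = InStr(1, SourceHtml, "SIC=")
    If TagPos = 0 Then Exit Function
    ' Take exactly four characters after "SIC=".
    ExtractSicFixedWidth = Mid$(SourceHtml, TagPos + Len("SIC="), 4)
End Function

' Runs both extractors on a sample link copied from the Walmart page source.
Sub CompareSicExtraction()
    Const Sample As String = "<a href=""/cgi-bin/browse-edgar?action=getcompany&amp;SIC=5331&amp;owner=exclude&amp;count=40"">5331</a>"
    Debug.Print ExtractSicFixedWidth(Sample)   ' fixed-width version
    Debug.Print ExtractSIC(Sample)             ' delimiter-based version from the answer
End Sub
```

On this sample both calls should print 5331. The two would diverge only if a code were ever not exactly four characters long: the fixed-width version would then pick up part of the following `&amp;`, while the delimiter-based version stops at the next `&`.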