Excel Amazon Seller Web Scraper Issue

I have been trying to get this code working to make a workflow more efficient, but I can't seem to get it to work properly.

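Steps:

  1. Log in to Amazon Seller.
  2. Take the order number from column A and put it in the search box to run a search.
  3. Search the elements' innerText for "Estimated Delivery:" and scrape the information into column B, next to the order number.
  4. Move to the next order number and repeat until the order number column is empty.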

Web page code (the part I am trying to get is highlighted):

(image: screenshot of the page HTML)

    Option Explicit

    Dim HTMLDoc As HTMLDocument
    Dim MyBrowser As InternetExplorer

    Sub MyAmazonSellereEDD()

        Dim MyHTML_Element As IHTMLElement
        Dim MyURL As String
        Dim oSignInLink As HTMLLinkElement
        Dim oInputEmail As HTMLInputElement
        Dim oInputPassword As HTMLInputElement
        Dim oInputSignInButton As HTMLInputButtonElement

        'InputSearchOrder will be the destination for order numbers taken from the workbook
        Dim InputSearchOrder As HTMLInputElement
        Dim InputSearchButton As HTMLInputButtonElement

        Dim IE As InternetExplorer
        Dim AAOrder As Workbook
        Dim AAws As Worksheet
        Dim AAws2 As Worksheet
        Dim R As Range
        Dim x As Integer
        Dim i As Long
        Dim ar As Variant
        Dim elems As IHTMLElementCollection
        Dim TDelement As HTMLTableCell
        Dim ExcludWords() As Variant, a As Range, b As Long, LR As Long

        ExcludWords = Array("Estimated Delivery:")

        MyURL = "https://sellercentral.amazon.com/gp/homepage.html"

        Set IE = New InternetExplorer

        ' Open the browser and navigate.
        With IE
            .Silent = True
            .navigate MyURL
            .Visible = True
            Do
                DoEvents
            Loop Until .readyState = READYSTATE_COMPLETE
        End With

        ' Get the html document.
        Set HTMLDoc = IE.document

        With HTMLDoc
            .all.Item("username").Value = "blankityblank@blank.net"
            .all.Item("password").Value = "*********"
            .all.Item("sign-in-button").Click
        End With

        Do
            DoEvents
        Loop Until IE.readyState = READYSTATE_COMPLETE

        Application.Wait (Now + TimeValue("0:00:08"))

        'Set AAOrder = Application.Workbooks.Open("Z:\Automation Anywhere\5 Automated Tracking Imports\Amazon Prime\PrimeOrdersWithNoFulfillment.csv")
        'Set AAws = AAOrder.Worksheets("PrimeOrdersWithNoFulfillment")

        x = 2

        'Do Until Range("A" & x) = ""
        If Range("B" & x).Value = "" Then
        'If AAws.Range("B" & x).Value = "" Then
        'x = x + 1
            Do Until Range("A" & x) = ""

                Set InputSearchOrder = HTMLDoc.getElementById("sc-search-field")
                InputSearchOrder.Value = Range("A" & x)

                Set InputSearchButton = HTMLDoc.getElementsByClassName("sc-search-button")(0)
                InputSearchButton.Click

                Do
                    DoEvents
                Loop Until IE.readyState = READYSTATE_COMPLETE

                Application.Wait (Now + TimeValue("0:00:05"))

                Set elems = HTMLDoc.getElementsByTagName("td")

                'ExcludWords = Array("Package Weight:", "Tracking ID:", "Ship Date:", "Carrier:", "Shipping Service:")

                i = 2

                For Each TDelement In elems
                    If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
                        Range("B" & x).Value = TDelement.innerText
                        i = i + 1
                    End If
                Next

                LR = Range("B" & Rows.Count).End(xlUp).Row

                For i = 1 To LR
                    Set a = Cells(i, "B")
                    For b = 0 To UBound(ExcludWords)
                        a.Formula = Replace((a.Formula), ExcludWords(b), "")
                    Next b
                Next i

                'End If
                x = x + 1
            Loop
        'Loop
        End If

    Err_Clear:
        If Err <> 0 Then
            Err.Clear
            Resume Next
        End If

        MsgBox ("Process is done! :)")

    End Sub

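My problem is that when it scrapes the data, the words "Estimated Delivery:" and the actual estimated delivery date it should be scraping are separate, but the date should still be included in the output in column B. What it does is find and insert only "Estimated Delivery:", and then the code trims those characters as instructed. After that, the cell is left blank. I am not sure what the problem is.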

The TDelement you select in the section of code below contains only "Estimated Delivery:" in its innerText; the part with the date is actually a separate TDelement.

    For Each TDelement In elems
        If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
            Range("B" & x).Value = TDelement.innerText
            i = i + 1
        End If
    Next
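One way to check this on the live page is to list every cell with its class name and text. The following is only a rough sketch: `DescribeCells` is a hypothetical helper name, it uses late binding (`Object`) so it does not depend on the MSHTML reference, and it assumes the results page has finished loading before it is called.

```vba
' Hypothetical diagnostic helper (not part of the original code).
' Returns one line per <td> in doc: index, class name, and innerText in brackets.
Private Function DescribeCells(ByVal doc As Object) As String
    Dim cell As Object
    Dim n As Long
    Dim report As String

    For Each cell In doc.getElementsByTagName("td")
        report = report & n & vbTab & cell.className & vbTab & _
                 "[" & cell.innerText & "]" & vbCrLf
        n = n + 1
    Next cell

    DescribeCells = report
End Function
```

Calling `Debug.Print DescribeCells(IE.document)` after a search prints the list to the Immediate window. If the label and the date show up on separate lines, they live in separate cells.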

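Since the HTML has no unique information (e.g. id, name, etc.) that could be used to reference the TDelement containing the date, you can take the reference you already have, the one containing the text "Estimated Delivery:", and combine it with NextSibling. Maybe try this (I can't test anything at the moment, but it should work):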

    For Each TDelement In elems
        If TDelement.className = "data-display-field" And InStr(TDelement.innerText, "Estimated Delivery:") Then
            Range("B" & x).value = TDelement.NextSibling.innerText
            i = i + 1
        End If
    Next
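One possible caveat with NextSibling: depending on the document mode IE renders the page in, the next sibling may be a whitespace text node rather than the following cell, and a text node has no innerText. Below is a minimal, untested sketch of a helper that skips such nodes. It assumes the date cell is a later sibling of the label cell in the same row, which cannot be confirmed without the page's HTML. `NextElementText` is a hypothetical name, not something MSHTML provides.

```vba
' Hypothetical helper (an assumption, not part of the original code).
' Returns the trimmed innerText of the first element node that follows
' labelCell among its siblings, or "" if there is none.
Private Function NextElementText(ByVal labelCell As Object) As String
    Dim node As Object

    Set node = labelCell.nextSibling
    Do While Not node Is Nothing
        If node.nodeType = 1 Then          ' 1 = element node; 3 would be a text node
            NextElementText = Trim$(node.innerText)
            Exit Function
        End If
        Set node = node.nextSibling
    Loop

    NextElementText = ""                   ' no following element was found
End Function
```

Inside the loop above, the assignment would then read `Range("B" & x).Value = NextElementText(TDelement)`. With the date written directly, the later `Replace` pass that strips "Estimated Delivery:" from column B should no longer have anything to remove.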