使用VBA从Google Translate中提取div的内容

我有一个翻译语言的function:

Public Function Translate(rng As Range, Optional translateFrom As String = "nl", Optional translateTo As String = "en") Dim getParam As String, Trans As String, objHTTP As Object, URL As String Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP") getParam = ConvertToGet(rng.Value) URL = "https://translate.google.com/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam objHTTP.Open "GET", URL, False objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" objHTTP.send ("") If InStr(objHTTP.responseText, "div dir=""ltr""") > 0 Then Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>") Translate = CleanA(Trans) Else Translate = CVErr(xlErrValue) End If End Function 

有中文翻译:

 A1 = Hello B1 = Translate(A1,"en","zh-cn") 

结果是“Nǐhǎo”,正确的结果是“你好”

链接谷歌: https : //translate.google.com/m?hl= zh-CN & sl = en & tl = zh-CN & ie =UTF-8& prev = _m & q =hello

我想要结果:

 B1 = 你好C1 = Nǐ hǎo 

我想我需要修复这个代码:

 Trans = RegexExecute(objHTTP.responseText, "div[^""]*?""ltr"".*?>(.+?)</div>") 

请帮助我,谢谢!

该页面返回两个<div> s:

 <div dir="ltr" class="o1">Nǐ hǎo</div> <div dir="ltr" class="t0">你好</div> 

build议您不要尝试使用正则expression式来parsingHTML,因为在parsingHTML时遇到困难很容易 – 您可以使用VBA中的Microsoft HTML对象库来获得类似的结果。

要获得这两个<div>标签的内容,您可以按照以下示例使用此代码:

 ' o1 has Anglicised translation, t0 as tranlsation in target language Set objDivs = objHTML.getElementsByTagName("div") For Each objDiv In objDivs If objDiv.className = "o1" Then strTranslatedO1 = objDiv.innerText End If If objDiv.className = "t0" Then strTranslatedT0 = objDiv.innerText End If Next objDiv 

基本上循环返回的HTML中的所有<div>标记,并检查类名称o1t0 ,然后获取innerText属性。 使用这种技术,您可以获得翻译的值,并将其写回工作表,例如:

在这里输入图像描述

完整代码:

 Option Explicit Public Sub Test() Dim ws As Worksheet ' testing worksheet Set ws = ThisWorkbook.Worksheets("Sheet1") ws.Cells.Delete ' test inputs ws.Range("A1:E1") = Array("Input", "From", "To", "T0", "O1") ws.Range("A2:A4") = "hello" ws.Range("B2:B4") = "English" ws.Range("C2:C4") = Application.Transpose(Array("Chinese", "Spanish", "Russian")) ' test ws.Range("D2") = Translate("hello", "en", "zh-cn", True) ws.Range("E2") = Translate("hello", "en", "zh-cn", False) ws.Range("D3") = Translate("hello", "en", "es", True) ws.Range("E3") = Translate("hello", "en", "es", False) 'Spanish uses latin alphabet ws.Range("D4") = Translate("hello", "en", "ru", True) ws.Range("E4") = Translate("hello", "en", "ru", False) End Sub Public Function Translate(strInput As String, strFromLanguageCode As String, strToLanguageCode As String, blnTargetAlphabet As Boolean) As String Dim strURL As String Dim objHTTP As Object Dim objHTML As Object Dim objDivs As Object, objDiv Dim strTranslatedT0 As String Dim strTranslatedO1 As String ' send query to web page strURL = "https://translate.google.com/m?hl=" & strFromLanguageCode & _ "&sl=" & strFromLanguageCode & _ "&tl=" & strToLanguageCode & _ "&ie=UTF-8&prev=_m&q=" & strInput Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP") objHTTP.Open "GET", strURL, False objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" objHTTP.send "" ' create a html document Set objHTML = CreateObject("htmlfile") With objHTML .Open .Write objHTTP.responseText .Close End With ' o1 has Anglicised translation, t0 as tranlsation in target language Set objDivs = objHTML.getElementsByTagName("div") For Each objDiv In objDivs If objDiv.className = "o1" Then strTranslatedO1 = objDiv.innerText End If If objDiv.className = "t0" Then strTranslatedT0 = objDiv.innerText End If Next objDiv ' choose which to return If blnTargetAlphabet Then Translate = strTranslatedT0 Else Translate = strTranslatedO1 End If CleanUp: Set objHTML = Nothing Set objHTTP = Nothing End Function