Remove HTML tags from a cell string: Excel formula

I have data with HTML tags in an Excel sheet, like this:

<b>This is test data<br>Nice <div> Go on this is next Cell Very goood <b>.....</b> 

So basically I want to remove all the HTML tags in the sheet, or replace them with a space.

Use Replace All with the <*> pattern (Find what: <*>, Replace with: a single space):


To open this dialog, go to Home > Find & Select > Replace... on the ribbon, or just press CTRL + H.

You can then use the TRIM function to remove the extra spaces. Good luck!
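If you need to repeat this often, the same Find & Replace can be run from a macro. This is a minimal sketch, assuming the cells to clean are selected first; the macro name ReplaceTagsWithSpaces is made up for illustration. Range.Replace uses the same engine as the Ctrl+H dialog, so <*> should match exactly what it matches there, and Application.WorksheetFunction.Trim is the worksheet TRIM function, which also collapses runs of internal spaces.

```vba
' Hypothetical macro: the Ctrl+H "<*>" replacement plus TRIM, applied to the selection
Sub ReplaceTagsWithSpaces()
    Dim r As Range

    ' Same as Ctrl+H with Find what "<*>" and Replace with " " (* is a wildcard here)
    Selection.Replace What:="<*>", Replacement:=" ", LookAt:=xlPart, MatchCase:=False

    ' Same as the TRIM worksheet function: drop leading/trailing spaces and
    ' collapse runs of spaces; formulas and non-text values are left alone
    For Each r In Selection
        If Not r.HasFormula And VarType(r.Value) = vbString Then
            r.Value = Application.WorksheetFunction.Trim(r.Value)
        End If
    Next r
End Sub
```

Select the cells and run it with Alt + F8. Like Replace All, it overwrites the cells in place.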

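Open the VBA editor in Excel (Alt + F11) and click the project name (the workbook name) in the Project Explorer. Choose Insert > Module and paste the user-defined function below into the module window. Save the workbook as a macro-enabled workbook (.xlsm).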

Assuming your data is in cell A2, enter the formula '=StripHTML(A2)'. You can also download a working example here:

http://jfrancisconsulting.com/how-to-strip-html-tags-in-excel/

    Function StripHTML(cell As Range) As String
        ' Late-bound VBScript regular expression engine (no library reference needed)
        Dim RegEx As Object
        Set RegEx = CreateObject("VBScript.RegExp")
        With RegEx
            .Global = True
            .IgnoreCase = True
            .MultiLine = True
        End With

        Dim sInput As String
        sInput = cell.Text

        ' Normalize CR+LF pairs and NUL characters to plain line feeds
        sInput = Replace(sInput, vbCrLf, vbLf)
        sInput = Replace(sInput, vbNullChar, vbLf)

        ' Replace HTML breaks and ends of paragraphs with line breaks
        RegEx.Pattern = "</p>"
        sInput = RegEx.Replace(sInput, vbLf & vbLf)
        RegEx.Pattern = "<br\s*/?>"
        sInput = RegEx.Replace(sInput, vbLf)

        ' Replace bullets with dashes
        RegEx.Pattern = "<li>"
        sInput = RegEx.Replace(sInput, "-")

        ' Remove all remaining HTML tags before decoding entities,
        ' so that encoded text such as "&lt;b&gt;" survives as "<b>"
        RegEx.Pattern = "<[^>]+>"
        sInput = RegEx.Replace(sInput, "")

        ' Add back the special characters: pairs of (entity name, Unicode code point)
        Dim ents As Variant, i As Long
        ents = Array( _
            "ndash", &H2013, "mdash", &H2014, "iexcl", &HA1, "iquest", &HBF, _
            "quot", &H22, "ldquo", &H201C, "rdquo", &H201D, "apos", &H27, _
            "lsquo", &H27, "rsquo", &H27, "laquo", &HAB, "raquo", &HBB, "nbsp", &H20, _
            "cent", &HA2, "copy", &HA9, "divide", &HF7, "gt", &H3E, "lt", &H3C, _
            "micro", &HB5, "middot", &HB7, "para", &HB6, "plusmn", &HB1, "euro", &H20AC, _
            "pound", &HA3, "reg", &HAE, "sect", &HA7, "trade", &H2122, "yen", &HA5, _
            "aacute", &HE1, "Aacute", &HC1, "agrave", &HE0, "Agrave", &HC0, _
            "acirc", &HE2, "Acirc", &HC2, "aring", &HE5, "Aring", &HC5, _
            "atilde", &HE3, "Atilde", &HC3, "auml", &HE4, "Auml", &HC4, _
            "aelig", &HE6, "AElig", &HC6, "ccedil", &HE7, "Ccedil", &HC7, _
            "eacute", &HE9, "Eacute", &HC9, "egrave", &HE8, "Egrave", &HC8, _
            "ecirc", &HEA, "Ecirc", &HCA, "euml", &HEB, "Euml", &HCB, _
            "iacute", &HED, "Iacute", &HCD, "igrave", &HEC, "Igrave", &HCC, _
            "icirc", &HEE, "Icirc", &HCE, "iuml", &HEF, "Iuml", &HCF, _
            "ntilde", &HF1, "Ntilde", &HD1, "oacute", &HF3, "Oacute", &HD3, _
            "ograve", &HF2, "Ograve", &HD2, "ocirc", &HF4, "Ocirc", &HD4, _
            "oslash", &HF8, "Oslash", &HD8, "otilde", &HF5, "Otilde", &HD5, _
            "ouml", &HF6, "Ouml", &HD6, "szlig", &HDF, "yuml", &HFF, _
            "uacute", &HFA, "Uacute", &HDA, "ugrave", &HF9, "Ugrave", &HD9, _
            "ucirc", &HFB, "Ucirc", &HDB, "uuml", &HFC, "Uuml", &HDC, _
            "acute", &HB4, "#96", &H60)

        ' Entity names are case-sensitive (&Aacute; vs &aacute;), so compare in binary mode
        For i = LBound(ents) To UBound(ents) Step 2
            sInput = Replace(sInput, "&" & ents(i) & ";", ChrW(ents(i + 1)), 1, -1, vbBinaryCompare)
        Next i

        ' Decode &amp; last so that "&amp;lt;" becomes "&lt;" rather than "<"
        StripHTML = Replace(sInput, "&amp;", "&")
    End Function

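Since the macro above didn't work for me, I fixed it myself. This is my first script, so if you can improve it, make it faster, or add more to it, you are more than welcome!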

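Well, I had no programming experience before (apart from some very basic Java 6 years ago), but with some help and a lot of guessing (several hours, actually) I managed to write this script. It works like a charm on the tags and the &#8xxx; entities listed in the code, but it can't replace <BR> with line breaks. You can do that with CTRL + H: in "Find what" enter <BR>, click in "Replace with", hold down ALT and type 0010 on the numeric keypad (a small dot should blink in the Replace box), then click "Replace All".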

Paste the code below into a standard module (Alt + F11, right-click Sheet1 > Insert > Module, then paste the code).

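To add a button, enable the Developer tab via File > Options > Customize Ribbon and check the Developer check box. Then go to Developer > Insert > Button, place the button, right-click it > Assign Macro > select RemoveTags.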

    Sub RemoveTags()
        ' Numeric entities to replace with a space (the same list the original loops used)
        Dim entities As Variant, e As Variant
        entities = Array("&#8217;", "&#8211;", "&#8216;", "&#8232;", "&#8233;", "&#146;s")

        Dim r As Range, s As String
        Selection.NumberFormat = "@"   'set cells to text number format

        With CreateObject("VBScript.RegExp")
            .Pattern = "<[^>]*>"   ' any HTML tag
            .Global = True
            For Each r In Selection
                ' Skip empty cells and error values such as #N/A
                If Not IsEmpty(r.Value) And Not IsError(r.Value) Then
                    ' Strip the tags, then replace each entity with a space
                    s = .Replace(CStr(r.Value), "")
                    For Each e In entities
                        s = Replace(s, e, " ")
                    Next e
                    r.Value = s
                End If
            Next r
        End With
    End Sub
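The <BR>-to-line-break step that this script leaves to CTRL + H can also be done in code. This is a minimal sketch, assuming it runs on the selected cells before RemoveTags (which would otherwise strip the <br> tags first); BreaksToLineFeeds is a made-up name, and the three spellings of the tag are an assumption about what your data contains. vbLf is the same character that ALT + 0010 types into the Replace box.

```vba
' Hypothetical macro: turn <br>, <br/> and <br /> into line breaks in the selected cells
Sub BreaksToLineFeeds()
    Dim tag As Variant

    ' vbLf is Chr(10); MatchCase:=False also catches <BR>, <Br/>, ...
    For Each tag In Array("<br>", "<br/>", "<br />")
        Selection.Replace What:=tag, Replacement:=vbLf, LookAt:=xlPart, MatchCase:=False
    Next tag

    ' Wrap text so the new line breaks are visible in the cells
    Selection.WrapText = True
End Sub
```

Assigning it to a second button, or calling it at the top of RemoveTags, keeps the whole cleanup to one click.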