Removing illegal characters before writing to an Excel sheet

I wrote a program that scrapes a website and writes the data to an Excel sheet. The program is written in C#, using Microsoft Visual Studio 2010.

In most cases I can fetch the content from the site, parse it, and store the data in Excel without any problem.

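Every once in a while, however, I get an error saying there is an illegal character (such as ▶) that prevents the output to the Excel file, and the program crashes. I also visited the site manually and found other illegal characters, such as Ú.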

I tried a .Replace(), but the code does not seem to find these characters.

    string htmlContent = getResponse(url); // get the full HTML from the given URL
    string newHtml = htmlContent.Replace("▶", "?").Replace("Ú", "?");

So my question is: is there a way to strip all characters of this kind from an HTML string (the page's HTML)? Below is the error message I get.

I tried Anthony's and woz's solutions, and neither worked…

[Image: screenshot of the error message]

See System.Text.Encoding.Convert.

Usage example:

    var htmlText = "▶Hello World"; // the text you're trying to convert
    var convertedText = System.Text.Encoding.ASCII.GetString(
        System.Text.Encoding.Convert(
            System.Text.Encoding.Unicode,
            System.Text.Encoding.ASCII,
            System.Text.Encoding.Unicode.GetBytes(htmlText)));

I tested it with the string ▶Hello World, and it gave me ?Hello World.

You could try stripping all non-ASCII characters.

    // requires: using System.Text.RegularExpressions;
    string htmlContent = getResponse(url);
    string newHtml = Regex.Replace(htmlContent, @"[^\u0000-\u007F]", "?");
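Before replacing anything, it can help to see exactly which characters that pattern matches, since a character that looks like ▶ on the page may be stored differently in the string. The following is only a sketch; the class and method names (`NonAsciiReport`, `Describe`) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NonAsciiReport
{
    // Hypothetical helper: returns one "U+XXXX at index N" entry for every
    // UTF-16 code unit outside the ASCII range (U+0000 to U+007F).
    public static List<string> Describe(string text)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, @"[^\u0000-\u007F]"))
        {
            found.Add(string.Format("U+{0:X4} at index {1}", (int)m.Value[0], m.Index));
        }
        return found;
    }
}
```

Calling `NonAsciiReport.Describe(htmlContent)` and printing the entries shows the code points that are actually present, which makes it easier to tell whether a `.Replace()` call is looking for the right character.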

Thanks for the replies, and thank you for your help.

After a few hours of searching, I found a solution to my problem. The problem was that I had to "sanitize" my HTML string.

http://seattlesoftware.wordpress.com/2008/09/11/hexadecimal-value-0-is-an-invalid-character/

The link above is the article I found helpful; it also provides a code sample.
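For readers who cannot reach the article, here is a rough sketch of what such a sanitizing step might look like. It is not the article's code. It assumes the failure comes from characters that XML 1.0 does not allow (for example U+0000, as the article's title suggests), and it keeps only characters in the XML 1.0 `Char` range. The names `XmlSanitizer` and `Sanitize` are hypothetical.

```csharp
using System;
using System.Text;

public static class XmlSanitizer
{
    // True when a single UTF-16 code unit is allowed by the XML 1.0 Char
    // production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD].
    // Characters above U+FFFF arrive as surrogate pairs and are handled below.
    private static bool IsLegalXmlChar(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= '\u0020' && c <= '\uD7FF')
            || (c >= '\uE000' && c <= '\uFFFD');
    }

    // Hypothetical helper: returns a copy of the input with every character
    // that XML 1.0 forbids removed.
    public static string Sanitize(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (char.IsHighSurrogate(c) && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]))
            {
                // A well-formed surrogate pair encodes U+10000..U+10FFFF, which is legal.
                sb.Append(c).Append(input[i + 1]);
                i++;
            }
            else if (IsLegalXmlChar(c))
            {
                sb.Append(c);
            }
            // Anything else (control characters, lone surrogates, U+FFFE, U+FFFF) is dropped.
        }
        return sb.ToString();
    }
}
```

Note that ▶ (U+25B6) and Ú (U+00DA) are themselves legal XML characters, so this sketch leaves them in place and removes only control characters and similar code points. If the output format really cannot hold non-ASCII text, the regex replacement from the earlier answer is the more direct tool.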
