
WebClient DownloadString UTF-8 not displaying international characters

I am trying to save the HTML of a website in a string. The website contains international characters (ę, ś, ć, ...), and they are not preserved in the string even though I set the encoding to UTF-8, which matches the website's charset.

Here is my code:

using (WebClient client = new WebClient())
{
    client.Encoding = Encoding.UTF8;
    string htmlCode = client.DownloadString("http://www.filmweb.pl/Mroczne.Widmo");
}

When I print "htmlCode" to the console, the international characters are not displayed correctly, even though they appear correctly in the original HTML.

Any help is appreciated.

asked May 13 '16 at 02:05 by mrybak3


1 Answer

I had the same problem. It seems that client.DownloadString does not decode the response as UTF-8. Using client.DownloadData and decoding the returned bytes with Encoding.UTF8.GetString solves the problem.

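using (WebClient client = new WebClient())
{
    // Download the raw response bytes instead of letting WebClient decode them.
    var htmlData = client.DownloadData("http://www.filmweb.pl/Mroczne.Widmo");
    // Decode the bytes explicitly as UTF-8.
    var htmlCode = Encoding.UTF8.GetString(htmlData);
}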
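
To see why the choice of decoder matters, here is a minimal offline sketch. It does not touch the network: the byte array is a hypothetical stand-in for what DownloadData would return, and the class and method names (DecodingSketch, DecodeUtf8, DecodeLatin1) are made up for illustration. ISO-8859-1 is used only as an example of a mismatched charset; it is an assumption, not necessarily what happens for this site.

```csharp
using System;
using System.Text;

static class DecodingSketch
{
    // Decode raw bytes explicitly as UTF-8, as done above with the DownloadData result.
    public static string DecodeUtf8(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    // Decode the same bytes as ISO-8859-1 to imitate a mismatched charset (illustrative only).
    public static string DecodeLatin1(byte[] data)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(data);
    }

    static void Main()
    {
        // "ę ś ć" written with escapes so the source file encoding does not matter.
        string original = "\u0119 \u015B \u0107";

        // Hypothetical stand-in for the bytes a UTF-8 page would deliver.
        byte[] htmlData = Encoding.UTF8.GetBytes(original);

        string good = DecodeUtf8(htmlData);
        string bad = DecodeLatin1(htmlData);

        // Assumption: the console can render UTF-8; if it cannot,
        // even a correctly decoded string may still look wrong when printed.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(good);
        Console.WriteLine(good.Length); // 5: three letters and two spaces
        Console.WriteLine(bad.Length);  // 8: one char per byte
    }
}
```

If the string is decoded correctly but still prints badly, the console output encoding or font may be the remaining culprit rather than the download itself.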
answered Sep 22 '22 at 04:09 by Abbas Amiri