
C# HtmlAgilityPack HtmlDocument() LoadHtml encoding

Tags: c#, encoding

Uri url = new Uri("http://localhost/rgm.php");
WebClient client = new WebClient();
string html = client.DownloadString(url);

HtmlAgilityPack.HtmlDocument doc23 = new HtmlAgilityPack.HtmlDocument();
doc23.LoadHtml(html);

HtmlNode body23 = doc23.DocumentNode.SelectSingleNode("//body");

string content23 = body23.InnerHtml;

How can I force this to parse the web page as UTF-8?

milesh asked Aug 16 '13 09:08


2 Answers

Use the DownloadData method of WebClient instead of DownloadString(), and decode the bytes as UTF-8 yourself:

WebClient client = new WebClient();
var data = client.DownloadData(url);
var html = Encoding.UTF8.GetString(data);
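
DownloadString() decodes the response for you, so if it settles on an encoding other than UTF-8, multi-byte characters come out garbled. A minimal sketch of that effect, using only System.Text; the class name DecodeDemo, the helper Decode and the two sample bytes are made up for illustration and are not part of the question:

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Hypothetical helper: decodes raw bytes with an explicitly chosen encoding.
    public static string Decode(byte[] data, Encoding encoding)
    {
        return encoding.GetString(data);
    }

    public static void Main()
    {
        // "é" (U+00E9) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] data = { 0xC3, 0xA9 };

        string asUtf8 = Decode(data, Encoding.UTF8);
        string asLatin1 = Decode(data, Encoding.GetEncoding("ISO-8859-1"));

        // Decoded as UTF-8, the two bytes form a single character.
        Console.WriteLine(asUtf8 == "\u00E9");          // True
        // Decoded as ISO-8859-1, each byte becomes its own character ("Ã©").
        Console.WriteLine(asLatin1 == "\u00C3\u00A9");  // True
    }
}
```

Going through DownloadData() and Encoding.UTF8.GetString() takes that decision away from WebClient, which is why the string passed to LoadHtml() ends up correct as long as the page really is served as UTF-8.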
I4V answered Sep 24 '22 02:09


Use a MemoryStream and pass the encoding to HtmlDocument.Load():

WebClient client = new WebClient(); 
MemoryStream ms = new MemoryStream(client.DownloadData("http://localhost/rgm.php"));

HtmlDocument doc23 = new HtmlDocument();
doc23.Load(ms, Encoding.UTF8);

HtmlNode body23 = doc23.DocumentNode.SelectSingleNode("//body");
string content23 = body23.InnerHtml;
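
The same approach can be tried without a running web server. This is only a sketch: the class LoadFromBytesDemo, the method BodyInnerHtml and the sample page are hypothetical, and the in-memory byte array stands in for whatever client.DownloadData(...) would return. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

public static class LoadFromBytesDemo
{
    // Hypothetical helper: parses raw page bytes as UTF-8 and
    // returns the inner HTML of the <body> element.
    public static string BodyInnerHtml(byte[] pageBytes)
    {
        using (var ms = new MemoryStream(pageBytes))
        {
            var doc = new HtmlDocument();
            doc.Load(ms, Encoding.UTF8);

            // SelectSingleNode returns null when the page has no <body>.
            HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
            return body == null ? string.Empty : body.InnerHtml;
        }
    }

    public static void Main()
    {
        // Stand-in for client.DownloadData(...): a tiny UTF-8 page built in memory.
        byte[] pageBytes = Encoding.UTF8.GetBytes(
            "<html><body><p>caf\u00E9</p></body></html>");

        // Check that the accented character survived parsing.
        Console.WriteLine(BodyInnerHtml(pageBytes).Contains("caf\u00E9"));
    }
}
```

The null check is an addition over the snippet above; it avoids a NullReferenceException on pages that have no body element.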
user6091703 answered Sep 26 '22 02:09