I'm fetching an HTML document by URL using WebClient.DownloadString(url), but then it's very hard to find the element content that I'm looking for. Whilst reading around I've spotted HtmlDocument and that it has neat things like GetElementById. How can I populate an HtmlDocument with the HTML returned by url?
An HTML file is a text document saved with the extension .html or .htm. It contains text plus tags written between "<" and ">", which give the instructions needed to lay out the web page. The set of tags is fixed and well defined.
Html Agility Pack (HAP) is a free and open-source HTML parser written in C# that reads and writes the DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files.
The HtmlDocument class is a wrapper around the native IHTMLDocument2 COM interface. You cannot easily create it from a string.
You should use the HTML Agility Pack.
Using Html Agility Pack as suggested by SLaks, this becomes very easy:
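using HtmlAgilityPack;

string html = webClient.DownloadString(url); // webClient is a System.Net.WebClient
var doc = new HtmlDocument();
doc.LoadHtml(html);
// HAP spells this method GetElementbyId (lowercase "b").
HtmlNode specificNode = doc.GetElementbyId("nodeId");
// SelectNodes returns null, not an empty collection, when nothing matches.
HtmlNodeCollection nodesMatchingXPath = doc.DocumentNode.SelectNodes("x/path/nodes");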
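For anyone who wants to try this without a network call, here is a minimal sketch of the same idea as a complete console program. It assumes the HtmlAgilityPack NuGet package is referenced; the markup, the id "nodeId" and the XPath "//a[@href]" are made up for illustration and stand in for whatever DownloadString returns for your page. Note that the HAP method is spelled GetElementbyId.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

class Program
{
    static void Main()
    {
        // Hypothetical markup standing in for the string returned by DownloadString.
        string html =
            "<html><body>" +
            "<div id=\"nodeId\">Hello</div>" +
            "<a href=\"/a\">A</a><a href=\"/b\">B</a>" +
            "</body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // GetElementbyId returns null if no element has that id.
        HtmlNode node = doc.GetElementbyId("nodeId");
        if (node != null)
        {
            Console.WriteLine(node.InnerText);
        }

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so check before iterating.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                // The second argument is the fallback used when the attribute is missing.
                Console.WriteLine(link.GetAttributeValue("href", string.Empty));
            }
        }
    }
}
```

The null checks are the part that usually bites people: both lookups hand back null rather than throwing or returning an empty result.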
To answer the original question:
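// Requires a reference to the Microsoft.mshtml interop assembly and "using mshtml;".
HTMLDocument doc = new HTMLDocument();
IHTMLDocument2 doc2 = (IHTMLDocument2)doc;
doc2.write(fileText); // fileText is the HTML string, e.g. from DownloadString
doc2.close();         // finish writing so the document can be queried
// now use doc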
Then to convert back to a string:
string result = doc.documentElement.outerHTML;