
Converting HtmlDocument.DomDocument to string

Tags:

dom

c#

How to convert HtmlDocument.DomDocument to string?

Hannoun Yassir asked Sep 03 '10 23:09



2 Answers

This example is a bit convoluted, but assuming you have a form called Form1 with a WebBrowser control called webBrowser1, the variable content will contain the document's markup once the page has loaded:

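private void Form1_Load(object sender, EventArgs e)
{
    // Start navigation; the DOM is only available once the page has loaded.
    webBrowser1.Url = new Uri(@"http://www.robertwray.co.uk/");
}

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var document = webBrowser1.Document;
    // DomDocument exposes the underlying COM object; cast it to an mshtml interface.
    var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)document.DomDocument;

    // innerHTML of the root element is the markup inside the <html> tag.
    var content = documentAsIHtmlDocument3.documentElement.innerHTML;
}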

The essential part, extracting the markup from HtmlDocument.DomDocument, happens in the webBrowser1_DocumentCompleted event handler.

Note: mshtml is obtained by adding a COM reference to 'Microsoft HTML Object Library' (aka mshtml.dll).
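If adding the COM reference is inconvenient, the same property chain can probably be reached through late binding instead. The following is only a sketch under assumptions: it requires C# 4 or later (for dynamic), DomMarkup and GetOuterHtml are hypothetical names that are not part of the answer above, and it reads outerHTML rather than innerHTML so that the <html> tag itself is included.

```csharp
using System;

public static class DomMarkup
{
    // Hypothetical helper: reads the markup via late binding (dynamic),
    // so no compile-time reference to mshtml is needed.
    public static string GetOuterHtml(object domDocument)
    {
        if (domDocument == null)
        {
            throw new ArgumentNullException("domDocument");
        }

        // Member lookup is resolved at run time against the actual object.
        dynamic dom = domDocument;
        return (string)dom.documentElement.outerHTML;
    }
}
```

Inside webBrowser1_DocumentCompleted this would be called as DomMarkup.GetOuterHtml(webBrowser1.Document.DomDocument). The trade-off is that a mistyped member name only fails at run time, whereas the mshtml cast is checked by the compiler.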

Rob answered Sep 20 '22 06:09



It would be easier to use the HtmlDocument itself, rather than its DomDocument property:

string html = htmlDoc.Body.InnerHtml;

Or even simpler, if you have access to the WebBrowser containing the document:

string html = webBrowser.DocumentText;
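
Note that Body.InnerHtml covers only the contents of the body element. If the whole document is wanted while staying with the managed HtmlDocument API, something along these lines might work. This is a sketch, not a definitive implementation: HtmlDocumentMarkup and GetFullMarkup are hypothetical names, and it assumes the document has already finished loading.

```csharp
using System.Windows.Forms;

public static class HtmlDocumentMarkup
{
    // Hypothetical helper: returns the markup of the root <html> element,
    // falling back to the body contents if no root element is found.
    public static string GetFullMarkup(HtmlDocument document)
    {
        if (document == null)
        {
            return null;
        }

        HtmlElementCollection roots = document.GetElementsByTagName("html");
        if (roots.Count > 0)
        {
            // OuterHtml includes the <html> tag itself.
            return roots[0].OuterHtml;
        }

        return document.Body != null ? document.Body.InnerHtml : null;
    }
}
```

A typical call would be string html = HtmlDocumentMarkup.GetFullMarkup(webBrowser.Document); from the DocumentCompleted handler.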
Thomas Levesque answered Sep 18 '22 06:09
