
Getting HtmlDocument from string without using browser control

Tags: browser, dom, c#

I download a web page's HTML (as a string) using a WebClient.

However, I want to turn it into an HtmlDocument object so I can use the DOM features that class offers. Currently, the only way I know to do this is with a WebBrowser control, as follows:

string pageHtml = client.DownloadString(url);

browser.ScriptErrorsSuppressed = true;
browser.DocumentText = pageHtml;

do
{
    Application.DoEvents();
} while (browser.ReadyState != WebBrowserReadyState.Complete);

return browser.Document;

Is there another way of doing it? I know there are other browser controls available, but is there a simpler way?

Aabela asked May 05 '26 22:05



2 Answers

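You can use HtmlAgilityPack, which parses an HTML string into its own document type (HtmlAgilityPack.HtmlDocument, not System.Windows.Forms.HtmlDocument) without any browser control. For example: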

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

var results = doc.DocumentNode
    .Descendants("div")
    .Select(n => n.InnerText);
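
As a further illustration of the same approach, here is a minimal sketch that pulls every link target out of an HTML string with an XPath query. It assumes the HtmlAgilityPack NuGet package is referenced; the class name LinkExtractor and the method name ExtractLinks are hypothetical and chosen only for this example. Note that SelectNodes returns null rather than an empty collection when nothing matches, so the result needs a null check.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package is installed

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class LinkExtractor
{
    // Returns the href value of every <a> element that has an href attribute.
    public static List<string> ExtractLinks(string html)
    {
        // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();

        // SelectNodes yields null (not an empty collection) when no node matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return links;
        }

        foreach (HtmlNode anchor in anchors)
        {
            links.Add(anchor.GetAttributeValue("href", string.Empty));
        }

        return links;
    }
}
```

With the question's code, this could be called as LinkExtractor.ExtractLinks(client.DownloadString(url)), with no WebBrowser control and no message loop involved.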
L.B answered May 08 '26 11:05



I know this is an old post, but my reply is for others who land here as I did.

If you want to do it in plain .NET code, creating the WebBrowser in code rather than placing one on a form, here is what you have to do:

// WebBrowser wraps an ActiveX control, so this must be called on an STA thread.
public System.Windows.Forms.HtmlDocument GetHtmlDocument(string html)
{
    WebBrowser browser = new WebBrowser();
    browser.ScriptErrorsSuppressed = true;
    browser.DocumentText = html;
    browser.Document.OpenNew(true);
    browser.Document.Write(html);
    browser.Refresh();
    return browser.Document;
}
Nikhil Gaur answered May 08 '26 11:05



