
How to get rendered html (processed by Javascript) in WebBrowser control?

I have an ASP.NET page and a custom class that fetches a specified web page and returns that page's body.

protected String GetHtml()
{
    Thread thread = new Thread(new ThreadStart(GetHtmlWorker));
    thread.SetApartmentState(ApartmentState.STA);
    thread.Start();
    thread.Join();
    return docHtml;
}

protected void GetHtmlWorker()
{
    using (WebBrowser browser = new WebBrowser())
    {
        browser.ScriptErrorsSuppressed = true;
        browser.Navigate(_url);
        // Wait for control to load page
        while (browser.ReadyState != WebBrowserReadyState.Complete)
            Application.DoEvents();
        docHtml = browser.DocumentText;
    }
}

But what I need is the DOM HTML rather than the page source, because I perform some extra operations on the DOM with jQuery.

Denis Olifer asked Sep 07 '11 12:09


2 Answers

Here is one solution I found for getting the rendered HTML (DOM) after the JavaScript has run:

Place a WebBrowser control named webBrowser1 on the Form of class Form1.

[Form1.cs[Design]]

Then for code use:

[Form1.cs]

using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

namespace WebBrowserTest
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
            this.webBrowser1.ObjectForScripting = new MyScript();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            webBrowser1.Navigate("http://localhost:6489/Default.aspx");
        }

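        private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            // Once the page has loaded, ask its script engine to call back into MyScript via window.external.
            webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
        }

        [ComVisible(true)]
        public class MyScript
        {
            public void CallServerSideCode()
            {
                // doc is the live DOM of the loaded page, including changes made by its scripts.
                var doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
            }
        }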
    }
}

In Form1_Load, change the URL passed to webBrowser1.Navigate("http://localhost:6489/Default.aspx") to the page whose JavaScript-processed DOM you want to obtain.

You can access the modified DOM in the CallServerSideCode() method, for example:

doc.GetElementById("myDataTable");

Or you can access the rendered HTML like this:

var renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
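
For the scenario in the question (no form, a worker thread started from an ASP.NET page), the same OuterHtml expression could probably replace browser.DocumentText. The following is only a sketch under that assumption: RenderedHtmlFetcher, GetRenderedHtml and the timeout parameter are hypothetical names that do not appear in the original code, and scripts that change the DOM asynchronously after the page has loaded (for example after an AJAX call) may not have finished when the HTML is read.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper, not part of the original post.
public static class RenderedHtmlFetcher
{
    // Loads the URL in a WebBrowser on an STA thread and returns the
    // serialized DOM of the <html> element, or null if nothing was loaded.
    public static string GetRenderedHtml(string url, TimeSpan timeout)
    {
        string html = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true;
                browser.Navigate(url);

                // Pump messages until the page is loaded or the timeout expires.
                DateTime deadline = DateTime.UtcNow + timeout;
                while (browser.ReadyState != WebBrowserReadyState.Complete
                       && DateTime.UtcNow < deadline)
                {
                    Application.DoEvents();
                    Thread.Sleep(10);
                }

                // Read the live DOM instead of DocumentText (the downloaded source).
                if (browser.Document != null)
                {
                    HtmlElementCollection roots = browser.Document.GetElementsByTagName("HTML");
                    if (roots.Count > 0)
                    {
                        html = roots[0].OuterHtml;
                    }
                }
            }
        });

        // The WebBrowser control requires a single-threaded apartment.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return html;
    }
}
```

A call such as RenderedHtmlFetcher.GetRenderedHtml(_url, TimeSpan.FromSeconds(30)) would then take the place of GetHtml() in the question; the 30-second value is arbitrary.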
Răzvan Flavius Panda answered Oct 05 '22 13:10


As George said in one of the comments, in theory you can get the DOM directly in webBrowser1_DocumentCompleted using:

webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
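
One thing to watch for with this approach is that DocumentCompleted is also raised for each frame on the page, so the handler may run before the top-level document is done. The sketch below guards against that by comparing e.Url with the browser's own Url. RenderedHtmlForm and IsTopLevelDocument are hypothetical names used only for illustration, and the URL is the placeholder from the answer above.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form, not part of the original answers.
public class RenderedHtmlForm : Form
{
    private readonly WebBrowser browser = new WebBrowser
    {
        Dock = DockStyle.Fill,
        ScriptErrorsSuppressed = true
    };

    public RenderedHtmlForm(string url)
    {
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted;
        Load += (sender, e) => browser.Navigate(url);
    }

    // True only when the completed document is the top-level page, not a frame.
    public static bool IsTopLevelDocument(Uri completedUrl, Uri browserUrl)
    {
        return completedUrl != null && completedUrl.Equals(browserUrl);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (!IsTopLevelDocument(e.Url, browser.Url))
        {
            return; // a frame finished loading; wait for the main document
        }

        HtmlElementCollection roots = browser.Document.GetElementsByTagName("HTML");
        if (roots.Count == 0)
        {
            return;
        }

        // OuterHtml serializes the current DOM, not the originally downloaded source.
        string renderedHtml = roots[0].OuterHtml;
        Console.WriteLine(renderedHtml);
    }

    [STAThread]
    public static void Main()
    {
        Application.Run(new RenderedHtmlForm("http://localhost:6489/Default.aspx"));
    }
}
```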
2 revs answered Oct 05 '22 13:10