I have an ASP.NET page and a custom class that fetches a specified web page and returns that page's body:
    protected String GetHtml()
    {
        Thread thread = new Thread(new ThreadStart(GetHtmlWorker));
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
        return docHtml;
    }

    protected void GetHtmlWorker()
    {
        using (WebBrowser browser = new WebBrowser())
        {
            browser.ScriptErrorsSuppressed = true;
            browser.Navigate(_url);
            // Wait for control to load page
            while (browser.ReadyState != WebBrowserReadyState.Complete)
                Application.DoEvents();
            docHtml = browser.DocumentText;
        }
    }
But what I need is the DOM HTML rather than the page source, because I perform some extra operations on the DOM with jQuery.
Here is one solution I found for getting the rendered HTML (the DOM) after JavaScript has run:
Place a WebBrowser control named webBrowser1 on the Form of class Form1.
[Form1.cs[Design]]
Then for code use:
[Form1.cs]
    using System;
    using System.Runtime.InteropServices;
    using System.Windows.Forms;

    namespace WebBrowserTest
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
                // Expose MyScript to page scripts as window.external
                this.webBrowser1.ObjectForScripting = new MyScript();
            }

            private void Form1_Load(object sender, EventArgs e)
            {
                webBrowser1.Navigate("http://localhost:6489/Default.aspx");
            }

            private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
            {
                // Once the page has loaded, have the page's script engine call back into C#
                webBrowser1.Navigate("javascript: window.external.CallServerSideCode();");
            }

            [ComVisible(true)]
            public class MyScript
            {
                public void CallServerSideCode()
                {
                    // Document here reflects the DOM as modified by the page's scripts
                    var doc = ((Form1)Application.OpenForms[0]).webBrowser1.Document;
                }
            }
        }
    }
In Form1_Load, change the URL passed to webBrowser1.Navigate("http://localhost:6489/Default.aspx") to the page whose DOM you want to obtain after JavaScript has processed it.
You can access the modified DOM in the CallServerSideCode() method, for example:

    doc.GetElementById("myDataTable");

Or you can access the rendered HTML like this:

    var renderedHtml = doc.GetElementsByTagName("HTML")[0].OuterHtml;
As George said in one of the comments, in theory you can get the DOM directly in webBrowser1_DocumentCompleted by using:
webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
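
For the ASP.NET scenario in the question, the same idea can probably be folded back into the STA worker thread. The following is only a minimal sketch, not a tested implementation: the class name RenderedHtmlFetcher and the method GetRenderedHtml are hypothetical, the project is assumed to reference System.Windows.Forms, and it assumes the page's scripts have finished by the time ReadyState reaches Complete (scripts that run later, for example after an AJAX call or a timer, would not be reflected).

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper; the name is an assumption, not part of the original code.
public static class RenderedHtmlFetcher
{
    public static string GetRenderedHtml(string url)
    {
        string html = null;

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true;
                browser.Navigate(url);

                // Pump messages until the control reports the page as loaded
                while (browser.ReadyState != WebBrowserReadyState.Complete)
                    Application.DoEvents();

                // Read the live DOM instead of DocumentText (the original source)
                HtmlElementCollection roots = browser.Document.GetElementsByTagName("HTML");
                html = roots.Count > 0 ? roots[0].OuterHtml : browser.DocumentText;
            }
        });

        // The WebBrowser control requires a single-threaded apartment
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        return html;
    }
}
```

If the page modifies the DOM after load, the window.external callback shown above is likely the safer route, since the page itself can decide when to call back.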