
Get the final generated HTML source using C# or VB.NET

Tags: c#, vb.net

Using VB.NET or C#, how do I get the generated HTML source?

To get the HTML source of a page I can use the code below, but it won't return the generated source: it won't contain any of the HTML that was added dynamically by JavaScript in the browser. How do I get the final generated HTML source?

Thanks

WebRequest req = WebRequest.Create("http://www.asp.net"); 
WebResponse res = req.GetResponse(); 
StreamReader sr = new StreamReader(res.GetResponseStream()); 
string html = sr.ReadToEnd();

If I try the code below, it returns the document without the content injected by the JavaScript code:

Public Class Form1

    Dim WB As WebBrowser = Nothing

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load

        WB = New WebBrowser()
        Me.Controls.Add(WB)
        AddHandler WB.DocumentCompleted, AddressOf WebBrowser1_DocumentCompleted


        WB.Navigate("mysite/Default.aspx")

    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)


        'Dim htmlcode As String = WebBrowser1.Document.Body.OuterHtml()
        Dim s As String = WB.DocumentText

    End Sub
End Class

HTML returned

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>

</head>
<body>
    <form id="form1" runat="server">
    <div id="center_text_panel">
    //test text  this text should be here
    </div>
    </form>
</body>
</html>

    <script type="text/javascript">

        document.getElementById("center_text_panel").innerText = "test text";


    </script>
Hello-World, asked Feb 13 '13 06:02


2 Answers

You can use WebKit.NET

See the project's official tutorials.

It can not only grab the source but also run the page's JavaScript as the page loads.

webKitBrowser1.Navigate(MyURL)

Then handle the DocumentCompleted event and read the document text:

Dim documentContent As String = webKitBrowser1.DocumentText

Edit - This might be the better open source WebKit option: http://code.google.com/p/open-webkit-sharp/

Brian Webster, answered Nov 06 '22 00:11


Just put a WebBrowser control on your form and use the following code:

webBrowser1.Navigate("YourLink");

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Or read each field or element individually, or use webBrowser1.DocumentText
    string htmlcode = webBrowser1.Document.Body.InnerHtml;
}

Edited

To also get the HTML that is generated dynamically by JavaScript, you have two options:

  1. Run the following code after the webBrowser1_DocumentCompleted event:
     StringBuilder htmlcode = new StringBuilder();
     // Note: Document.All also includes nested elements, so the markup of child elements is appended more than once
     foreach (HtmlElement item in webBrowser1.Document.All)
     {
         htmlcode.Append(item.InnerHtml);
     }
  2. Write a JavaScript function that returns document.documentElement.innerHTML and call it with the InvokeScript function to get the result:
     // "javascriptcode" is the name of that JavaScript function in the page
     var htmlcode = webBrowser1.Document.InvokeScript("javascriptcode");
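
Both options can be sketched in one small WinForms program. This is only a rough sketch under a few assumptions: the class name GeneratedSourceForm and the handler name OnDocumentCompleted are made up for illustration, the URL is the one from the question, and option 2 passes the expression to the page's built-in eval function instead of a custom script function, which may not work on every page or engine version. Scripts that modify the page asynchronously after loading (for example via AJAX) may not have finished when DocumentCompleted fires, so a delay or retry may still be needed.

```csharp
using System;
using System.Diagnostics;
using System.Windows.Forms;

// Hypothetical form used only for this illustration.
public class GeneratedSourceForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser
    {
        Dock = DockStyle.Fill,
        ScriptErrorsSuppressed = true
    };

    public GeneratedSourceForm()
    {
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += OnDocumentCompleted;
        webBrowser1.Navigate("http://www.asp.net"); // URL taken from the question
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted fires once per frame; only act on the top-level document.
        if (e.Url != webBrowser1.Url)
        {
            return;
        }

        // Option 1: serialize the live DOM. Unlike DocumentText, which holds the
        // markup as it was originally loaded, this reflects changes made by script.
        HtmlElementCollection roots = webBrowser1.Document.GetElementsByTagName("html");
        string fromDom = roots.Count > 0 ? roots[0].OuterHtml : string.Empty;

        // Option 2: let the page's own script engine return the markup.
        // Assumption: the page exposes the built-in eval function.
        object result = webBrowser1.Document.InvokeScript(
            "eval", new object[] { "document.documentElement.outerHTML" });
        string fromScript = result as string ?? string.Empty;

        Debug.WriteLine("DOM length: " + fromDom.Length);
        Debug.WriteLine("Script length: " + fromScript.Length);
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new GeneratedSourceForm());
    }
}
```

Reading OuterHtml from the html element avoids the duplication you get when appending InnerHtml for every entry in Document.All, because it serializes the tree once from the root.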
KF2, answered Nov 06 '22 01:11