
Get WPF WebBrowser HTML

I'm using the WPF WebBrowser to access a certain page, and I need to get its HTML content. I can't use WebClient, WebRequest, etc. because I need to execute JS on that page. I also tried Awesomium and the WinForms WebBrowser (neither worked).

    dynamic doc = browser.Document;
    var text = doc.InnerHtml; // or something like this

The code above doesn't work for me; it throws a NullReferenceException. Can anybody tell me how to fetch the HTML? I've been searching for weeks and haven't found anything that really works :/ . Please answer as if for the biggest dumbass you can imagine :D. It sometimes happens that people send me a piece of code and I have no idea how to use it... I mean, please make your posts end with something like

     string HTML=some_stuff;

Or, if you know of an alternative browser that isn't buggy and lets me access the HTML, or something that would let me execute JS on the loaded HTML with its effects (such as cookies and changes to the HTML source), that's also a really good answer. I'd appreciate any help.

czubehead asked Aug 28 '14 20:08


3 Answers

Yeeeaaaah! I did it. It's so simple:

    string HTML = (browser.Document as mshtml.IHTMLDocument2).body.outerHTML;
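
Note that body.outerHTML covers only the body element. If the markup of the whole document (including the head) is wanted, a variant along these lines may work. It is only a sketch: the class and method names are hypothetical, and it assumes the same Microsoft.mshtml reference and a page that has finished loading.

```csharp
using System.Windows.Controls;
using mshtml; // requires a reference to Microsoft.mshtml

public static class WebBrowserHtml
{
    // Hypothetical helper: returns the markup of the root <html> element,
    // or null if the control holds no HTML document yet.
    public static string GetFullHtml(WebBrowser browser)
    {
        // Document is typed as object; the cast yields null for non-HTML content.
        var doc = browser.Document as IHTMLDocument3;
        return doc?.documentElement?.outerHTML;
    }
}
```

It would then be used like this:

    string HTML = WebBrowserHtml.GetFullHtml(browser);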

czubehead answered Oct 07 '22 07:10


I made something like this once. It was horrible, but it works.

You need to add a reference to Microsoft.mshtml.

Then you can use IHTMLDocument2. Why 2? Good question... anyway, I wrote a couple of helper functions like this:

    // Requires: using System.Linq; using mshtml;

    public static void FillField(object doc, string id, string value)
    {
        // Silently does nothing if the element is not found.
        var element = findElementByID(doc, id);
        element?.setAttribute("value", value);
    }

    public static void ClickButton(object doc, string id)
    {
        var element = findElementByID(doc, id);
        element?.click();
    }

    private static IHTMLElement findElementByID(object doc, string id)
    {
        // Document is typed as object; bail out if it is not an HTML document.
        var thisDoc = doc as IHTMLDocument2;
        if (thisDoc == null)
            return null;

        // FirstOrDefault returns null instead of throwing when nothing matches.
        return thisDoc.all.OfType<IHTMLElement>()
            .FirstOrDefault(e => e != null && e.id == id);
    }

Executing JS

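    // Public so it can be called from outside the helper class.
    public static void ExecuteScript(object doc, string js)
    {
        // Do nothing unless doc is an HTML document.
        var thisDoc = doc as IHTMLDocument2;
        if (thisDoc == null)
            return;

        // Runs the script in the context of the page's window.
        thisDoc.parentWindow.execScript(js);
    }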

I call them like this...

    HtmlDocumentHelper.FillField(webBrowser.Document, <id>, <value>);
    HtmlDocumentHelper.FillField(webBrowser.Document, <id>, <value>);
    HtmlDocumentHelper.ClickButton(webBrowser.Document, <id>);
    HtmlDocumentHelper.ExecuteScript(webBrowser.Document, "alert(1);");
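
If a return value from the script is needed, the WPF WebBrowser's own InvokeScript method is another option that does not require the mshtml reference. The following is only a sketch: the class and method names are hypothetical, and calling the page's global eval function this way is an assumption that may fail (with a COMException) if the page has not finished loading or eval is not available in the document.

```csharp
using System.Windows.Controls;

public static class WebBrowserScript
{
    // Hypothetical helper: evaluates a JavaScript expression in the loaded
    // page and returns the result as a string (null if there is no result).
    public static string Evaluate(WebBrowser browser, string expression)
    {
        // InvokeScript calls a script function by name; "eval" is the
        // global JavaScript function, so the expression runs in the page.
        object result = browser.InvokeScript("eval", expression);
        return result?.ToString();
    }
}
```

For example, the current markup could be read with:

    string HTML = WebBrowserScript.Evaluate(webBrowser, "document.documentElement.outerHTML");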

Gray answered Oct 07 '22 07:10


When I tried @Gray's or @czubehead's code, body was always null. The following code, however, worked for me:

    dynamic webBrowserDocument = webBrowser.Document;
    string html = webBrowserDocument?.documentElement?.InnerHtml;

Make sure this code runs in the LoadCompleted event handler or later. In the Navigated handler the source is incomplete or even null.
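
A minimal sketch of that timing, assuming a WPF window whose XAML declares a WebBrowser named browser (the window class, the control name and the URL are hypothetical placeholders):

```csharp
using System;
using System.Windows;
using System.Windows.Navigation;

public partial class MainWindow : Window
{
    public MainWindow()
    {
        InitializeComponent();

        // Subscribe before navigating so the event is not missed.
        browser.LoadCompleted += Browser_LoadCompleted;
        browser.Navigate(new Uri("https://example.com/")); // placeholder URL
    }

    private void Browser_LoadCompleted(object sender, NavigationEventArgs e)
    {
        // Document is a COM object; dynamic resolves its members at run time.
        dynamic doc = browser.Document;

        // Null-conditional access guards against a missing document or root element.
        string HTML = doc?.documentElement?.outerHTML;
    }
}
```

If the page keeps changing its DOM from script after loading, the HTML read here reflects only the state at that moment, so it may need to be read again later.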

Norman answered Oct 07 '22 07:10