How do I get HTML created by JavaScript using HtmlUnit in Java and then parse it with Jsoup?

I am trying to access some content on a web page that is created by JavaScript. However, the content I want is generated by the JavaScript after the page has loaded, so this chunk of HTML is nowhere to be found when I try to parse the page with Jsoup.

My code for getting the HTML source using HtmlUnit is as follows:

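import static java.lang.System.out;

import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Scraper { // class name is arbitrary

    public static void main(String[] args) throws IOException {
        // Silence HtmlUnit's verbose logging
        Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);

        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        // Keep going when the page's scripts throw or the server returns an error status
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);

        String url = "https://myUrl.com"; // getPage() needs an absolute URL, including the scheme
        out.println("accessing " + url);

        HtmlPage page = webClient.getPage(url);

        out.println("waiting for js");
        webClient.waitForBackgroundJavaScriptStartingBefore(200);
        webClient.waitForBackgroundJavaScript(20000);

        // Serialize the page's current DOM
        out.println(page.asXml());

        webClient.close();
    }
}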

But when I run it, the HTML that is supposed to be created is not printed. How do I get this HTML, created by the JavaScript, using HtmlUnit, and then pass the result to Jsoup for parsing?
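HtmlUnit runs background JavaScript on its own threads, so content added later by timers or AJAX calls may still be missing when `waitForBackgroundJavaScript` returns (the method returns the number of background jobs still pending rather than guaranteeing they finished). One possible workaround is to poll the page until the expected element shows up. The helper below is a minimal, JDK-only sketch of such a polling loop; the class name `PollUntil` and the method `waitFor` are hypothetical and not part of HtmlUnit.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of HtmlUnit): re-checks a condition until it
// yields a non-null value or the timeout expires.
final class PollUntil {

    private PollUntil() {}

    /** Returns the first non-null result of {@code probe}, or null on timeout or interrupt. */
    static <T> T waitFor(Supplier<T> probe, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T value = probe.get();
            if (value != null) {
                return value; // condition met, e.g. the element now exists
            }
            if (System.nanoTime() - deadline >= 0) {
                return null; // gave up: timeout elapsed
            }
            try {
                Thread.sleep(intervalMillis); // pause so other threads can make progress
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                return null;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated DOM lookup that only succeeds on the third check,
        // standing in for content that JavaScript adds after page load.
        int[] checks = {0};
        String found = waitFor(() -> ++checks[0] >= 3 ? "content" : null, 1_000, 10);
        System.out.println(found + " after " + checks[0] + " checks"); // prints "content after 3 checks"
    }
}
```

In the question's code, assuming the generated content carries a hypothetical id `content`, the call would look like `PollUntil.waitFor(() -> page.getElementById("content"), 20_000, 500)` (`HtmlPage.getElementById` returns null while the element is absent). Once it returns non-null, the serialized DOM can be handed to Jsoup with `Jsoup.parse(page.asXml(), page.getUrl().toString())`.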

THow asked Nov 09 '22 19:11


1 Answer

Jsoup is a Java HTML parser, so it runs on the server (or any JVM), not in the browser.
I am not sure what your final goal is. I assume you want to use the result on the same page, so I will go with Ajax:

  • On document ready, capture the document's DOM
  • Send it to the server for processing
  • Display the results on the same page

Something like:


$(document).ready(function () {
    // Serialize the current DOM (inner HTML of the <html> element) at document-ready time
    var allClientSideHtml = $("html").html();

    var dataToSend = JSON.stringify({ 'htmlSendToSever': allClientSideHtml });
    $.ajax({
        url: "your_Jsoup_server_url.jsp_or_php/YourJsoupParser",
        type: "POST",
        contentType: "application/json; charset=utf-8",
        dataType: "json",
        data: dataToSend, // pass that text to the server as a JSON string
        success: function (msg) { alert(msg.d); },
        error: function (type) { alert("ERROR!!" + type.responseText); }
    });
});
JavaSheriff answered Nov 14 '22 23:11