
Can I configure HtmlUnit to run only specific JavaScript processes and not the whole thing?

Tags:

java

htmlunit

I want to gather information from a set of web pages that are all formatted very similarly. Some of the information I need is loaded onto the page by JavaScript after it opens. HtmlUnit seems to be a common tool for this, so that's what I'm using. Unfortunately it's very slow, a complaint I've seen on a lot of forums. The webClient.getPage() call is what takes forever. With JavaScript turned off it runs quickly, but I need some of the JavaScript to execute. Is there a way to selectively execute a few JavaScript commands instead of all of them?

Alternatively, is there a program that is much faster than HtmlUnit for processing JavaScript?

Sam Bobel asked May 05 '14 21:05


1 Answer

Sort of. You can programmatically decide which external JavaScript URLs to load.

If JavaScript is enabled, HtmlUnit runs all JS embedded in the page. External scripts that you don't need, however, can be skipped by not loading them.

Here's some code to get you started:

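    // Needs: java.io.IOException, com.gargoylesoftware.htmlunit.WebRequest,
    // com.gargoylesoftware.htmlunit.WebResponse,
    // com.gargoylesoftware.htmlunit.util.FalsifyingWebConnection
    webClient.setWebConnection(new FalsifyingWebConnection(webClient) {
        @Override
        public WebResponse getResponse(WebRequest request) throws IOException {
            // Placeholder: replace with the path of a script you don't need.
            if (request.getUrl().getPath().equalsIgnoreCase("some url i don't need")) {
                // Return an empty script instead of fetching the real one.
                return createWebResponse(request, "", "application/javascript");
            }

            // Everything else is fetched normally.
            return super.getResponse(request);
        }
    });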
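
If more than one script needs to be skipped, the check inside getResponse can be pulled out into a small helper. The following is only a sketch under stated assumptions: ScriptBlocklist is a hypothetical class (not part of HtmlUnit), and the host and path shown are made up for illustration. It uses only the JDK, so it can be tried without HtmlUnit on the classpath.

```java
import java.net.URI;
import java.net.URL;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: decides which script URLs to replace with an empty response. */
class ScriptBlocklist {
    private final Set<String> blockedHosts; // host names, lower-case
    private final Set<String> blockedPaths; // URL paths, lower-case

    ScriptBlocklist(Set<String> blockedHosts, Set<String> blockedPaths) {
        this.blockedHosts = blockedHosts;
        this.blockedPaths = blockedPaths;
    }

    /** Returns true if the URL's host or path is on the blocklist (case-insensitive). */
    boolean shouldSkip(URL url) {
        String host = url.getHost().toLowerCase(Locale.ROOT);
        String path = url.getPath().toLowerCase(Locale.ROOT); // query string is not part of the path
        return blockedHosts.contains(host) || blockedPaths.contains(path);
    }

    public static void main(String[] args) throws Exception {
        // Made-up entries; replace with the scripts your pages don't need.
        ScriptBlocklist blocklist = new ScriptBlocklist(
                Set.of("ads.example.com"),
                Set.of("/js/analytics.js"));

        // Blocked by path, regardless of case: prints true
        System.out.println(blocklist.shouldSkip(URI.create("http://example.com/JS/Analytics.js").toURL()));
        // Blocked by host: prints true
        System.out.println(blocklist.shouldSkip(URI.create("http://ads.example.com/banner.js").toURL()));
        // Not on the blocklist: prints false
        System.out.println(blocklist.shouldSkip(URI.create("http://example.com/js/app.js").toURL()));
    }
}
```

With a helper like this, the condition in getResponse would become blocklist.shouldSkip(request.getUrl()), and the rest of the override stays the same.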

The settings below might speed things up too:

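    // Needs: java.util.logging.Level, com.gargoylesoftware.htmlunit.IncorrectnessListener,
    // com.gargoylesoftware.htmlunit.SilentCssErrorHandler

    // Silence HtmlUnit's logging.
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);

    // Ignore CSS errors and "incorrectness" warnings.
    webClient.setCssErrorHandler(new SilentCssErrorHandler());

    webClient.setIncorrectnessListener(new IncorrectnessListener() {
        @Override
        public void notify(String message, Object origin) { }
    });

    // Skip cookies and CSS, and don't throw or print on failed requests or script errors.
    webClient.getCookieManager().setCookiesEnabled(false);
    webClient.getOptions().setCssEnabled(false);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setPrintContentOnFailingStatusCode(false);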

Neil McGuigan answered Nov 14 '22 15:11