
HtmlUnit slower than GUI browser?

Why is HtmlUnit so much slower than GUI browsers? For instance, HtmlUnit loads this page, http://oltexpress.airkiosk.com/cgi-bin/airkiosk/I7/181002i?O2=2, in 14 seconds (with CSS support turned off), while Firefox does it in 5 seconds (after clearing the cache, with CSS support on). I know that modern browsers are more forgiving of bad JavaScript than HtmlUnit is, but the time difference is still intolerable.

Any ideas on how to speed up HtmlUnit? Has anyone experimented with the HtmlUnit cache?

biera asked May 21 '12 19:05



1 Answer

To answer your question on why it is slow:

HtmlUnit has several things going against it:

  • It runs in Java, which lacks many of the native optimisations that browsers such as Firefox have.
  • It works on a well-formed document tree rather than on lenient (non-strict) HTML, so the HTML has to be cleaned up and converted first.
  • It then has to run the JavaScript through a parser, fix any problems with the code, and execute it inside Java itself.
  • Also, as @Arya pointed out, it requests resources one at a time, so a page with many JavaScript files or many images slows down noticeably.

To answer your question on how to speed it up:

As a general rule, I disable the following unless they are needed:

  • JavaScript
  • Images
  • CSS
  • Applets

I also downloaded the source code, removed the ActiveX support, and recompiled. If you want to stop HtmlUnit from downloading extra resources, you can use the code below to hand back an empty response instead of fetching it from the web.

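import java.io.IOException;
import com.gargoylesoftware.htmlunit.StringWebResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.util.WebConnectionWrapper;

final WebClient browser = new WebClient();
browser.setWebConnection(new WebConnectionWrapper(browser) {
    @Override
    public WebResponse getResponse(final WebRequest request) throws IOException {
        // Example test only; replace it with whatever decides that a resource is needed.
        final boolean needed = !request.getUrl().getPath().endsWith(".png");
        if (needed) {
            return super.getResponse(request); // Let the wrapped connection do the real download.
        } else {
            /* Give the program a response, but leave it empty. */
            return new StringWebResponse("", request.getUrl());
        }
    }
});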
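
The test inside getResponse can be anything. As a rough sketch, it could be factored into a small helper that skips resources by file extension. The class name ResourceFilter, the method shouldFetch, and the list of skipped extensions are all made up for illustration and are not part of HtmlUnit:

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not part of HtmlUnit): decides whether a URL path is worth downloading. */
final class ResourceFilter {
    // Assumed list of extensions to skip; adjust it for the site being loaded.
    private static final Set<String> SKIPPED =
            Set.of(".png", ".jpg", ".jpeg", ".gif", ".ico", ".css");

    private ResourceFilter() {
    }

    /** Returns false when the last path segment ends in one of the skipped extensions. */
    static boolean shouldFetch(String path) {
        // Only look at the file name, so dots in directory names are ignored.
        String name = path.substring(path.lastIndexOf('/') + 1).toLowerCase(Locale.ROOT);
        int dot = name.lastIndexOf('.');
        return dot < 0 || !SKIPPED.contains(name.substring(dot));
    }
}
```

Inside getResponse the condition would then read ResourceFilter.shouldFetch(request.getUrl().getPath()). Matching on the extension is only a heuristic; a resource served from a URL without an extension would still be downloaded.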

Other things I have noticed:

  • HtmlUnit's WebClient is not thread safe, so you should probably create a new WebClient for each thread.
  • HtmlUnit does have its own cache, reachable through WebClient.getCache().
Opal answered Sep 21 '22 22:09
