HtmlUnit + Selenium within Production

I am currently using HtmlUnit, driven through Selenium WebDriver, within my production code.

I am scraping and interacting with various websites programmatically using these libraries, and I am having some success without experiencing memory issues (I make sure sessions are always cleaned up).
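As a sketch of the "always clean up sessions" rule, the pattern below uses try-with-resources so a session is closed even when scraping throws. `BrowserSession` and `Scraper` are hypothetical stand-ins; in real code `close()` would call Selenium's `WebDriver.quit()` or HtmlUnit's `WebClient.close()`, and the open-session counter exists only to make leaks visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for one browser session (a WebDriver or an HtmlUnit WebClient).
class BrowserSession implements AutoCloseable {
    // Number of sessions currently open; should return to 0 if nothing leaks.
    static final AtomicInteger OPEN = new AtomicInteger();

    BrowserSession() {
        OPEN.incrementAndGet();
    }

    // Placeholder for navigating to a page and reading something from it.
    String fetchTitle(String url) {
        if (url.contains("broken")) {
            throw new IllegalStateException("page failed: " + url);
        }
        return "Title of " + url;
    }

    // Real code would call driver.quit() or webClient.close() here.
    @Override
    public void close() {
        OPEN.decrementAndGet();
    }
}

class Scraper {
    // try-with-resources guarantees close() runs on both the normal and the exception path.
    static String scrape(String url) {
        try (BrowserSession session = new BrowserSession()) {
            return session.fetchTitle(url);
        }
    }
}
```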

I am wondering whether these libraries are suitable for a production environment or whether they are recommended against. This is difficult to research via Google because of the enormous amount of information about automated testing, rather than about how I am using them.

I realise this is a fairly generic question, but I am seeking advice on these libraries and potentially better alternatives.

Steven asked Jan 30 '12 05:01



3 Answers

WebDriver and Selenium are perfectly suited to a production environment. I have used them quite extensively for two years now on a multi-machine, multi-datacenter distributed grid and have had absolutely no performance or stability problems that we couldn't cope with.

Our preferred driver is the Firefox one (heavier than HtmlUnit, and harder to configure), and we had to tweak the grid to find out how many instances we could run. Our maximum for stability was one instance per core.
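One way to enforce a "one instance per core" limit is to run browser jobs on a fixed-size thread pool sized from `Runtime.availableProcessors()`, so at most that many jobs (and therefore browser instances) are live at once. This is a minimal sketch: `PerCorePool` is hypothetical, the `Thread.sleep` stands in for driving a real browser, and the per-core ratio is this answer's empirical figure, not a general rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class PerCorePool {
    // Stability limit from the answer: one browser instance per core.
    static final int MAX_INSTANCES = Runtime.getRuntime().availableProcessors();

    // Runs `jobs` placeholder browser jobs and returns the peak number running at once.
    static int runJobs(int jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_INSTANCES);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < jobs; i++) {
                futures.add(pool.submit(() -> {
                    int now = active.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // placeholder for driving one browser instance
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        active.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // surfaces any exception thrown inside a job
            }
            return peak.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a browser job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With Selenium Grid the same cap would instead be configured per node, but the idea of bounding live instances is the same.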

Our Selenium/WebDriver instances have run 24/7 for two years now (one year with Selenium 1, and the second year migrating incrementally to Selenium 2/WebDriver). With appropriate monitoring (you should monitor memory and CPU usage) and a good amount of load testing, we reached the point where we went several months without restarting a process.
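Inside the JVM, a minimal way to sample memory and CPU pressure is the standard `java.lang.management` API, sketched below. `ResourceMonitor` and the 90% heap threshold are assumed example choices; the load average is system-wide rather than per-process and is negative on platforms that don't provide it, so a real deployment would likely feed these values into whatever monitoring system it already uses.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class ResourceMonitor {
    // Assumed alert threshold: fraction of the maximum heap in use.
    static final double HEAP_ALERT_FRACTION = 0.9;

    // Current heap usage in megabytes.
    static long heapUsedMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / (1024 * 1024);
    }

    // System load average over the last minute, or a negative value if unavailable.
    static double systemLoadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }

    // True when used heap exceeds the given fraction of the max heap (max is -1 if undefined).
    static boolean heapAbove(double fraction) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max > 0 && heap.getUsed() > fraction * max;
    }

    // One log line per sample; a scheduler would call this periodically.
    static String sample() {
        return String.format("heapUsedMb=%d load=%.2f heapAlert=%b",
                heapUsedMb(), systemLoadAverage(), heapAbove(HEAP_ALERT_FRACTION));
    }
}
```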

We've used HtmlUnit extensively too, and are equally satisfied with this library.

The essential point of my post is: YES, these libraries are production-ready. But, as with all production software, you'll have to benchmark your use of them to find the right configuration for optimal stability. I recommend using Selenium Grid in production; it is a great way to parallelize processes.

Grooveek answered Nov 09 '22 05:11



I'm using HtmlUnit for something similar in production and have had quite a few issues, mostly performance-related. I have since switched to a snapshot version of HtmlUnit 2.10, which includes some performance improvements that are important for me (e.g. replacing ArrayList.contains() with HashSet.contains() in DomNode.addDomChangeListener()).
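The kind of change mentioned above can be illustrated in isolation: checking for a duplicate listener with `ArrayList.contains()` scans the whole list (O(n) per registration), while a hash-based set answers in expected O(1). `ListenerRegistry` is hypothetical and is not HtmlUnit's actual code; it only contrasts the two approaches.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical registry contrasting the two duplicate-check strategies.
class ListenerRegistry<L> {
    private final List<L> listeners = new ArrayList<>();
    // LinkedHashSet keeps registration order, like the list, but with hash lookups.
    private final Set<L> listenerSet = new LinkedHashSet<>();

    // List version: contains() is a linear scan, so n registrations cost O(n^2) overall.
    void addToList(L listener) {
        if (!listeners.contains(listener)) {
            listeners.add(listener);
        }
    }

    // Set version: add() ignores duplicates with an expected O(1) hash lookup.
    void addToSet(L listener) {
        listenerSet.add(listener);
    }

    int listSize() {
        return listeners.size();
    }

    int setSize() {
        return listenerSet.size();
    }
}
```

Both variants end up with the same contents; only the cost of each registration differs, which matters on pages that register many listeners.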

Still, the CPU load is quite high on JavaScript-heavy pages. Typically, I can't run more than 10 of them simultaneously on a dual-core Linux box. I believe HtmlUnit uses Rhino (a JavaScript engine) in interpreted mode only, which is pretty slow. Also, you need to be careful to release all resources used by HtmlUnit to avoid memory leaks.
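Given a ceiling of roughly 10 simultaneous JavaScript-heavy pages, one way to stay under it is a semaphore around each page job, releasing the permit in a `finally` block so a failing page cannot leak a slot. `PageLimiter` is a hypothetical helper, and the limit of 10 reflects this answer's hardware; you would measure your own.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper capping how many pages are processed at once.
class PageLimiter {
    private final Semaphore permits;

    PageLimiter(int maxConcurrentPages) {
        // Fair semaphore: waiting jobs get permits in arrival order.
        this.permits = new Semaphore(maxConcurrentPages, true);
    }

    // Runs one page job while holding a permit; the permit is returned even if the job throws.
    <T> T run(Supplier<T> pageJob) {
        try {
            permits.acquire();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a page slot", e);
        }
        try {
            return pageJob.get();
        } finally {
            permits.release();
        }
    }

    int availableSlots() {
        return permits.availablePermits();
    }
}
```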

All in all, it is clearly noticeable that HtmlUnit was designed to run relatively short-lived test cases, not long-running server applications. It's possible to tweak it enough to make it manageable, but it certainly could have been better.
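One possible tweak for long-running use, offered here as an assumption rather than something this answer describes, is to recycle the client periodically: close it and create a fresh one after a fixed number of pages, so whatever state it accumulates stays bounded. `RecyclingClient`, its factory and the `maxUses` value are all hypothetical; with HtmlUnit the factory would typically construct a new `WebClient`. The sketch is meant for use from a single thread.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical wrapper that replaces its client after maxUses pages (single-threaded use).
class RecyclingClient<C extends AutoCloseable> {
    private final Supplier<C> factory;
    private final int maxUses;
    private C current;
    private int uses;
    private int created;

    RecyclingClient(Supplier<C> factory, int maxUses) {
        if (maxUses < 1) throw new IllegalArgumentException("maxUses must be >= 1");
        this.factory = factory;
        this.maxUses = maxUses;
    }

    // Runs one action against the client, recreating it first if it has been used maxUses times.
    <R> R use(Function<C, R> action) {
        if (current == null || uses >= maxUses) {
            closeCurrent();
            current = factory.get();
            created++;
            uses = 0;
        }
        uses++;
        return action.apply(current);
    }

    // Closes the current client, if any; call this on shutdown too.
    void closeCurrent() {
        if (current == null) return;
        try {
            current.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close client", e);
        } finally {
            current = null;
        }
    }

    int clientsCreated() {
        return created;
    }
}
```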

Another approach I found promising is PhantomJS, a headless version of the WebKit browser; as a native application, it is much faster at running JavaScript.

maximdim answered Nov 09 '22 05:11



Generally, use your testing "gut feeling" about that. What WebDriver and HtmlUnit do is simulate a real user performing actions on a web page.

My personal gut feeling says I should do as little testing in production as possible. So I personally would use these tools only to verify that my web app is still alive.

Yes, it's a generic answer to a generic question, but try this:

Gather the people responsible for the web app and ask them:

  • Should it be tested in production? (There is always a slight chance that some customers will see the test data.)

  • If yes, what should be tested in production?

  • If yes, should it be automated?

And then you have your answer ;)

Pavel Janicek answered Nov 09 '22 05:11
