
Setting up headless Firefox with MozRepl

I would like to run a crawler that can handle JavaScript-generated HTML in an environment without an X server. I know I can run Firefox headless under Xvfb, and I know how to install MozRepl in Firefox and interact with it using WWW::Mechanize when I have the actual browser in front of me and can download and set up the module.

What I don't know is how to set up MozRepl in Firefox in an environment where I don't have an X server to make installing the module easy. Any help is appreciated.
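For reference, here is a rough sketch of what talking to an already-running MozRepl looks like at the socket level. It assumes MozRepl is listening on its default port 4242 on localhost and prints a `repl> ` prompt after each reply. The helper names `read_until` and `mozrepl_eval` are made up for illustration and are not part of MozRepl or WWW::Mechanize. It does not solve the installation problem, it only shows the interaction that would follow.

```python
import socket


def read_until(sock, marker, timeout=5.0):
    """Read from sock until the byte string `marker` shows up (or the peer closes)."""
    sock.settimeout(timeout)
    buf = b""
    while marker not in buf:
        chunk = sock.recv(4096)
        if not chunk:  # connection closed before the marker arrived
            break
        buf += chunk
    return buf


def mozrepl_eval(expr, host="127.0.0.1", port=4242, prompt="repl> "):
    """Send one JavaScript expression to MozRepl and return the printed reply.

    Assumptions: the default port (4242) and the default prompt text; adjust
    both if your MozRepl is configured differently.
    """
    marker = prompt.encode("utf-8")
    with socket.create_connection((host, port)) as sock:
        read_until(sock, marker)                    # skip the welcome banner
        sock.sendall(expr.encode("utf-8") + b"\n")  # one expression per line
        reply = read_until(sock, marker)
    text = reply.decode("utf-8", errors="replace")
    # Drop the trailing prompt so only the evaluated result remains.
    return text.rsplit(prompt, 1)[0].strip()
```

With Firefox and MozRepl running, calling `mozrepl_eval("1 + 1")` should return whatever the REPL prints for that expression.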

Vijay Boyapati asked Oct 24 '11 06:10



1 Answer

There are a number of options for headless HTML+JavaScript, depending on the language you want to use (thanks primarily to Node.js, which is built on V8, the JavaScript engine Google developed for the Chrome browser). Unfortunately, none that I know of are Firefox-based. There was Crowbar, but it appears not to have been updated since 2008.

Basing such software on Firefox has become less feasible now that Firefox has begun integrating Gecko more tightly with the browser front-end.

Regarding Node.js, I don't know much about the Perl offerings, but here are some of the others:

  • zombie (javascript)
  • mink (PHP 5.3) (can use zombie as a back-end)

And then there are a few non-Node options as well:

  • phantomjs (javascript) (uses a webkit back-end, which may need X installed)
  • htmlunit (java)
  • akephalos (ruby) (uses an htmlunit back-end)

I believe there's also a Python interface to Node.js (though I don't know whether it implements a browser environment), and there is likely work going on in the Perl space with Node as well.

David answered Sep 28 '22 01:09
