Are there any libraries or frameworks that provide the functionality of a browser, but do not need to actually render physically onto the screen?
I want to automate navigation on web pages (Mechanize does this, for example), but I want the full browser experience, including JavaScript. So I'd like a virtual browser of some sort that I can use to "click on links" programmatically, have DOM elements and JS scripts render within it, and manipulate those elements.
Solution preferably in Python, but I can manage others.
PhantomJS and PyPhantomJS are what I use for tasks like these.
It's a headless, WebKit-based browser that is fully controllable via JavaScript. There's a C++ implementation (PhantomJS) and a Python one (PyPhantomJS). I prefer the Python one, though, because it has a plugin system that lets you add functionality to the core without modifying any of its code, unlike the C++ one. :)
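To give a rough idea of what "controllable via JavaScript" looks like when the rest of your code is Python, here's a minimal sketch that writes a small driver script and runs it with the PhantomJS binary. It assumes a `phantomjs` executable is on your PATH. The helper names `build_command` and `fetch_title` are made up for this example and are not part of either project; the JavaScript side only uses the `webpage` and `system` modules, `page.open`, `page.evaluate` and `phantom.exit`.

```python
import os
import shutil
import subprocess
import tempfile
import textwrap

# JavaScript driver executed by PhantomJS. It loads the URL passed on the
# command line, lets the page's own scripts run, then reads the title out of
# the live DOM and prints it.
PHANTOM_SCRIPT = textwrap.dedent("""\
    var page = require('webpage').create();
    var url = require('system').args[1];
    page.open(url, function (status) {
        if (status !== 'success') {
            console.log('failed to load ' + url);
            phantom.exit(1);
            return;
        }
        var title = page.evaluate(function () {
            return document.title;
        });
        console.log(title);
        phantom.exit(0);
    });
""")


def build_command(script_path, url, binary="phantomjs"):
    """Return the argv list that runs the driver script against `url`."""
    return [binary, script_path, url]


def fetch_title(url, binary="phantomjs"):
    """Return the page title as seen by PhantomJS, or None if the binary is not installed."""
    if shutil.which(binary) is None:
        return None
    # Write the driver to a temporary .js file that PhantomJS can load.
    with tempfile.NamedTemporaryFile("w", suffix=".js", delete=False) as handle:
        handle.write(PHANTOM_SCRIPT)
        script_path = handle.name
    try:
        result = subprocess.run(
            build_command(script_path, url, binary),
            capture_output=True,
            text=True,
            timeout=60,
        )
    finally:
        os.remove(script_path)
    return result.stdout.strip()
```

Calling `fetch_title("http://example.com")` would hand the URL to the driver and return whatever it prints. Anything that needs the DOM (reading elements, filling in forms) goes inside the function passed to `page.evaluate`, since that part runs in the page itself.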
There is an absolute ton of free software technology now available: take your pick at http://wiki.python.org/moin/WebBrowserProgramming. If you have specific questions, join pyjamas-dev on Google Groups and I'll be happy to give further details there. Brief answer: you can run pywebkitgtk "headless", or you can use xulrunner (via python-hulahop), again using pygtk without actually calling "browserwidget.show()", and there's also pykhtml. You could also use Python COM to connect to MSHTML.DLL.
These are all "cheat" methods: they use Python bindings to a graphical web browser engine without actually firing up the graphical bit. If you really wanted to put in some serious hard-core programming, you could create a "port" of WebKit that is not connected to a GUI toolkit. As an experienced WebKit programmer, I'd put it at around 2 weeks of full-time effort to make such a "headless" version of WebKit.
l.