 

Headless Browser for Python (JavaScript support REQUIRED!) [closed]

I need a headless browser that is fairly easy to use (I am still fairly new to Python and to programming in general). It must let me navigate to a page, log in through a form that requires JavaScript, and then scrape the resulting page: searching for results that match certain criteria, ticking check boxes, and clicking links to download files. All of this requires JavaScript.

I hear a headless browser is what I want. My requirements are that I can run it from Python and, preferably, that the resulting script can be packaged with py2exe (I am writing this program for other users).

So far Windmill looks like it MIGHT be what I want, but I am not sure.

Any ideas appreciated!

Steven Matthews asked May 17 '11 00:05




1 Answer

I use WebKit as a headless browser in Python via PyQt / PySide:
http://www.riverbankcomputing.co.uk/software/pyqt/download
http://developer.qt.nokia.com/wiki/Category:LanguageBindings::PySide::Downloads

I particularly like WebKit because it is simple to set up. On Ubuntu you just run: sudo apt-get install python-qt4

Here is an example script:
http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/
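
For a rough idea of what such a script can look like, here is a minimal sketch. It is not the code from the linked post. It assumes a Python 3 build of PyQt4 with the QtWebKit module; the names render and find_links, the ".pdf" suffix and the example URL are hypothetical and only illustrate the idea. render loads a page in WebKit so its JavaScript runs, and find_links filters the rendered HTML with the standard library.

```python
import sys
from html.parser import HTMLParser


def render(url):
    """Load url in headless WebKit and return the HTML after JavaScript has run.

    Assumption: PyQt4 with QtWebKit is installed. The imports are local so the
    parsing helper below can be used without Qt.
    """
    from PyQt4.QtCore import QUrl
    from PyQt4.QtGui import QApplication
    from PyQt4.QtWebKit import QWebPage

    # Qt allows only one QApplication per process, so reuse it if present.
    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebPage()
    # Leave the event loop once the page reports that loading has finished.
    page.loadFinished.connect(lambda ok: app.quit())
    page.mainFrame().load(QUrl(url))
    app.exec_()
    return page.mainFrame().toHtml()


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that end with a given suffix."""

    def __init__(self, suffix):
        super().__init__()
        self.suffix = suffix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.endswith(self.suffix):
            self.links.append(href)


def find_links(html, suffix):
    """Return all link targets in html that end with suffix (hypothetical helper)."""
    collector = LinkCollector(suffix)
    collector.feed(html)
    return collector.links


# Example usage (hypothetical URL, requires PyQt4):
#   html = render("http://example.com/results")
#   print(find_links(html, ".pdf"))
```

Logging in and ticking check boxes would need extra steps on the loaded page that this sketch does not cover.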

hoju answered Sep 23 '22 11:09