Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Programmatic Python Browser with JavaScript

I want to screen-scrape a web-site that uses JavaScript.

There is mechanize, the programmatic web browser for Python. However, it (understandably) doesn't interpret javascript. Is there any programmatic browser for Python which does? If not, is there any JavaScript implementation in Python that I could use to attempt to create one?

like image 814
Claudiu Avatar asked Dec 16 '09 18:12

Claudiu


2 Answers

You might be better off using a tool like Selenium to automate the scraping using a web browser, so the JS executes and the page renders just like it would for a real user.

like image 64
Annie Avatar answered Sep 30 '22 03:09

Annie


The PyV8 package nicely wraps Google's V8 Javascript engine for Python. It's particularly nice because not only can you call from Python to Javascript code, but you can call back from Javascript to Python code. This makes it quite straightforward to implement the usual browser-supplied objects (that is, everything in the Javascript global namespace: "window", "document", and so on), which you'd need to do if you were going to make a Javascript-capable Python browser emulator thing, possibly by hooking this up with mechanize.

like image 29
Peter Hansen Avatar answered Sep 30 '22 04:09

Peter Hansen