
Can the Python Requests library force a page to load all of its dynamic JavaScript content before storing the contents of that page?

BeautifulSoup can often be used to (1) store the contents of a page in a variable and (2) parse elements in a webpage.

However, BeautifulSoup on its own cannot open password-protected pages that respond with HTTP error 403, so I used Requests for this task.

Now I am wondering: does the Requests library have the ability to force the JavaScript on a page to load?

I am using Python 2.7.

In other words, does Requests offer something along the lines of requests.open(some_url).forceJavascriptLoad()?

yoshiserry asked Oct 21 '22 08:10


1 Answer

No. Requests cannot execute JavaScript in any way. To do what you want, you need a so-called "headless" web browser, and there are several to choose from. My advice is to try PhantomJS. Although it is not written in Python, it has several advantages over the others:

  1. It is easy to set up and use
  2. It is actively developed, not abandoned like a lot of other headless browsers
  3. It has really good JavaScript support
  4. It is fast
  5. It provides precompiled binaries in case you have problems compiling it

I tried a lot of headless browsers myself and was only happy with PhantomJS. If you still want to try a Python-based headless browser, you can give Ghost a try.
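To make the "no JavaScript" point concrete, here is a minimal sketch (Python 3, standard library only) of what a plain HTTP client actually receives. Everything in it is an assumption for illustration: the page, the local test server, and the helper name fetch_raw_html are made up, and urllib.request stands in for Requests so that the example needs no third-party packages. requests.get(url).text would give you the same unexecuted markup.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page: the visible text only exists after a browser runs the script.
PAGE = b"""<html><body>
<div id="out"></div>
<script>
document.getElementById("out").textContent = "loaded " + "by " + "javascript";
</script>
</body></html>"""


class Handler(BaseHTTPRequestHandler):
    """Serves the same static page for every GET request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_raw_html():
    """Start a throwaway local server, fetch the page once, return the body."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        # Ignore any proxy settings from the environment for this local request.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            return response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


html = fetch_raw_html()
print("<script>" in html)              # True: the script source arrives as plain text
print("loaded by javascript" in html)  # False: nothing ran the script
```

The client gets the script as text and the div stays empty, because only a browser engine would run the code and fill it in. That is the gap a headless browser closes.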

Max Tepkeev answered Oct 23 '22 02:10