How to fake JavaScript being enabled in Python requests/BeautifulSoup

I'm trying to crawl a website that returns an error message saying that JavaScript is disabled and that I might be a bot. I reproduced the same behaviour in a web browser: with JavaScript disabled I get the same error response, but with JavaScript enabled the page loads normally. The actual content does not depend on JS.

So I'm wondering whether I can tell the web server that my JS is enabled and I'm not a bot. Is this possible with the Python requests library, or any other Python library for that matter?

And yes, I've already set the User-Agent header, as well as all the other headers: Host, Accept-Language, Connection, etc.
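For reference, a header setup like the one described can be attached to a requests session as follows. The URL and the exact header values are placeholders; note that headers alone cannot actually tell the server "JavaScript is enabled", since that check happens client-side:

```python
import requests

# Hypothetical target URL -- replace with the site being crawled.
URL = 'http://your-site/url'

# Browser-like headers. Matching a real browser's fingerprint sometimes
# satisfies naive bot checks, but it cannot pass a real JS-execution check.
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:40.0) '
                  'Gecko/20100101 Firefox/40.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Connection': 'keep-alive',
}

session = requests.Session()
session.headers.update(headers)
# response = session.get(URL)  # uncomment to actually fetch the page
```

If the site still rejects the request with these headers set, it is almost certainly running a JS check, which is what the answer below addresses.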

asked Oct 04 '15 by bakar

1 Answer

If the site is just checking whether JavaScript can be executed (by running some JS), use selenium to fetch the page, then parse the page source that selenium got with BeautifulSoup.

from bs4 import BeautifulSoup
from selenium import webdriver

# Drive a real browser so the site's JavaScript check passes
driver = webdriver.Firefox()
driver.get('http://your-site/url')

# Hand the rendered HTML to BeautifulSoup; name a parser explicitly
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
...
driver.quit()
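Once selenium has handed back the rendered page source, parsing works like any other HTML. A minimal sketch, using a stand-in HTML string in place of driver.page_source (the tag names and the `post` class are made up for illustration):

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source (hypothetical markup)
html = '<html><body><h1>Title</h1><p class="post">Hello</p></body></html>'

soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
title = soup.find('h1').get_text()
posts = [p.get_text() for p in soup.select('p.post')]
print(title, posts)
```

Passing the parser name explicitly avoids the "no parser was explicitly specified" warning and keeps results consistent across machines with different parsers installed.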
answered Nov 07 '22 by Flickerlight