Why is mechanize throwing an HTTP 403 error?

For some reason I get an HTTP Error 403: Forbidden when I try to open the page http://questionablecontent.net. I used to get a robots.txt error, but I have solved that. Oddly, I can't even find their robots.txt file.

I can still view the page in Chrome, so what I'm wondering is: does mechanize look different from Chrome to the server, even after I set the appropriate headers?

Here is my code (which does not work):

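import mechanize
import cookielib
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)  # attach the jar; creating it alone has no effect
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv: Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
br.open('http://questionablecontent.net')  # fails with HTTP Error 403: Forbidden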

I also tried setting addheaders to the same User-agent string my browser sends (which I found here):

br.addheaders = [('User-agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36')]

... but that didn't work either.

Finally, I tried Selenium, and that worked, since it drives a real Chrome instance that loads the page and then reports back to Python. However, I'd still like to get this working with mechanize, and I still don't understand how Chrome and mechanize look different to their server.

Answer

The difference is probably in the request headers Selenium (that is, the real Chrome browser) sends. Apart from the User-Agent header, some servers check other headers as well to make sure a real browser is talking to them. Look at one of my older answers:

urllib2.HTTPError: HTTP Error 403: Forbidden

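In your place, I would add all the headers your real Chrome browser sends (you can see them in Chrome's developer tools, on the Network tab), and then remove them one at a time to find out which ones the server actually requires.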
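A rough sketch of that "add everything, then prune" approach follows. It is not part of mechanize; `parse_raw_headers` and `minimal_headers` are hypothetical helpers written for illustration. The first turns the "Name: value" lines you copy from Chrome's developer tools into the list of (name, value) tuples that `br.addheaders` takes. The second drops headers one at a time, keeping a header only when the request fails without it. The `request_ok` callable is also an assumption: you supply it.

```python
# Sketch only: hypothetical helpers, not part of mechanize's API.

def parse_raw_headers(raw):
    """Turn 'Name: value' lines (e.g. copied from Chrome's developer tools)
    into the list of (name, value) tuples that br.addheaders expects."""
    headers = []
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines, HTTP/2 pseudo-headers such as ':authority',
        # and anything else that is not a 'Name: value' pair.
        if not line or line.startswith(':') or ':' not in line:
            continue
        name, value = line.split(':', 1)  # split once; values may contain ':'
        headers.append((name.strip(), value.strip()))
    return headers


def minimal_headers(headers, request_ok):
    """Try removing each header in turn; keep it only if request_ok()
    returns False once it is gone.

    request_ok is a caller-supplied (hypothetical) callable that takes a
    header list and returns True if the server accepted the request."""
    kept = list(headers)
    for header in list(headers):
        trial = [h for h in kept if h != header]
        if request_ok(trial):
            kept = trial  # the server did not need this header
    return kept
```

In practice `request_ok` would set `br.addheaders` to the trial list, call `br.open` on the page, and return False if that raises `mechanize.HTTPError` with a 403 code. Each check is a live request, so keep the list short. You may also want to leave out `Accept-Encoding`, since a server that honours it can send back a gzip-compressed body that mechanize may not decode for you.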

