For some reason I get an HTTP Error 403: Forbidden when I try opening the page http://questionablecontent.net. I used to get a robots.txt error, but that has been solved. Additionally, I can't even find their robots.txt file.
I can still view the webpage from Chrome, so what I'm wondering is: does mechanize look different from Chrome even after setting the appropriate headers?
Here is my code (which does not work):

import mechanize
import cookielib  # Python 2; on Python 3 this module is http.cookiejar

br = mechanize.Browser()

# Keep cookies between requests
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)

br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_robots(False)  # don't fetch or obey robots.txt
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

# Pretend to be Firefox
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

br.open('http://questionablecontent.net')  # raises HTTP Error 403: Forbidden
I also tried setting the addheaders to the same headers as my browser (which I found here):
br.addheaders = [('User-agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36')]
... but that didn't work either.
Finally, I tried using Selenium, and that worked, since it loads the page in Chrome itself and then communicates with Python. However, I would still like to get this working with mechanize, and I'm still unsure how Chrome and mechanize look different to the server.
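For reference, here is roughly what the working Selenium version looks like (a minimal sketch; it assumes chromedriver is installed and on your PATH):

from selenium import webdriver

driver = webdriver.Chrome()   # launches a real Chrome, so the server sees genuine browser traffic
driver.get('http://questionablecontent.net')
html = driver.page_source     # the rendered page HTML
driver.quit()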
The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it. This status is similar to 401, but for the 403 Forbidden status code, re-authenticating makes no difference.
Check the requested URL. The most common cause of a 403 Forbidden error is simply inputting an incorrect URL. As discussed before, many tightly secured web servers disallow access to improper URLs. This could be anything from accessing a file directory to accessing a private page meant for other users.
403 Forbidden is used when access to the resource is forbidden to everyone, restricted to a given network, or allowed only over SSL; whatever the reason, as long as it is not related to HTTP authentication.
403 Forbidden indicates that authentication was successful (otherwise the server would return 401 Unauthorized), but the authenticated user does not have access to the resource, e.g. they lack the required roles or permissions.
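If you want to see which of these cases you are hitting, you can catch the error mechanize raises and inspect it (a sketch; mechanize re-exports a urllib2-style HTTPError, and the WWW-Authenticate header is only expected on a 401):

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)
try:
    br.open('http://questionablecontent.net')
except mechanize.HTTPError as e:
    print(e.code)                              # 403
    print(e.headers.get('WWW-Authenticate'))   # set on 401 responses, normally absent on 403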
The trick is probably in the request headers Selenium is sending. Apart from the user agent header, some servers check other headers as well to ensure a real browser is talking to them. Look at one of my older answers:
urllib2.HTTPError: HTTP Error 403: Forbidden
In your place, I would try adding all the headers your real Chrome browser sends, and then eliminating the unnecessary ones.
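As a starting point, here is a sketch of that approach with mechanize (the header values are the ones Chrome 28 sends and are only illustrative; capture your own from DevTools. Accept-Encoding is left out on purpose, since mechanize will not gunzip a compressed response for you):

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)

# Mimic a full set of Chrome request headers, not just the user agent
br.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Accept-Language', 'en-US,en;q=0.8'),
    ('Connection', 'keep-alive'),
]

response = br.open('http://questionablecontent.net')
print(response.code)  # 200 once the server is satisfied; drop headers one by one to find which matter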