
Why is mechanize throwing an HTTP 403 error?

For some reason I get an HTTP Error 403: Forbidden when I try opening the page http://questionablecontent.net. I used to get a robots.txt error, but that has been solved. Additionally, I can't even find their robots.txt file.

I can still view the webpage in Chrome, so what I'm wondering is: does mechanize look different from Chrome to the server, even after setting the appropriate headers?

Here is my code (which does not work):

import cookielib  # Python 2 standard library
import mechanize

br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_robots(False)  # do not fetch or obey robots.txt
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

I also tried setting the addheaders to the same headers as my browser (which I found here):

br.addheaders = [('User-agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36')]

... but that didn't work either.

Finally, I tried using Selenium and that worked, since it loads the page in Chrome and then communicates with Python. However, I would still like to get it working with mechanize. Also, I'm still unsure how Chrome and mechanize look different to their server.

Matthew Wesly asked Jul 30 '13 04:07



People also ask

What does HTTP status code 403 mean?

The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it. This status is similar to 401, but for the 403 Forbidden status code re-authenticating makes no difference.

How do I fix "Failed to load resource: the server responded with a status of 403 (Forbidden)"?

Check the requested URL. A common cause of a 403 Forbidden error is simply entering an incorrect URL. Many tightly secured web servers refuse access to URLs that are not meant to be requested directly, such as a file directory or a private page meant for other users.

When to use 403 Forbidden?

403 Forbidden is used when access to the resource is forbidden to everyone, restricted to a given network, or allowed only over SSL; in short, whenever the refusal is not related to HTTP authentication.

What is 403 error in postman?

403 Forbidden indicates that authentication was successful (otherwise the server would return 401 Unauthorized) but the authenticated user does not have access to the resource, e.g. they don't have the required roles or permissions.


1 Answer

The trick is probably in the request headers that the browser driven by Selenium sends. Apart from the User-Agent header, some servers check other headers as well to make sure a real browser is talking to them. Take a look at one of my older answers:

urllib2.HTTPError: HTTP Error 403: Forbidden

In your place, I would try adding all the headers your real Chrome browser sends, and then eliminate the unnecessary ones one at a time.
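
The add-then-eliminate approach could be sketched roughly like this. It is only an illustration: the header values in CANDIDATE_HEADERS are assumed, typical browser values (not captured from a real Chrome session), and prune_headers and mechanize_succeeds are hypothetical helper names, not part of mechanize. Whether extra headers are actually what this particular server checks is also an assumption.

```python
# Illustrative header set; copy the real values from your browser's
# developer tools. Accept-Encoding is left out on purpose so the server
# does not send a compressed body that the script would have to decode.
CANDIDATE_HEADERS = [
    ('User-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Accept-Language', 'en-US,en;q=0.8'),
    ('Accept-Charset', 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'),
    ('Connection', 'keep-alive'),
]


def prune_headers(headers, request_succeeds):
    """Try dropping each header in turn; keep it only if the request fails without it."""
    needed = list(headers)
    for header in headers:
        trial = [h for h in needed if h != header]
        if request_succeeds(trial):
            needed = trial  # the server did not need this header
    return needed


def mechanize_succeeds(url):
    """Return a predicate that opens `url` with mechanize using the given headers."""
    import mechanize  # imported lazily so prune_headers works without mechanize

    def request_succeeds(headers):
        br = mechanize.Browser()
        br.set_handle_robots(False)
        br.addheaders = headers
        try:
            br.open(url)
            return True
        except mechanize.HTTPError:  # e.g. HTTP Error 403: Forbidden
            return False

    return request_succeeds
```

First confirm that the full list gets past the 403, then call prune_headers(CANDIDATE_HEADERS, mechanize_succeeds('http://questionablecontent.net')) to see which headers the server really requires. Each trial makes a real request, so keep the candidate list short.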

andrean answered Oct 13 '22 12:10
