
Python 3.4 urllib.request error (http 403)

People also ask

How do I fix HTTP error 403 in Python?

The easiest fix is to send a browser-like User-Agent header with the request, because some sites reject the default one that urllib sends. Separately, if the site is slow or never answers, you can pass a timeout to urlopen. If no response arrives within that period, Python raises a timeout error (socket.timeout, which may be wrapped in urllib.error.URLError when it happens while connecting).
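
A minimal sketch of both ideas together, assuming Python 3. The names fetch, USER_AGENT and TIMEOUT_SECONDS are hypothetical and chosen only for illustration, and returning None on a timeout is one possible design, not the only one.

```python
import socket
import urllib.error
import urllib.request

# Hypothetical defaults chosen for illustration; tune them for your use case.
USER_AGENT = "Mozilla/5.0"
TIMEOUT_SECONDS = 10


def fetch(url, timeout=TIMEOUT_SECONDS):
    """Return the response body as bytes, or None if the request timed out."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.read()
    except socket.timeout:
        # Raised directly when reading the response takes too long.
        return None
    except urllib.error.URLError as exc:
        # A timeout while connecting arrives wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return None
        # Anything else (including an HTTP 403) is passed on to the caller.
        raise
```

Calling fetch("http://www.example.net") would then return the page bytes, return None on a timeout, or raise for any other failure.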

What is Urllib error in Python?

URLError – It is raised for errors in the URL or for failures while fetching it, such as connectivity problems, and has a 'reason' attribute that describes the cause. HTTPError – It is raised when the server returns an HTTP error status, such as 401 for a failed authentication or 403 for a forbidden request. It is a subclass of URLError and also carries the status in its 'code' attribute.
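
Because HTTPError is a subclass of URLError, a single except clause for URLError catches both, and the two can be told apart afterwards. The sketch below assumes Python 3; describe_failure and try_fetch are hypothetical helper names, not part of urllib.

```python
import urllib.error
import urllib.request


def describe_failure(exc):
    """Turn a urllib exception into a short human-readable message."""
    # Check HTTPError first: it is the more specific subclass.
    if isinstance(exc, urllib.error.HTTPError):
        return "HTTP {}: {}".format(exc.code, exc.reason)
    if isinstance(exc, urllib.error.URLError):
        return "URL error: {}".format(exc.reason)
    raise TypeError("not a urllib error: {!r}".format(exc))


def try_fetch(url):
    """Return (body, None) on success or (None, message) on failure."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read(), None
    except urllib.error.URLError as exc:  # also catches HTTPError
        return None, describe_failure(exc)
```

For a site that answers with 403, try_fetch would return None together with a message built from the status code and reason, instead of letting the exception propagate.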

What is Urllib request in Python?

The urllib.request module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world: basic and digest authentication, redirections, cookies and more. See also: the Requests package is recommended for a higher-level HTTP client interface.


It seems like the site does not like the user agent of Python 3.x.

Specifying User-Agent will solve your problem:

import urllib.request
req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urllib.request.urlopen(req).read()
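
To see which User-Agent urllib sends when none is specified, you can inspect the default opener's headers without making any request. This is a small sketch assuming Python 3; the URL is only a placeholder.

```python
import urllib.request

# A freshly built opener carries the headers urllib adds by default.
opener = urllib.request.build_opener()
default_user_agent = dict(opener.addheaders)["User-agent"]
print(default_user_agent)  # a string starting with "Python-urllib/"

# A header set on the Request takes precedence over the opener default.
# Request stores header names capitalized, so "User-Agent" becomes "User-agent".
req = urllib.request.Request(
    "http://www.example.net", headers={"User-Agent": "Mozilla/5.0"}
)
print(req.get_header("User-agent"))  # Mozilla/5.0
```

Sites that block the "Python-urllib/..." value are the ones that respond with 403 until the header is overridden.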

NOTE: The Python 2.x urllib module also receives the 403 status, but unlike Python 2.x urllib2 and Python 3.x urllib.request, it does not raise an exception.

You can confirm that with the following code (Python 2.x only):

print(urllib.urlopen(url).getcode())  # => 403

Here are some notes I gathered on urllib while I was studying Python 3.
I kept them in case they come in handy or help someone else out.

How to import urllib.request and urllib.parse:

import urllib.request as urlRequest
import urllib.parse as urlParse

How to make a GET request:

url = "http://www.example.net"
# open the url
x = urlRequest.urlopen(url)
# get the source code
sourceCode = x.read()

How to make a POST request:

url = "https://www.example.com"
values = {"q": "python if"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# build the Request object; supplying data makes it a POST
targetUrl = urlRequest.Request(url, values)
# open the url
x = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()
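
The two encoding steps and the switch from GET to POST can be checked without sending anything. A short sketch, assuming Python 3 and the same placeholder URL and values as above:

```python
import urllib.parse
import urllib.request

values = {"q": "python if"}

# urlencode turns the dict into a query string; spaces become "+".
query = urllib.parse.urlencode(values)
print(query)  # q=python+if

# urlopen needs bytes for the request body, hence the encode step.
body = query.encode("UTF-8")

# A Request with data defaults to POST; one without data defaults to GET.
post_req = urllib.request.Request("https://www.example.com", data=body)
get_req = urllib.request.Request("https://www.example.com")
print(post_req.get_method())  # POST
print(get_req.get_method())   # GET
```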

How to make a POST request (403 forbidden responses):

url = "https://www.example.com"
values = {"q": "python urllib"}
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
# encode values for the url
values = urlParse.urlencode(values)
# encode the values in UTF-8 format
values = values.encode("UTF-8")
# build the Request object with the form data and the custom headers
targetUrl = urlRequest.Request(url=url, data=values, headers=headers)
# open the url
x = urlRequest.urlopen(targetUrl)
# get the source code
sourceCode = x.read()

How to make a GET request (403 forbidden responses):

url = "https://www.example.com"
# pretend to be a chrome 47 browser on a windows 10 machine
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
req = urlRequest.Request(url, headers = headers)
# open the url
x = urlRequest.urlopen(req)
# get the source code
sourceCode = x.read()
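
Both 403 examples pass the same headers to every Request. One possible alternative, sketched here under the assumption that every request should use the same User-Agent, is to set it once on an opener. make_browser_opener and BROWSER_USER_AGENT are hypothetical names used only for illustration.

```python
import urllib.request

# Hypothetical value for illustration; any browser-like string can be used.
BROWSER_USER_AGENT = "Mozilla/5.0"


def make_browser_opener(user_agent=BROWSER_USER_AGENT):
    """Return an opener that sends user_agent instead of the urllib default."""
    opener = urllib.request.build_opener()
    # Replacing addheaders overrides the default User-agent entry.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


opener = make_browser_opener()
# Optional: make plain urllib.request.urlopen() calls use this opener too.
urllib.request.install_opener(opener)
```

After install_opener, calls such as urllib.request.urlopen(url) go through this opener, so GET and POST requests alike carry the custom header unless a Request sets its own.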