The easiest way to resolve the error is to pass a valid user-agent as a header parameter, as shown below. You can also set a timeout in case you are not getting a response from the website; Python will raise a socket exception if the website doesn't respond within the given timeout period.
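A minimal sketch of both ideas together (the URL here is a placeholder):
from urllib.request import Request, urlopen

url = 'https://example.com'  # placeholder URL -- substitute the site you are requesting

# Supply a browser-like User-Agent and a timeout (in seconds)
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req, timeout=10).read()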
The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it. This status is similar to 401, but for the 403 Forbidden status code, re-authenticating makes no difference.
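For completeness, a hedged sketch of how you might detect that status in Python: urllib raises HTTPError for 4xx responses, and the error object carries the status code (the URL below is a placeholder):
from urllib.request import urlopen
from urllib.error import HTTPError

try:
    urlopen('https://example.com/protected')  # placeholder URL
except HTTPError as e:
    if e.code == 403:
        print('403 Forbidden: re-authenticating will not help')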
This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib uses something like Python-urllib/3.3, so it's easily detected). Try setting a known browser user agent with:
from urllib.request import Request, urlopen

# A browser-like User-Agent keeps the request from being rejected as a bot
req = Request('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
This works for me.
By the way, in your code you are missing the () after .read in the urlopen line, but I think it's just a typo.
TIP: since this is an exercise, choose a different, non-restrictive site. Maybe they are blocking urllib for some reason...
It's definitely blocking you because of urllib's user agent. The same thing happened to me with OfferUp. You can create a new class called AppURLopener which overrides the user-agent with Mozilla.
import urllib.request

# Note: FancyURLopener is deprecated since Python 3.3, but it still works here
class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"  # sent as the User-Agent header

opener = AppURLopener()
response = opener.open('http://httpbin.org/user-agent')
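To actually read the body of the response opened above:
print(response.read().decode('utf-8'))  # for httpbin, prints something like {"user-agent": "Mozilla/5.0"}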
"This is probably because of mod_security or some similar server security feature which blocks known
spider/bot
user agents (urllib uses something like python urllib/3.3.0, it's easily detected)" - as already mentioned by Stefano Sanfilippo
from urllib.request import Request, urlopen

url = "https://stackoverflow.com/search?q=html+error+403"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req).read()      # raw bytes from the server
webpage = web_byte.decode('utf-8')  # decode bytes to str
web_byte is a bytes object returned by the server, and the content type of the page is usually UTF-8, so you need to decode web_byte using the decode method.
This solved the problem I was having while trying to scrape a website from PyCharm.
P.S.: I use Python 3.4.
Based on the previous answers, this worked for me with Python 3.7 after increasing the timeout to 10.
from urllib.request import Request, urlopen

# 'Url_Link' is a placeholder; any identifying User-Agent string works here
req = Request('Url_Link', headers={'User-Agent': 'XYZ/3.0'})
webpage = urlopen(req, timeout=10).read()  # give up after 10 seconds
print(webpage)
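If the timeout elapses, Python raises an exception rather than hanging; a minimal sketch of guarding against that (the URL is still a placeholder, and the timeout may surface either as socket.timeout or wrapped in URLError):
import socket
from urllib.request import Request, urlopen
from urllib.error import URLError

req = Request('Url_Link', headers={'User-Agent': 'XYZ/3.0'})
try:
    webpage = urlopen(req, timeout=10).read()
except (socket.timeout, URLError) as e:
    print('request failed or timed out:', e)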
Since the page works in a browser but not when called from within a Python program, it seems that the web app serving that URL recognizes that the content is not being requested by a browser.
Demonstration:
curl --dump-header r.txt 'http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1'
...
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access ...
</HTML>
and the content in r.txt has status line:
HTTP/1.1 403 Forbidden
Try sending a 'User-Agent' header that fakes a web client.
NOTE: The page contains an Ajax call that creates the table you probably want to parse. You'll need to check the JavaScript logic of the page, or simply use a browser debugger (like Firebug's Net tab) to see which URL you need to call to get the table's content.
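Once you have found that URL in the debugger, a hedged sketch of calling it directly (the endpoint below is hypothetical; substitute whatever the network tab shows):
import json
from urllib.request import Request, urlopen

# Hypothetical endpoint discovered via the browser's network tab
api_url = 'http://www.cmegroup.com/example/table-data.json'
req = Request(api_url, headers={'User-Agent': 'Mozilla/5.0'})
data = json.loads(urlopen(req).read().decode('utf-8'))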