I needed to parse a site, but I got a 403 Forbidden error. Here is the code:
import requests

url = 'http://worldagnetwork.com/'
result = requests.get(url)
print(result.content.decode())
Its output:
<html>
<head><title>403 Forbidden</title></head>
<body bgcolor="white">
<center><h1>403 Forbidden</h1></center>
<hr><center>nginx</center>
</body>
</html>
What is the problem?
The easiest way to resolve the error is to send a valid User-Agent header with the request. Separately, if the website does not respond at all, you can pass a timeout; requests raises requests.exceptions.Timeout if the server doesn't answer within that period.
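A minimal sketch of both ideas together, assuming the requests library. The User-Agent string is just an example browser value, and the fetch helper and its get parameter are hypothetical names (get is only there so the function can be tried without a network connection):

```python
import requests

# Assumed browser-like User-Agent; copy the real one from your own browser if needed.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}


def fetch(url, timeout=10, get=requests.get):
    """Return the page body as text, or None if the request timed out."""
    try:
        # headers= sends the User-Agent; timeout= is in seconds.
        response = get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    except requests.exceptions.Timeout:
        return None
    return response.text
```

Calling fetch('http://worldagnetwork.com/') would then send the header on the request and give up after 10 seconds instead of hanging indefinitely.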
The HTTP 403 Forbidden response status code indicates that the server understands the request but refuses to authorize it. This status is similar to 401, but with 403 Forbidden, re-authenticating makes no difference.
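To make the 401 versus 403 distinction concrete, here is a small sketch using the standard library's http.HTTPStatus. The explain helper is a hypothetical name, and the advice strings are only a paraphrase of the description above:

```python
from http import HTTPStatus


def explain(status_code):
    """Return a short, human-readable note for an HTTP status code."""
    status = HTTPStatus(status_code)
    label = f"{status.value} {status.phrase}"
    if status is HTTPStatus.UNAUTHORIZED:
        # 401: authentication is missing or wrong, so retrying with credentials can help.
        return label + ": authenticate and retry"
    if status is HTTPStatus.FORBIDDEN:
        # 403: the server understood the request but refuses it.
        return label + ": refused; re-authenticating will not help"
    return label
```

With the code from the question, explain(result.status_code) could be printed before decoding the body, so a refusal is visible without reading the HTML.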
It seems the page rejects GET requests that do not identify a User-Agent. I visited the page with a browser (Chrome) and copied the User-Agent header of the GET request (look in the Network tab of the developer tools):
import requests

url = 'http://worldagnetwork.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
result = requests.get(url, headers=headers)
print(result.content.decode())
# <!doctype html>
# <!--[if lt IE 7 ]><html class="no-js ie ie6" lang="en"> <![endif]-->
# <!--[if IE 7 ]><html class="no-js ie ie7" lang="en"> <![endif]-->
# <!--[if IE 8 ]><html class="no-js ie ie8" lang="en"> <![endif]-->
# <!--[if (gte IE 9)|!(IE)]><!--><html class="no-js" lang="en"> <!--<![endif]-->
# ...
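If you end up fetching several pages from the same site, one option is to set the header once on a requests.Session so every request carries it. This is only a sketch; make_session is a hypothetical helper name:

```python
import requests


def make_session(user_agent):
    """Return a Session that sends the given User-Agent on every request."""
    session = requests.Session()
    # Session-level headers are merged into each request made through the session.
    session.headers.update({"User-Agent": user_agent})
    return session
```

You would then call session.get(url) on the returned object instead of requests.get(url, headers=headers).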