HTTP Error 403: Forbidden with urlretrieve

Tags: python, http, urllib, python-requests

I am trying to download a PDF, but I get the following error: HTTP Error 403: Forbidden

I am aware that the server is blocking the request for some reason, but I can't seem to find a solution.

import urllib.request
import urllib.parse
import requests


def download_pdf(url):
    full_name = "Test.pdf"
    urllib.request.urlretrieve(url, full_name)


try:
    url = 'http://papers.xtremepapers.com/CIE/Cambridge%20IGCSE/Mathematics%20(0580)/0580_s03_qp_1.pdf'

    print('initialized')

    hdr = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36',
        'Content-Length': '136963',
    }

    print('HDR received')

    req = urllib.request.Request(url, headers=hdr)

    print('Header sent')

    resp = urllib.request.urlopen(req)

    print('Request sent')

    respData = resp.read()

    download_pdf(url)

    print('Complete')

except Exception as e:
    print(str(e))


asked Jan 22 '16 23:01

Z.Chen

1 Answer

You seem to have already realised this: the remote server is apparently checking the User-Agent header and rejecting requests from Python's urllib. Note that download_pdf() calls urllib.request.urlretrieve(), which makes a separate request with urllib's default User-Agent and never sees your hdr dictionary. urlretrieve() doesn't allow you to change the HTTP headers, but urllib.request.URLopener.retrieve() does:

import urllib.request

# url is the PDF address from the question
opener = urllib.request.URLopener()
opener.addheader('User-Agent', 'whatever')
filename, headers = opener.retrieve(url, 'Test.pdf')

N.B. You are using Python 3, where these functions are considered part of the "Legacy interface" and URLopener has been deprecated (since Python 3.3). For that reason you should not use them in new code.
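If you would rather stay with urllib and avoid the legacy interface, one option is to build a Request that carries the header and copy the response to disk yourself. This is only a minimal sketch, assuming the server checks nothing beyond the User-Agent; download_file and the 'Mozilla/5.0' default are illustrative choices of mine, not something from the question:

```python
import shutil
import urllib.request


def download_file(url, filename, user_agent="Mozilla/5.0"):
    """Fetch url with a custom User-Agent and save the body to filename."""
    # A Request object can carry headers, which urlretrieve() cannot.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # urlopen() raises urllib.error.HTTPError for 4xx/5xx responses such as 403.
    with urllib.request.urlopen(req, timeout=30) as resp, open(filename, "wb") as out:
        # Copy the response to disk in chunks instead of reading it all into memory.
        shutil.copyfileobj(resp, out)
    return filename
```

Calling download_file(url, 'Test.pdf') would then take the place of download_pdf(url) in the question's code.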

The above aside, you are going to a lot of trouble simply to access a URL. Your code imports requests but never uses it. You should, because it is much easier than urllib. This works for me:

import requests

url = 'http://papers.xtremepapers.com/CIE/Cambridge%20IGCSE/Mathematics%20(0580)/0580_s03_qp_1.pdf'
r = requests.get(url)
# Fail loudly on a 4xx/5xx response instead of saving an error page as a PDF
r.raise_for_status()
with open('0580_s03_qp_1.pdf', 'wb') as outfile:
    outfile.write(r.content)
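If a server also turned out to reject the default User-Agent that requests sends, the same idea would carry over: pass a headers dictionary to requests.get(). The following is a hedged sketch rather than a tested fix for this particular site; download_with_requests, the 'Mozilla/5.0' default and the chunk size are assumptions made for illustration. It also streams the body, which may help with larger files:

```python
import requests


def download_with_requests(url, filename, user_agent="Mozilla/5.0", chunk_size=8192):
    """Stream url to filename, sending an explicit User-Agent header."""
    headers = {"User-Agent": user_agent}
    # stream=True defers the body download so it can be written in chunks.
    with requests.get(url, headers=headers, stream=True, timeout=30) as r:
        # Raise requests.HTTPError on 4xx/5xx instead of saving an error page.
        r.raise_for_status()
        with open(filename, "wb") as outfile:
            for chunk in r.iter_content(chunk_size=chunk_size):
                outfile.write(chunk)
    return filename
```

A 403 then surfaces as a requests.HTTPError, so an error page never ends up on disk as a file that only looks like a PDF.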

answered Oct 22 '22 07:10

mhawke

