I am trying to download a PDF file from a website and save it to disk. My attempts either fail with encoding errors or result in blank PDFs.
```python
In [1]: import requests

In [2]: url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'

In [3]: response = requests.get(url)

In [4]: with open('/tmp/metadata.pdf', 'wb') as f:
   ...:     f.write(response.text)
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-4-4be915a4f032> in <module>()
      1 with open('/tmp/metadata.pdf', 'wb') as f:
----> 2     f.write(response.text)
      3
UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-14: ordinal not in range(128)

In [5]: import codecs

In [6]: with codecs.open('/tmp/metadata.pdf', 'wb', encoding='utf8') as f:
   ...:     f.write(response.text)
   ...:
```
I know it is a codec problem of some kind but I can't seem to get it to work.
There are a couple of Python libraries you can use to extract data from PDFs. For example, the PyPDF2 library can extract text from PDFs where the text is laid out sequentially or in a formatted manner, i.e. in lines or forms. You can also extract tables from PDFs with the Camelot library.
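As a minimal sketch of the PyPDF2 route (assuming a recent PyPDF2 release with the `PdfReader` API, and that the PDF has already been saved to `/tmp/metadata.pdf`):

```python
from PyPDF2 import PdfReader

# Open the previously downloaded PDF and dump the text of each page.
reader = PdfReader('/tmp/metadata.pdf')
for page in reader.pages:
    # extract_text() returns the page's textual content as a string.
    print(page.extract_text())
```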
You should use `response.content` in this case:

```python
with open('/tmp/metadata.pdf', 'wb') as f:
    f.write(response.content)
```
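If you also want to sanity-check the download (the question mentions blank PDFs), a quick check is to look at the status code and the first few bytes of the body; every valid PDF starts with the `%PDF` magic bytes:

```python
# response.content is bytes, so we can inspect the file signature directly.
print(response.status_code)      # should be 200
print(response.content[:5])      # a real PDF starts with b'%PDF-'
```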
From the documentation:

You can also access the response body as bytes, for non-text requests:

```python
>>> r.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...
```
So that means:

`response.text` returns the response body as a string object; use it when you're downloading a text file, such as an HTML page.

`response.content` returns the response body as a bytes object; use it when you're downloading a binary file, such as a PDF, an audio file, or an image.
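A quick way to see the difference with the URL from the question (the `Content-Type` check is just an illustration, not something requests requires):

```python
import requests

url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'
response = requests.get(url)

# The server reports the body type in the Content-Type header,
# e.g. 'application/pdf' for this URL.
print(response.headers.get('Content-Type'))

print(type(response.text))     # <class 'str'>   - decoded text, wrong for binary data
print(type(response.content))  # <class 'bytes'> - raw bytes, safe to write with 'wb'
```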
You can also use `response.raw` instead; use it when the file you're about to download is large. Below is a basic streaming example, which you can also find in the documentation:

```python
import requests

url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'
chunk_size = 2000

r = requests.get(url, stream=True)

with open('/tmp/metadata.pdf', 'wb') as fd:
    # Read the body chunk_size bytes at a time instead of all at once.
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)
```
`chunk_size` is the chunk size you want to use. If you set it to 2000, requests will download the file 2000 bytes at a time: it reads the first 2000 bytes, writes them to the file, and repeats until the download is finished.
This can save RAM. But I'd prefer `response.content` in this case, since your file is small. As you can see, using `response.raw` is more complex.
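For completeness, here is a minimal sketch of what using `response.raw` directly would look like, copying the raw stream to disk with `shutil.copyfileobj`; note that `stream=True` is still required so the body isn't loaded into memory up front, and that `r.raw` hands back the bytes as they came off the wire (no content decoding):

```python
import shutil
import requests

url = 'http://www.hrecos.org//images/Data/forweb/HRTVBSH.Metadata.pdf'

r = requests.get(url, stream=True)

with open('/tmp/metadata.pdf', 'wb') as fd:
    # r.raw is the underlying urllib3 response object; copyfileobj
    # streams it to the file without buffering the whole body.
    shutil.copyfileobj(r.raw, fd)
```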
Related:
How to download large file in python with requests.py?
How to download image using requests