Requests is a really nice library. I'd like to use it to download big files (>1GB). The problem is that it's not possible to keep the whole file in memory; I need to read it in chunks. And that is the problem with the following code:
```python
import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return
```
For some reason it doesn't work this way: it still loads the response into memory before it is saved to a file.
UPDATE
If you need a small client (Python 2.x/3.x) which can download big files from FTP, you can find it here. It supports multithreading and reconnects (it monitors connections), and it also tunes socket parameters for the download task.
It's much easier if you use `Response.raw` and `shutil.copyfileobj()`:
```python
import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename
```
This streams the file to disk without using excessive memory, and the code is simple.
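For reference, a quick usage sketch (the URL here is just a hypothetical placeholder):

```python
# Minimal usage sketch; the URL is a hypothetical placeholder.
path = download_file('https://example.com/files/big-archive.zip')
print(f'Saved to {path}')
```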
Note: According to the documentation, `Response.raw` will not decode `gzip` and `deflate` transfer-encodings, so you will need to do this manually.
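If you do need that decoding, one option is to make every `r.raw.read()` call pass urllib3's `decode_content=True` before copying. This is a sketch of that approach; the function name `download_file_decoded` is illustrative, not from the original answer:

```python
import functools
import shutil

import requests

def download_file_decoded(url):
    # Sketch: same streaming download, but force urllib3 to decode
    # gzip/deflate transfer-encodings while copying to disk.
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # Patch r.raw.read so every call passes decode_content=True.
        r.raw.read = functools.partial(r.raw.read, decode_content=True)
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename
```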
With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:
```python
import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the
                # `if` below and set chunk_size to None.
                # if chunk:
                f.write(chunk)
    return local_filename
```
Note that the number of bytes returned by `iter_content` is not exactly the `chunk_size`; it's often a quite different number that can be far bigger, and it can differ on every iteration.
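You can observe this yourself with a small sketch that prints the size of each chunk as it arrives (the URL is a hypothetical placeholder):

```python
import requests

# Sketch: print the actual size of each chunk iter_content yields.
# The URL below is a hypothetical placeholder.
with requests.get('https://example.com/big-file.bin', stream=True) as r:
    r.raise_for_status()
    for i, chunk in enumerate(r.iter_content(chunk_size=8192)):
        print(f'chunk {i}: {len(chunk)} bytes')
        if i >= 4:  # inspect only the first few chunks
            break
```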
See body-content-workflow and `Response.iter_content` for further reference.