 

Download large file in python with requests

Requests is a really nice library. I'd like to use it for downloading big files (>1GB). The problem is that it's not possible to keep the whole file in memory; I need to read it in chunks. And that is the problem with the following code:

import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return

For some reason it doesn't work this way: it still loads the response into memory before it is saved to a file.

UPDATE

If you need a small client (Python 2.x/3.x) which can download big files from FTP, you can find it here. It supports multithreading and reconnects (it monitors connections), and it also tunes socket parameters for the download task.

asked May 22 '13 by Roman Podlinov
2 Answers

It's much easier if you use Response.raw and shutil.copyfileobj():

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename

This streams the file to disk without using excessive memory, and the code is simple.
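For reference, a quick usage sketch; the URL below is only a placeholder:

# hypothetical usage; the URL is just a placeholder
saved_path = download_file('https://example.com/files/big-archive.zip')
print('Downloaded to', saved_path)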

Note: According to the documentation, Response.raw will not decode gzip and deflate transfer-encodings, so you will need to do this manually.
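If you do need the decoded bytes on disk, one possible approach (a sketch, not part of the answer above) is to ask urllib3 to decode the stream before shutil.copyfileobj reads from it; the function name below is just illustrative:

import functools
import requests
import shutil

def download_file_decoded(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # Have urllib3 undo gzip/deflate encoding while the body is streamed
        r.raw.read = functools.partial(r.raw.read, decode_content=True)
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename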

answered Sep 30 '22 by John Zwinck

With the following streaming code, Python memory usage stays bounded regardless of the size of the downloaded file:

import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                # If you have a chunk-encoded response, uncomment the "if"
                # below and set chunk_size to None.
                # if chunk:
                f.write(chunk)
    return local_filename

Note that the number of bytes returned by iter_content is not exactly the chunk_size; it can vary, is often considerably larger, and will typically differ from one iteration to the next.

See body-content-workflow and Response.iter_content for further reference.
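As an optional extension (not part of the answer itself), the same loop can report rough progress when the server sends a Content-Length header; the helper name below is just illustrative:

import requests

def download_file_with_progress(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # Content-Length may be absent (e.g. chunked responses); treat 0 as "unknown"
        total = int(r.headers.get('Content-Length', 0))
        downloaded = 0
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
                downloaded += len(chunk)
                if total:
                    print('\r{:.1%}'.format(downloaded / total), end='')
        print()
    return local_filename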

answered Sep 30 '22 by Roman Podlinov