
What's the best way to download a file using urllib3?

I would like to download a file over HTTP using urllib3. I have managed to do this with the following code:

 url = 'http://url_to_a_file'
 connection_pool = urllib3.PoolManager()
 resp = connection_pool.request('GET',url )
 f = open(filename, 'wb')
 f.write(resp.data)
 f.close()
 resp.release_conn()

But I was wondering what the proper way of doing this is. For example, will it work well for big files, and if not, what should I do to make this code more bug-tolerant and scalable?

Note: it is important to me to use the urllib3 library rather than, for example, urllib2, because I want my code to be thread-safe.

running.t asked Jun 24 '13 21:06



3 Answers

Your code snippet is close. Two things worth noting:

  1. If you use resp.data, it consumes the entire response and returns the connection to the pool, so you don't need to call resp.release_conn() manually. This is fine if you're OK with holding the whole body in memory.

  2. You could instead use resp.read(amt) on a request made with preload_content=False, which streams the response, but then the connection must be returned via resp.release_conn().

This would look something like...

import urllib3

http = urllib3.PoolManager()
r = http.request('GET', url, preload_content=False)

with open(path, 'wb') as out:
    while True:
        data = r.read(chunk_size)
        if not data:
            break
        out.write(data)

r.release_conn()
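Since the question also asks about making this more bug-tolerant, here is a minimal sketch of the same loop wrapped in a function. The function name `download`, the default chunk size, the status check, and the choice of `RuntimeError` are my own assumptions for illustration, not something urllib3 prescribes. The idea is only that the connection should go back to the pool even when writing fails partway through.

```python
import urllib3

# A single PoolManager can be created once and shared.
http = urllib3.PoolManager()


def download(url, path, chunk_size=64 * 1024):
    """Stream url into path chunk by chunk; return the number of bytes written.

    Hypothetical helper: name, default chunk size and error type are illustrative.
    """
    # preload_content=False keeps urllib3 from reading the whole body into memory.
    resp = http.request('GET', url, preload_content=False)
    written = 0
    try:
        if resp.status != 200:
            raise RuntimeError('unexpected HTTP status %d for %s' % (resp.status, url))
        with open(path, 'wb') as out:
            while True:
                data = resp.read(chunk_size)
                if not data:
                    break
                out.write(data)
                written += len(data)
    except Exception:
        # The body may be only partly read here, so close the response
        # instead of leaving unread data on a pooled connection.
        resp.close()
        raise
    finally:
        # Runs on success and on failure alike.
        resp.release_conn()
    return written
```

Calling `download('http://url_to_a_file', filename)` would then replace the inline loop. Whether a non-200 status should raise is a design choice; some callers may prefer to inspect `resp.status` themselves.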

The documentation might be a bit lacking on this scenario. If anyone is interested in making a pull-request to improve the urllib3 documentation, that would be greatly appreciated. :)

shazow answered Oct 07 '22 02:10


The most correct way to do this is probably to get a file-like object that represents the HTTP response and copy it to a real file using shutil.copyfileobj as below:

import shutil
import urllib3

url = 'http://url_to_a_file'
c = urllib3.PoolManager()

with c.request('GET', url, preload_content=False) as resp, open(filename, 'wb') as out_file:
    shutil.copyfileobj(resp, out_file)

resp.release_conn()     # not 100% sure this is required though
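One way to sidestep the doubt about release_conn() is to put it in a finally block, so it runs whether or not the copy succeeds. The sketch below assumes a hypothetical helper name, `download_with_copyfileobj`, and an arbitrary buffer size. As far as I can tell, calling release_conn() after the body has already been fully read does nothing harmful, but treat that as an assumption rather than a guarantee.

```python
import shutil

import urllib3

# Shared pool manager, created once.
http = urllib3.PoolManager()


def download_with_copyfileobj(url, filename, buffer_size=64 * 1024):
    """Copy the HTTP response body into filename using shutil.copyfileobj.

    Hypothetical helper: the name and the default buffer size are illustrative.
    """
    # The response behaves like a readable file when preload_content=False.
    resp = http.request('GET', url, preload_content=False)
    try:
        with open(filename, 'wb') as out_file:
            # copyfileobj reads buffer_size bytes at a time until the body is exhausted.
            shutil.copyfileobj(resp, out_file, buffer_size)
    finally:
        # Return the connection to the pool on success and on failure.
        resp.release_conn()
    return filename
```

The buffer size only bounds how much of the body is held in memory at once; it does not change the resulting file.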

Alecz answered Oct 07 '22 02:10


The easiest way with urllib3 is to let shutil.copyfileobj handle the chunked copying for you:

import urllib3
import shutil

http = urllib3.PoolManager()
with open(filename, 'wb') as out:
    r = http.request('GET', url, preload_content=False)
    shutil.copyfileobj(r, out)

Gray answered Oct 07 '22 01:10