I would like to download a file over HTTP using urllib3.
I have managed to do this using the following code:
import urllib3

url = 'http://url_to_a_file'
connection_pool = urllib3.PoolManager()
resp = connection_pool.request('GET',url )
f = open(filename, 'wb')
f.write(resp.data)
f.close()
resp.release_conn()
But I was wondering what the proper way of doing this is. For example, will it work well for big files, and if not, what should I do to make this code more fault-tolerant and scalable?
Note: it is important to me to use the urllib3 library, not urllib2 for example, because I want my code to be thread-safe.
You can also download a file from a URL using the requests module: fetch the URL with requests.get, store the result in a variable (e.g. myfile), and then write its contents to a file.
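A minimal sketch of that approach, assuming the requests package is installed (the URL and output filename here are placeholders):

import requests

url = 'http://url_to_a_file'
myfile = requests.get(url)            # fetch the whole response into memory
with open('downloaded_file', 'wb') as f:
    f.write(myfile.content)           # write the raw response body to disk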
Your code snippet is close. Two things worth noting:
If you're using resp.data, it will consume the entire response and return the connection (you don't need to call resp.release_conn() manually). This is fine if you're cool with holding the data in memory; a short sketch of this in-memory variant follows the streaming example below.
You could use resp.read(amt), which will stream the response, but the connection will need to be returned via resp.release_conn().
This would look something like...
import urllib3

http = urllib3.PoolManager()
r = http.request('GET', url, preload_content=False)

with open(path, 'wb') as out:
    while True:
        data = r.read(chunk_size)
        if not data:
            break
        out.write(data)

r.release_conn()
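For the first point, if holding the whole body in memory is acceptable, a minimal sketch (reusing the question's url and filename) would be:

import urllib3

http = urllib3.PoolManager()
resp = http.request('GET', url)       # default preload_content=True reads the full body
with open(filename, 'wb') as out:
    out.write(resp.data)              # the entire payload, already in memory
# no resp.release_conn() needed: the connection has already been returned to the pool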
The documentation might be a bit lacking on this scenario. If anyone is interested in making a pull-request to improve the urllib3 documentation, that would be greatly appreciated. :)
The most correct way to do this is probably to get a file-like object that represents the HTTP response and copy it to a real file using shutil.copyfileobj as below:
import urllib3
import shutil

url = 'http://url_to_a_file'
c = urllib3.PoolManager()

with c.request('GET', url, preload_content=False) as resp, open(filename, 'wb') as out_file:
    shutil.copyfileobj(resp, out_file)

resp.release_conn()  # not 100% sure this is required though
The easiest way with urllib3 is to let shutil handle the chunked copying for you:
import urllib3
import shutil
http = urllib3.PoolManager()
with open(filename, 'wb') as out:
    r = http.request('GET', url, preload_content=False)
    shutil.copyfileobj(r, out)
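If you also want the connection returned to the pool once the copy finishes (as discussed in the earlier answer), one possible variant of the same pattern, assuming url and filename are defined as above, is:

import urllib3
import shutil

http = urllib3.PoolManager()
with open(filename, 'wb') as out:
    r = http.request('GET', url, preload_content=False)
    try:
        shutil.copyfileobj(r, out)    # stream the response body straight into the file
    finally:
        r.release_conn()              # hand the connection back to the pool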