I know how to use urllib
to download a file. However, it's much faster, if the server allows it, to download several part of the same file simultaneously and then merge them.
How do you do that in Python? If you can't do it easily with the standard lib, any lib that would let you do it?
Although I agree with Gregory's suggestion of using an existing library, it's worth noting that you can do this by using the Range
HTTP header. If the server accepts byte-range requests, you can start several threads to download multiple parts of the file in parallel. This snippet, for example, will only download bytes 0..65535 of the specified file:
import urllib2
url = 'http://example.com/test.zip'
req = urllib2.Request(url, headers={'Range':'bytes=0-65535'})
data = urllib2.urlopen(req).read()
You can determine the remote resource size and see whether the server supports ranged requests by sending a HEAD request:
import urllib2
class HeadRequest(urllib2.Request):
def get_method(self):
return "HEAD"
url = 'http://sstatic.net/stackoverflow/img/sprites.png'
req = HeadRequest(url)
response = urllib2.urlopen(req)
response.close()
print respose.headers
The above prints:
Cache-Control: max-age=604800
Content-Length: 16542
Content-Type: image/png
Last-Modified: Thu, 10 Mar 2011 06:13:43 GMT
Accept-Ranges: bytes
ETag: "c434b24eeadecb1:0"
Date: Mon, 14 Mar 2011 16:08:02 GMT
Connection: close
From that we can see that the file size is 16542 bytes ('Content-Length'
) and the server supports byte ranges ('Accept-Ranges: bytes'
).
PycURL can do it. PycURL is a Python interface to libcurl. It can be used to fetch objects identified by a URL from a Python program, similar to the urllib Python module. PycURL is mature, very fast, and supports a lot of features.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With