Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stream large binary files with urllib2 to file

People also ask

What is the difference between Urllib and urllib2?

1) urllib2 can accept a Request object to set the headers for a URL request, urllib accepts only a URL. 2) urllib provides the urlencode method which is used for the generation of GET query strings, urllib2 doesn't have such a function. This is one of the reasons why urllib is often used along with urllib2.

How do I use urllib2?

Simple urllib2 scripturlopen('http://python.org/') print "Response:", response # Get the URL. This gets the real URL. print "The URL is: ", response. geturl() # Getting the code print "This gets the code: ", response.

Does urllib2 work in Python 3?

NOTE: urllib2 is no longer available in Python 3 You can get more idea about urllib.


No reason to work line by line (small chunks AND requires Python to find the line ends for you!-), just chunk it up in bigger chunks, e.g.:

# from urllib2 import urlopen # Python 2
from urllib.request import urlopen # Python 3

response = urlopen(url)
CHUNK = 16 * 1024
with open(file, 'wb') as f:
    while True:
        chunk = response.read(CHUNK)
        if not chunk:
            break
        f.write(chunk)

Experiment a bit with various CHUNK sizes to find the "sweet spot" for your requirements.


You can also use shutil:

import shutil
try:
    from urllib.request import urlopen # Python 3
except ImportError:
    from urllib2 import urlopen # Python 2

def get_large_file(url, file, length=16*1024):
    req = urlopen(url)
    with open(file, 'wb') as fp:
        shutil.copyfileobj(req, fp, length)

I used to use mechanize module and its Browser.retrieve() method. In the past it took 100% CPU and downloaded things very slowly, but some recent release fixed this bug and works very quickly.

Example:

import mechanize
browser = mechanize.Browser()
browser.retrieve('http://www.kernel.org/pub/linux/kernel/v2.6/testing/linux-2.6.32-rc1.tar.bz2', 'Downloads/my-new-kernel.tar.bz2')

Mechanize is based on urllib2, so urllib2 can also have similar method... but I can't find any now.


You can use urllib.retrieve() to download files:

Example:

try:
    from urllib import urlretrieve # Python 2

except ImportError:
    from urllib.request import urlretrieve # Python 3

url = "http://www.examplesite.com/myfile"
urlretrieve(url,"./local_file")