Background: I am using urllib.urlretrieve, as opposed to any other function in the urllib* modules, because of its hook-function support (see reporthook below), which is used to display a textual progress bar. This is Python >= 2.6.
urllib.urlretrieve(url[, filename[, reporthook[, data]]])
However, urlretrieve is so dumb that it leaves no way to detect the status of the HTTP request (e.g. was it 404 or 200?):
>>> fn, h = urllib.urlretrieve('http://google.com/foo/bar')
>>> h.items()
[('date', 'Thu, 20 Aug 2009 20:07:40 GMT'), ('expires', '-1'), ('content-type', 'text/html; charset=ISO-8859-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0')]
>>> h.status
''
>>>
What is the best known way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?
Just catch HTTPError, handle it, and if it's not a 404, simply use raise to re-raise the exception. See the Python tutorial on errors and exceptions.
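That catch-and-re-raise pattern looks roughly like this (a sketch; the fetch helper name and the data: URL in the usage note are mine, and the dual import covers both Python 2's urllib2 and the urllib.error/urllib.request split in Python 3):

```python
try:
    from urllib2 import urlopen, HTTPError          # Python 2
except ImportError:
    from urllib.request import urlopen              # Python 3
    from urllib.error import HTTPError


def fetch(url):
    """Return the response body, or None on a 404; re-raise other HTTP errors."""
    try:
        return urlopen(url).read()
    except HTTPError as e:
        if e.code == 404:
            return None     # handle the 404 however you like
        raise               # any other HTTP status propagates to the caller
```

For example, fetch('data:,hello') returns the body b'hello' (Python 3 supports data: URLs out of the box), while a URL answering 404 returns None and a 500 raises HTTPError.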
Check out urllib.urlretrieve's complete code:
def urlretrieve(url, filename=None, reporthook=None, data=None):
    global _urlopener
    if not _urlopener:
        _urlopener = FancyURLopener()
    return _urlopener.retrieve(url, filename, reporthook, data)
In other words, you can use urllib.FancyURLopener (it's part of the public urllib API). You can override http_error_default to detect 404s:
class MyURLopener(urllib.FancyURLopener):
    def http_error_default(self, url, fp, errcode, errmsg, headers):
        # handle errors the way you'd like to
        pass

fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)
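As for the progress-bar side: a reporthook is just a callable taking (block_count, block_size, total_size). A minimal sketch of one (the bar formatting and the format_progress/my_report_hook names are my own choices, not anything urllib prescribes):

```python
import sys


def format_progress(block_count, block_size, total_size):
    """Render a one-line textual progress bar for a urlretrieve reporthook."""
    if total_size <= 0:  # server sent no Content-Length: just show a byte count
        return 'downloaded %d bytes' % (block_count * block_size)
    percent = min(100, block_count * block_size * 100 // total_size)
    return '%3d%% [%-20s]' % (percent, '#' * (percent // 5))


def my_report_hook(block_count, block_size, total_size):
    """Reporthook for urlretrieve: redraw the bar in place on stderr."""
    sys.stderr.write('\r' + format_progress(block_count, block_size, total_size))
    if 0 < total_size <= block_count * block_size:
        sys.stderr.write('\n')  # done: move past the bar
```

You would then pass it as the third argument, e.g. urllib.urlretrieve(url, filename, my_report_hook), or as the reporthook keyword to the MyURLopener().retrieve() call above.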