
How to catch 404 error in urllib.urlretrieve

Background: I am using urllib.urlretrieve, as opposed to any other function in the urllib* modules, because of its hook function support (see reporthook below), which I use to display a textual progress bar. This is Python >= 2.6.

>>> urllib.urlretrieve(url[, filename[, reporthook[, data]]]) 
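For context, the hook is called with three arguments: the number of blocks transferred so far, the block size in bytes, and the total size of the file (which may be -1 when the server sends no Content-Length). A minimal sketch of a textual progress hook along those lines follows; the helper name format_progress and the bar layout are illustrative assumptions, not taken from the question.

```python
import sys


def format_progress(blocknum, blocksize, totalsize, width=20):
    """Return a one-line textual progress bar (hypothetical helper)."""
    received = blocknum * blocksize
    if totalsize <= 0:
        # Size unknown (no Content-Length): report the bytes seen so far.
        return "%d bytes" % received
    # The last block is usually partial, so clamp to the total size.
    received = min(received, totalsize)
    filled = width * received // totalsize
    percent = 100 * received // totalsize
    return "[%s%s] %3d%%" % ("#" * filled, "." * (width - filled), percent)


def my_report_hook(blocknum, blocksize, totalsize):
    # Redraw the same terminal line on every call from urlretrieve.
    sys.stdout.write("\r" + format_progress(blocknum, blocksize, totalsize))
    sys.stdout.flush()
```

Keeping the formatting in a separate function means the bar can be checked without doing any download; my_report_hook is then what gets passed as reporthook.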

However, urlretrieve is so dumb that it leaves no way to detect the status of the HTTP request (e.g., was it 404 or 200?).

>>> fn, h = urllib.urlretrieve('http://google.com/foo/bar')
>>> h.items()
[('date', 'Thu, 20 Aug 2009 20:07:40 GMT'),
 ('expires', '-1'),
 ('content-type', 'text/html; charset=ISO-8859-1'),
 ('server', 'gws'),
 ('cache-control', 'private, max-age=0')]
>>> h.status
''
>>>

What is the best known way to download a remote HTTP file with hook-like support (to show a progress bar) and decent HTTP error handling?

Sridhar Ratnakumar asked Aug 20 '09 20:08



1 Answer

Check out urllib.urlretrieve's complete code:

def urlretrieve(url, filename=None, reporthook=None, data=None):
  global _urlopener
  if not _urlopener:
    _urlopener = FancyURLopener()
  return _urlopener.retrieve(url, filename, reporthook, data)

In other words, you can use urllib.FancyURLopener (it's part of the public urllib API). You can override http_error_default to detect 404s:

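import urllib

class MyURLopener(urllib.FancyURLopener):
  def http_error_default(self, url, fp, errcode, errmsg, headers):
    # handle errors the way you'd like to; e.g. raise instead of
    # silently saving the error page (errcode is 404 for a missing file)
    raise IOError('http error', errcode, errmsg, headers)

fn, h = MyURLopener().retrieve(url, reporthook=my_report_hook)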
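For readers on Python 3, where urlretrieve lives in urllib.request and goes through urlopen, a 404 surfaces as urllib.error.HTTPError, so the status can be read from the exception instead of overriding an opener. A minimal sketch under that assumption follows; the function name download_or_none and the injectable retrieve parameter are hypothetical, added only so the error path can be exercised without a network.

```python
import urllib.error
import urllib.request


def download_or_none(url, filename=None, reporthook=None,
                     retrieve=urllib.request.urlretrieve):
    """Return (filename, headers), or None if the server answers 404.

    `retrieve` is a hypothetical injection point; it defaults to the
    real urllib.request.urlretrieve.
    """
    try:
        return retrieve(url, filename, reporthook)
    except urllib.error.HTTPError as e:
        if e.code == 404:
            # The remote file does not exist; report that to the caller.
            return None
        # Any other HTTP error is re-raised unchanged.
        raise
```

A call might look like download_or_none(url, reporthook=my_report_hook), with a None result meaning the file was not found.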
orip answered Sep 21 '22 08:09
