How to know if urllib.urlretrieve succeeds?

urllib.urlretrieve returns silently even if the file doesn't exist on the remote HTTP server; it just saves an HTML page to the named file. For example:

urllib.urlretrieve('http://google.com/abc.jpg', 'abc.jpg') 

returns silently even though abc.jpg doesn't exist on the google.com server. The generated abc.jpg is not a valid JPEG file; it is actually an HTML page. I guess the returned headers (an httplib.HTTPMessage instance) can be used to tell whether the retrieval succeeded, but I can't find any documentation for httplib.HTTPMessage.

Can anybody provide some information about this problem?

btw0 asked Jun 12 '09 17:06



1 Answer

Consider using urllib2 if that is possible in your case. It is more advanced and easier to use than urllib.

You can detect any HTTP errors easily:

>>> import urllib2
>>> resp = urllib2.urlopen("http://google.com/abc.jpg")
Traceback (most recent call last):
<<MANY LINES SKIPPED>>
urllib2.HTTPError: HTTP Error 404: Not Found
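
The session above is Python 2. As a minimal sketch of the same check for Python 3, where urllib2's functionality lives in urllib.request and urllib.error, the exception can be caught rather than left to propagate. The name try_open and its opener parameter are hypothetical, introduced here only so the function can be exercised without network access:

```python
import urllib.error
import urllib.request


def try_open(url, opener=urllib.request.urlopen):
    """Return (response, None) on success or (None, message) on failure.

    `opener` is a hypothetical injection point for testing; by default
    it is urllib.request.urlopen.
    """
    try:
        return opener(url), None
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        # HTTPError must be caught first: it is a subclass of URLError.
        return None, "HTTP %d: %s" % (err.code, err.reason)
    except urllib.error.URLError as err:
        # No usable answer at all (DNS failure, refused connection, ...).
        return None, "URL error: %s" % (err.reason,)
```

Calling try_open("http://google.com/abc.jpg") would then be expected to give back None plus a message describing the 404, instead of a traceback.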

resp is a file-like response object that you can do a lot of useful things with:

>>> resp = urllib2.urlopen("http://google.com/")
>>> resp.code
200
>>> resp.headers["content-type"]
'text/html; charset=windows-1251'
>>> resp.read()
"<<ACTUAL HTML>>"
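
Putting the pieces together for the original question, here is a rough sketch of a download that refuses to save an HTML page under a .jpg name. It is written for Python 3 (urllib.request in place of urllib2), and the helper names is_expected_type and download_checked are hypothetical. Raising ValueError on a mismatched Content-Type is just one possible design choice:

```python
import shutil
import urllib.request


def is_expected_type(headers, expected_prefix="image/"):
    """True if the Content-Type header starts with expected_prefix."""
    content_type = headers.get("Content-Type", "")
    return content_type.lower().startswith(expected_prefix)


def download_checked(url, filename, expected_prefix="image/"):
    """Save url to filename only if the response has the expected type.

    urlopen raises urllib.error.HTTPError for statuses such as 404; this
    sketch additionally raises ValueError when the server answers
    successfully but with an unexpected Content-Type (e.g. an HTML page).
    """
    with urllib.request.urlopen(url) as resp:
        if not is_expected_type(resp.headers, expected_prefix):
            raise ValueError("unexpected Content-Type: %r"
                             % resp.headers.get("Content-Type"))
        # Only open the output file once the response has passed the check.
        with open(filename, "wb") as out:
            shutil.copyfileobj(resp, out)
    return filename
```

The Content-Type header is whatever the server claims, so this only guards against the "error page saved as an image" case; it does not prove the bytes are a valid JPEG.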

Alexander Lebedev answered Sep 24 '22 04:09