urllib.urlretrieve returns silently even if the file doesn't exist on the remote HTTP server; it just saves an HTML page to the named file. For example:
urllib.urlretrieve('http://google.com/abc.jpg', 'abc.jpg')
just returns silently, even if abc.jpg doesn't exist on the google.com server. The generated abc.jpg is not a valid JPEG file; it's actually an HTML page. I guess the returned headers (an httplib.HTTPMessage instance) can be used to tell whether the retrieval succeeded or not, but I can't find any documentation for httplib.HTTPMessage.
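Here is a minimal sketch (Python 2) of what I am seeing; the Content-Type check at the end is just my guess at how the returned headers might be used:

import urllib

# The call returns quietly even though the remote file does not exist.
filename, headers = urllib.urlretrieve('http://google.com/abc.jpg', 'abc.jpg')

print type(headers)                       # httplib.HTTPMessage
print headers.getheader('Content-Type')   # 'text/html; ...' rather than 'image/jpeg'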
Can anybody provide some information about this problem?
The problem here is that urlopen returns a reference to a file-like object from which you read the HTML. Please note that the urllib.urlopen function has been marked as deprecated since Python 2.6; it's recommended to use urllib2 instead.
The urlretrieve() function is provided by the urllib module. It downloads the remote data directly to a local file. The filename parameter specifies the local save path (if it is not given, urllib generates a temporary file to hold the data).
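A minimal sketch (Python 2) of the two calling styles described above; the URL is just a placeholder:

import urllib

# Explicit local path: the data is written to 'logo.png' in the current directory.
path, headers = urllib.urlretrieve('http://example.com/logo.png', 'logo.png')

# No filename given: urllib creates a temporary file and returns its path.
tmp_path, tmp_headers = urllib.urlretrieve('http://example.com/logo.png')
print tmp_path    # e.g. a file under the system temp directory; location depends on the platform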
Consider using urllib2 if possible in your case. It is more advanced and easier to use than urllib. You can detect any HTTP errors easily:
>>> import urllib2
>>> resp = urllib2.urlopen("http://google.com/abc.jpg")
Traceback (most recent call last):
<<MANY LINES SKIPPED>>
urllib2.HTTPError: HTTP Error 404: Not Found
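For completeness, a hedged sketch (Python 2) of the same request wrapped in try/except, so the 404 can be handled instead of propagating; the local filename is just an example:

import urllib2

try:
    resp = urllib2.urlopen("http://google.com/abc.jpg")
except urllib2.HTTPError as e:
    # Raised for HTTP error status codes such as 404.
    print "Download failed: HTTP", e.code
except urllib2.URLError as e:
    # Raised for lower-level problems (DNS failure, refused connection, ...).
    print "Connection problem:", e.reason
else:
    with open("abc.jpg", "wb") as f:
        f.write(resp.read())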
resp is actually an HTTPResponse object that you can do a lot of useful things with:
>>> resp = urllib2.urlopen("http://google.com/")
>>> resp.code
200
>>> resp.headers["content-type"]
'text/html; charset=windows-1251'
>>> resp.read()
"<<ACTUAL HTML>>"
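Tying that back to the original question, here is a sketch (Python 2) that only keeps the download when the server actually returned an image. The 'image/' prefix check is my own heuristic and the URL is a placeholder (a missing file would already raise HTTPError as shown above):

import urllib2

resp = urllib2.urlopen("http://example.com/abc.jpg")
content_type = resp.headers.getheader("content-type", "")
if content_type.startswith("image/"):
    # Looks like real image data, so save it.
    with open("abc.jpg", "wb") as f:
        f.write(resp.read())
else:
    print "Not an image, server sent:", content_type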