I'm downloading a huge set of files with following code in a loop:
try:
urllib.urlretrieve(url2download, destination_on_local_filesystem)
except KeyboardInterrupt:
break
except:
print "Timed-out or got some other exception: "+url2download
If the server times-out on URL url2download when connection is just initiating, the last exception is handled properly. But sometimes server responded, and downloading is started, but the server is so slow, that it'll takes hours for even one file, and eventually it returns something like:
Enter username for Clients Only at albrightandomalley.com:
Enter password for in Clients Only at albrightandomalley.com:
and just hangs there (although no username/passworde is aksed if the same link is downloaded through the browser).
My intention in this situation would be -- skip this file and go to the next one. The question is -- how to do that? Is there a way in python to specify how long is OK to work on downloading one file, and if more time is already spent, interrupt, and go forward?
Timeouts in Python requests You can tell requests library to stop waiting for a response after a given amount of time by passing a number to the timeout parameter. If the requests library does not receive response in x seconds, it will raise a Timeout error.
The first thing to do is to use HTTP/2.0 and keep one conection open for all the files with Keep-Alive. The easiest way to do that is to use the Requests library, and use a session. If this isn't fast enough, then you need to do several parallel downloads with either multiprocessing or threads.
Try:
import socket
socket.setdefaulttimeout(30)
If you're not limited to what's shipped with python out of the box, then the urlgrabber module might come in handy:
import urlgrabber
urlgrabber.urlgrab(url2download, destination_on_local_filesystem,
timeout=30.0)
There's a discussion of this here. Caveats (in addition to the ones they mention): I haven't tried it, and they're using urllib2
, not urllib
(would that be a problem for you?) (Actually, now that I think about it, this technique would probably work for urllib
, too).
This question is more general about timing out a function: How to limit execution time of a function call in Python
I've used the method described in my answer there to write a wait for text function that times out to attempt an auto-login. If you'd like similar functionality you can reference the code here:
http://code.google.com/p/psftplib/source/browse/trunk/psftplib.py
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With