
How to time out gracefully while downloading with Python

I'm downloading a huge set of files with the following code in a loop:

try:
    urllib.urlretrieve(url2download, destination_on_local_filesystem)
except KeyboardInterrupt:
    break
except Exception:
    # Any other failure (e.g. a time-out while connecting) ends up here.
    print "Timed-out or got some other exception: " + url2download

If the server times out on url2download while the connection is still being established, the last exception handler deals with it properly. But sometimes the server responds and the download starts, yet the server is so slow that even a single file takes hours, and eventually it returns something like:

Enter username for Clients Only at albrightandomalley.com:
Enter password for  in Clients Only at albrightandomalley.com:

and just hangs there (although no username/password is asked for when the same link is downloaded through a browser).

What I want in this situation is to skip the file and move on to the next one. How can I do that? Is there a way in Python to specify how long it is OK to spend downloading one file and, once that time is exceeded, interrupt the download and move on?

user63503 asked Mar 01 '09 23:03




4 Answers

Try:

import socket

socket.setdefaulttimeout(30)
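
To show how this fits into the download loop from the question, here is a minimal sketch. It assumes Python 3, where `urlretrieve` lives in `urllib.request` rather than `urllib`. The names `download_all` and the `retrieve` parameter are hypothetical, introduced only so the loop can be exercised without a network. Note that the socket time-out limits each blocking socket operation (connecting, a single read), not the total transfer time, so a server that keeps trickling data may never trigger it.

```python
import socket
import urllib.request


def download_all(pairs, timeout=30, retrieve=urllib.request.urlretrieve):
    """Try each (url, destination) pair and return the URLs that failed.

    `download_all` and `retrieve` are illustrative names, not part of urllib.
    """
    # Affects every socket created from now on, including the ones urllib opens.
    socket.setdefaulttimeout(timeout)
    failed = []
    for url, destination in pairs:
        try:
            retrieve(url, destination)
        except KeyboardInterrupt:
            break
        except OSError as exc:
            # socket.timeout and urllib.error.URLError are both OSError subclasses.
            print("Timed out or got some other error: %s (%s)" % (url, exc))
            failed.append(url)
    return failed
```

Because the default is process-wide, it also changes the behaviour of any other code in the same program that opens sockets without an explicit time-out.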

spiralmoon answered Nov 10 '22 01:11


If you're not limited to what ships with Python out of the box, the urlgrabber module might come in handy:

import urlgrabber
urlgrabber.urlgrab(url2download, destination_on_local_filesystem,
                   timeout=30.0)

Сыч answered Nov 10 '22 00:11


There's a discussion of this here. Two caveats beyond the ones mentioned there: I haven't tried it, and it uses urllib2 rather than urllib, which may or may not be a problem for you. On reflection, the technique would probably work for urllib too.

Jacob Gabrielson answered Nov 10 '22 02:11


This question is more general about timing out a function: How to limit execution time of a function call in Python

I've used the method described in my answer there to write a "wait for text" function with a time-out, which I use when attempting an auto-login. If you'd like similar functionality, you can refer to the code here:

http://code.google.com/p/psftplib/source/browse/trunk/psftplib.py
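
For the download case specifically, an alternative to interrupting a function from the outside is to read the response in chunks and check an overall deadline between reads. The sketch below is not the method from the linked answer; it is a different technique shown under stated assumptions: Python 3, and the hypothetical names `DownloadTooSlow`, `copy_with_deadline`, and `download`, none of which come from a library.

```python
import time
import urllib.request


class DownloadTooSlow(Exception):
    """Raised when a transfer exceeds its overall time budget (illustrative name)."""


def copy_with_deadline(response, out_file, max_seconds,
                       chunk_size=64 * 1024, clock=time.monotonic):
    """Copy response to out_file, giving up once max_seconds have elapsed."""
    deadline = clock() + max_seconds
    total = 0
    while True:
        # Check the budget before each read, so a slow trickle is still caught.
        if clock() > deadline:
            raise DownloadTooSlow("gave up after %s seconds" % max_seconds)
        chunk = response.read(chunk_size)
        if not chunk:
            return total
        out_file.write(chunk)
        total += len(chunk)


def download(url, destination, max_seconds=600, socket_timeout=30):
    # socket_timeout bounds each blocking read; max_seconds bounds the whole file.
    with urllib.request.urlopen(url, timeout=socket_timeout) as response:
        with open(destination, "wb") as out_file:
            return copy_with_deadline(response, out_file, max_seconds)
```

In the loop from the question, `DownloadTooSlow` can be caught alongside the other exceptions to skip the file. A single read can still block for up to `socket_timeout`, so the worst case is roughly `max_seconds + socket_timeout`, and a partially written file is left behind for the caller to delete.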

monkut answered Nov 10 '22 01:11