Reading a stream created by urllib2 never recovers when the connection is interrupted

Tags: python, urllib2

While trying to make one of my Python applications more robust against connection interruptions, I discovered that calling the read function of an HTTP stream created by urllib2 may block the script forever.

I expected the read function to time out and eventually raise an exception, but this does not seem to be the case when the connection is interrupted during a read call.

Here is the code that will cause the problem:

import urllib2

while True:
    try:
        stream = urllib2.urlopen('http://www.google.de/images/nav_logo4.png')
        while stream.read(): pass
        print "Done"
    except:
        print "Error"

(If you try out the script, you will probably need to interrupt the connection several times before you reach the state from which the script never recovers.)

I watched the script in Winpdb and took a screenshot of the state from which the script never recovers (even after the network becomes available again).

Winpdb http://img10.imageshack.us/img10/6716/urllib2.jpg

Is there a way to write a Python script that continues to work reliably even if the network connection is interrupted? (I would prefer to avoid doing this in an extra thread.)

Martin asked May 01 '09 13:05



2 Answers

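Try setting a default socket timeout, so that a stalled read raises socket.timeout instead of blocking forever. Something like: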

import socket
socket.setdefaulttimeout(5.0)
   ...
try:
   ...
except socket.timeout:
   pass  # it timed out; retry here
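
The timeout-and-retry idea above could be fleshed out roughly as follows. This is only a sketch under a few assumptions: the helper name fetch_with_retry and its retries, delay and opener parameters are made up for illustration, and it passes a per-call timeout to urlopen (available since Python 2.6) rather than changing the process-wide default. The import falls back to urllib2 so the same code should run on the Python 2 used in the question.

```python
import socket
import time

try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2, as in the question


def fetch_with_retry(url, retries=3, timeout=5.0, delay=1.0, opener=urlopen):
    """Return the body at url, retrying when connecting or reading fails.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    for attempt in range(1, retries + 1):
        try:
            # The timeout is applied to the underlying socket, so it
            # covers both the connect and every later read() call.
            stream = opener(url, timeout=timeout)
            try:
                chunks = []
                while True:
                    chunk = stream.read(8192)
                    if not chunk:
                        break
                    chunks.append(chunk)
                return b"".join(chunks)
            finally:
                stream.close()
        except (socket.timeout, IOError):
            # URLError is an IOError subclass, so connection failures
            # land here too. Give up after the last attempt.
            if attempt == retries:
                raise
            time.sleep(delay)
```

A call such as fetch_with_retry('http://www.google.de/images/nav_logo4.png') would then either return the bytes or re-raise the last error after three attempts, instead of blocking forever. The opener parameter exists only so the retry logic can be exercised without a network.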
Alex Martelli answered Nov 03 '22 06:11



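Good question; I would be really interested in finding an answer. The only workaround I can think of is the signal trick explained in the Python docs (note that SIGALRM is available only on Unix, and signal handlers can only be installed from the main thread). In your case it would look more like: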

import signal
import urllib2

def read(url):
    stream = urllib2.urlopen(url)
    return stream.read()

def handler(signum, frame):
    raise IOError("The page is taking too long to read")

# Set the signal handler and a 5-second alarm
signal.signal(signal.SIGALRM, handler)
signal.alarm(5)

# This read() may hang indefinitely
try:
    output = read('http://www.google.de/images/nav_logo4.png')
except IOError:
    # try to read again or print an error
    pass

signal.alarm(0)          # Disable the alarm
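
The alarm pattern above can also be wrapped in a context manager so that the timer is always cancelled and the previous handler restored, even when the guarded code raises. This is a sketch, not a drop-in fix: time_limit and ReadTimeout are hypothetical names, it uses signal.setitimer instead of signal.alarm so that fractional seconds work, and it shares the same limits (Unix only, main thread only).

```python
import signal
import time
from contextlib import contextmanager


class ReadTimeout(IOError):
    """Raised when the guarded block runs too long (hypothetical name)."""


@contextmanager
def time_limit(seconds):
    """Raise ReadTimeout if the with-block takes longer than `seconds`."""
    def handler(signum, frame):
        raise ReadTimeout("operation took longer than %s seconds" % seconds)

    # Install our handler and remember whatever was there before.
    previous = signal.signal(signal.SIGALRM, handler)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # cancel the timer
        signal.signal(signal.SIGALRM, previous)   # restore old handler
```

The read() call from the answer would then sit inside "with time_limit(5):", with ReadTimeout caught the same way as the IOError above, since it is a subclass of it.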
Nadia Alramli answered Nov 03 '22 06:11
