I'm writing code that will run on Linux, OS X, and Windows. It downloads a list of approximately 55,000 files from the server, then steps through the list of files, checking if the files are present locally. (With SHA hash verification and a few other goodies.) If the files aren't present locally or the hash doesn't match, it downloads them.
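For context, the presence-and-hash check described above can be sketched like this (a minimal Python 3 sketch; the `needs_download` helper, the SHA-256 choice, and the chunk size are illustrative assumptions, not the actual code):

```python
import hashlib
import os

def needs_download(path, expected_sha):
    """Return True if the file is missing or its SHA-256 digest differs."""
    if not os.path.exists(path):
        return True
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Hash in chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest() != expected_sha
```
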
The server-side is plain-vanilla Apache 2 on Ubuntu over port 80.
The client side works perfectly on Mac and Linux, but gives me this error on Windows (XP and Vista) after downloading a number of files:
urllib2.URLError: <urlopen error <10048, 'Address already in use'>>
This link: http://bytes.com/topic/python/answers/530949-client-side-tcp-socket-receiving-address-already-use-upon-connect points me to TCP port exhaustion, but "netstat -n" never showed me more than six connections in "TIME_WAIT" status, even just before it errored out.
The code (called once for each of the 55,000 files it downloads) is this:
request = urllib2.Request(file_remote_path)
opener = urllib2.build_opener()
datastream = opener.open(request)
outfileobj = open(temp_file_path, 'wb')
try:
    while True:
        chunk = datastream.read(CHUNK_SIZE)
        if chunk == '':
            break
        else:
            outfileobj.write(chunk)
finally:
    outfileobj = outfileobj.close()
    datastream.close()
UPDATE: By grepping the log, I find that it enters the download routine exactly 3998 times. I've run this multiple times and it fails at 3998 each time. Given that the linked article states that the available ports number 5000-1025=3975 (and some are probably expiring and being reused), it's starting to look a lot more like the linked article describes the real issue. However, I'm still not sure how to fix this. Making registry edits is not an option.
If it is really a resource problem (freeing OS socket resources), try this:
request = urllib2.Request(file_remote_path)
opener = urllib2.build_opener()
datastream = None
retries = 3  # 3 tries
while retries:
    try:
        datastream = opener.open(request)
        break
    except urllib2.URLError, ue:
        # Only retry on WSAEADDRINUSE (10048); re-raise anything else.
        if '10048' not in str(ue.reason):
            raise
        retries -= 1
        if not retries:
            raise urllib2.URLError("Address already in use / retries exhausted")
outfileobj = open(temp_file_path, 'wb')
try:
    while True:
        chunk = datastream.read(CHUNK_SIZE)
        if not chunk:
            break
        outfileobj.write(chunk)
finally:
    outfileobj.close()
    datastream.close()
If you want, you can insert a sleep between retries, or make the behavior OS-dependent. On my Windows XP machine the problem doesn't show up (I reached 5000 downloads). I watch my processes and network with Process Hacker.
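The sleep suggestion can be sketched as a generic retry helper with a backoff delay (a hedged Python 3 sketch; `download_with_retry`, `open_fn`, and the delay values are illustrative assumptions, not part of urllib2):

```python
import time

def download_with_retry(open_fn, retries=3, base_delay=1.0):
    """Call open_fn(), retrying on errors that mention WSA error 10048.

    Sleeps with exponential backoff between attempts so TIME_WAIT
    sockets have a chance to expire before the next try.
    """
    for attempt in range(retries):
        try:
            return open_fn()
        except Exception as exc:  # narrow to urllib2.URLError in real code
            if '10048' not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # back off, then retry
```

The backoff doubles each time (1 s, 2 s, 4 s, ...), which gives the OS progressively more time to release ports without hard-coding a single fixed delay.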
Thinking outside the box, the problem you seem to be trying to solve has already been solved by a program called rsync. You might look for a Windows implementation and see if it meets your needs.
You should seriously consider copying and modifying this pyCurl example for efficient downloading of a large collection of files.
Instead of opening a new TCP connection for each request you should really use persistent HTTP connections - have a look at urlgrabber (or alternatively, just at keepalive.py for how to add keep-alive connection support to urllib2).
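For illustration, here is a minimal Python 3 sketch of reusing one persistent HTTP/1.1 connection for several requests with the standard-library `http.client` (the `fetch_many` helper and its parameters are assumptions for this example; urlgrabber and keepalive.py achieve the equivalent for urllib2):

```python
import http.client

def fetch_many(host, paths, port=80):
    """Fetch several paths over a single persistent HTTP/1.1 connection."""
    conn = http.client.HTTPConnection(host, port)
    bodies = []
    try:
        for path in paths:
            conn.request('GET', path)
            resp = conn.getresponse()
            # Drain the body fully so the socket can be reused for the
            # next request instead of opening a new TCP connection.
            bodies.append(resp.read())
    finally:
        conn.close()
    return bodies
```

Because every request rides on the same TCP connection, only one local port is consumed no matter how many files are fetched, which sidesteps ephemeral-port exhaustion entirely.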