I'm using urllib2 to load files from FTP and HTTP servers.
Some of the servers support only one connection per IP. The problem is that urllib2 does not close the connection instantly. Look at the example program:
from urllib2 import urlopen
from time import sleep

url = 'ftp://user:pass@host/big_file.ext'

def load_file(url):
    f = urlopen(url)
    loaded = 0
    while True:
        data = f.read(1024)
        if data == '':
            break
        loaded += len(data)
    f.close()
    #sleep(1)
    print('loaded {0}'.format(loaded))

load_file(url)
load_file(url)
The code loads two files (here the two files are the same) from an FTP server which supports only one connection. This prints the following log:
loaded 463675266
Traceback (most recent call last):
  File "conection_test.py", line 20, in <module>
    load_file(url)
  File "conection_test.py", line 7, in load_file
    f = urlopen(url)
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1331, in ftp_open
    fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
  File "/usr/lib/python2.6/urllib2.py", line 1352, in connect_ftp
    fw = ftpwrapper(user, passwd, host, port, dirs, timeout)
  File "/usr/lib/python2.6/urllib.py", line 854, in __init__
    self.init()
  File "/usr/lib/python2.6/urllib.py", line 860, in init
    self.ftp.connect(self.host, self.port, self.timeout)
  File "/usr/lib/python2.6/ftplib.py", line 134, in connect
    self.welcome = self.getresp()
  File "/usr/lib/python2.6/ftplib.py", line 216, in getresp
    raise error_temp, resp
urllib2.URLError: <urlopen error ftp error: 421 There are too many connections from your internet address.>
So the first file is loaded and the second fails because the first connection was not closed.
But when I add sleep(1) after f.close(), the error does not occur:
loaded 463675266
loaded 463675266
Is there any way to force-close the connection so that the second download does not fail?
urllib2 is a Python 2 module for fetching URLs. It defines functions and classes to help with URL actions (basic and digest authentication, redirections, cookies, etc.).
The Python 3 standard library has a new urllib package, a merged/refactored/rewritten version of the older modules. urllib3 is a third-party package (i.e., not in CPython's standard library).
NOTE: urllib2 is no longer available in Python 3; its functionality lives in urllib.request, a module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the urlopen function, which is capable of fetching URLs using a variety of different protocols.
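For comparison, here is a minimal Python 3 sketch of the same download using urllib.request (the URL is a placeholder). In Python 3 the response object works as a context manager, so the connection is closed deterministically when the with-block exits:

from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen

url = 'http://example.com/big_file.ext'  # placeholder URL

# Leaving the with-block calls close() on the response and
# releases the underlying connection.
with urlopen(url) as f:
    loaded = 0
    while True:
        data = f.read(1024)
        if not data:  # read() returns b'' at EOF in Python 3
            break
        loaded += len(data)
print('loaded {0}'.format(loaded))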
The cause is indeed a file descriptor leak. We also found that with Jython the problem is much more obvious than with CPython. A colleague proposed this solution:
fdurl = urllib2.urlopen(req, timeout=self.timeout)
realsock = fdurl.fp._sock.fp._sock  # we want to close the "real" socket later
req = urllib2.Request(url, header)
try:
    fdurl = urllib2.urlopen(req, timeout=self.timeout)
except urllib2.URLError, e:
    print "urlopen exception", e
realsock.close()
fdurl.close()
The fix is ugly, but it does the job: no more "too many open connections".
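Wrapped in try/finally, the same workaround reads a little more safely. This is only a sketch: load_and_close is a made-up helper name, and the fp._sock.fp._sock attribute chain is a private implementation detail of CPython 2.6's urllib2 that may differ in other versions or in Jython:

import urllib2

def load_and_close(url, timeout=30):
    # load_and_close is a hypothetical helper, not part of urllib2.
    f = urllib2.urlopen(url, timeout=timeout)
    # Private attribute chain into the underlying socket; an
    # implementation detail that may change between interpreters.
    realsock = f.fp._sock.fp._sock
    try:
        return f.read()
    finally:
        realsock.close()  # close the "real" socket so the server sees the disconnect
        f.close()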
Biggie: I think it's because the connection is not shut down with shutdown().
Note that close() releases the resource associated with a connection but does not necessarily close the connection immediately. If you want to close the connection in a timely fashion, call shutdown() before close().
You could try something like this before f.close():
import socket
f.fp._sock.fp._sock.shutdown(socket.SHUT_RDWR)
(And yes.. if that works, it's not Right(tm), but you'll know what the problem is.)
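Putting the pieces together, a sketch of the original load_file with that shutdown call spliced in before f.close() might look like this; it relies on the same private _sock attributes, so treat it as a diagnostic rather than a proper fix:

import socket
from urllib2 import urlopen

def load_file(url):
    f = urlopen(url)
    loaded = 0
    while True:
        data = f.read(1024)
        if data == '':
            break
        loaded += len(data)
    # Tear the connection down at the OS level before close(); this pokes
    # at private attributes of the response object (implementation detail).
    f.fp._sock.fp._sock.shutdown(socket.SHUT_RDWR)
    f.close()
    print('loaded {0}'.format(loaded))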