It seems I cannot get the urllib2 timeout to be taken into account.
I have read, I believe, all the posts related to this topic, and it seems I'm not doing anything wrong. Am I correct?
Many thanks for your kind help.
Scenario:
I need to check for Internet connectivity before continuing with the remainder of a script, so I wrote a function (Net_Access), which is provided below.
Some info:
cat /proc/sys/kernel/osrelease: 2.6.32-42-generic
My code:
#!/usr/bin/env python

import socket
import urllib2

myhost = 'http://www.google.com'
timeout = 3

socket.setdefaulttimeout(timeout)
req = urllib2.Request(myhost)

try:
    handle = urllib2.urlopen(req, timeout=timeout)
except urllib2.URLError as e:
    socket.setdefaulttimeout(None)
    print ('[--- Net_Access() --- No network access')
else:
    print ('[--- Net_Access() --- Internet Access OK')
1) Working, with LAN connector plugged in
$ time ./Net_Access
[--- Net_Access() --- Internet Access OK
real 0m0.223s
user 0m0.060s
sys 0m0.032s
2) Timeout not working, with LAN connector unplugged
$ time ./Net_Access
[--- Net_Access() --- No network access
real 1m20.235s
user 0m0.048s
sys 0m0.060s
Added to original post: test results (using IP instead of FQDN)
As suggested by @unutbu (see comments), replacing the FQDN in myhost with an IP address fixes the problem: the timeout takes effect.
LAN connector plugged in...
$ time ./Net_Access
[--- Net_Access() --- Internet Access OK
real 0m0.289s
user 0m0.036s
sys 0m0.040s
LAN connector unplugged...
$ time ./Net_Access
[--- Net_Access() --- No network access
real 0m3.082s
user 0m0.052s
sys 0m0.024s
This is nice, but it means that the timeout can only be used with an IP address and not an FQDN. Weird...
Has anyone found a way to use the urllib2 timeout without resolving DNS beforehand and passing an IP address to the function? Or do you first use socket to test the connection and only then fire urllib2 once you're sure you can reach the target?
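(For reference, the kind of socket-based pre-check I mean would look something like this; the host and port are just an example, e.g. a public DNS server on TCP port 53.)

import socket

def can_reach(host='8.8.8.8', port=53, timeout=3):
    # Connect to a plain TCP endpoint by IP, so no DNS lookup is involved;
    # the socket timeout is honoured here.
    try:
        socket.create_connection((host, port), timeout).close()
        return True
    except (socket.timeout, socket.error):
        return False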
Many thanks.
If your problem is with DNS lookup taking forever (or just way too long) to time out when there's no network connectivity, then yes, this is a known problem, and there's nothing you can do within urllib2 itself to fix that.
So, is all hope lost? Well, not necessarily.
First, let's look at what's going on. Ultimately, urlopen relies on getaddrinfo, which (along with its relatives like gethostbyname) is notoriously the one critical piece of the socket API that can't be run asynchronously or interrupted (and on some platforms, it's not even thread-safe). If you want to trace through the source yourself, urllib2 defers to httplib for creating connections, which calls create_connection on socket, which calls socket_getaddrinfo on _socket, which ultimately calls the real getaddrinfo function. This is an infamous problem that affects every network client or server written in every language in the world, and there's no good, easy solution.
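For example, here is a minimal sketch showing that socket.setdefaulttimeout has no effect on the lookup itself (how long getaddrinfo actually blocks depends on your resolver configuration):

import socket
import time

socket.setdefaulttimeout(3)          # only affects socket objects, not getaddrinfo

start = time.time()
try:
    socket.getaddrinfo('www.google.com', 80)
except socket.gaierror:
    pass                             # lookup failed, but possibly only after a long wait
print('getaddrinfo returned after %.1f seconds' % (time.time() - start))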
One option is to use a different higher-level library that's already solved this problem. I believe requests relies on urllib3, which ultimately has the same problem, but pycurl relies on libcurl, which, if built with c-ares, does name lookup asynchronously, and therefore can time it out.
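For example, a rough pycurl sketch (assuming pycurl is installed and your libcurl was built with c-ares, so the connect timeout also covers name resolution):

import pycurl

def net_access_curl(url='http://www.google.com', timeout=3):
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.NOBODY, True)             # HEAD-style request, no body download
    c.setopt(pycurl.CONNECTTIMEOUT, timeout)  # covers DNS + connect when using c-ares
    c.setopt(pycurl.TIMEOUT, timeout)         # overall transfer timeout
    try:
        c.perform()
        return True
    except pycurl.error:
        return False
    finally:
        c.close()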
Or, of course, you can use something like twisted or tornado or some other async networking library. But obviously rewriting all of your code to use a twisted HTTP client instead of urllib2 is not exactly trivial.
Another option is to "fix" urllib2 by monkeypatching the standard library. If you want to do this, there are two steps.
First, you have to provide a timeoutable getaddrinfo. You could do this by binding c-ares, or using ctypes to access platform-specific APIs like Linux's getaddrinfo_a, or even looking up the nameservers and communicating with them directly. But the really simple way to do it is to use threading. If you're doing lots of these, you'll want to use a single thread or a small thread pool, but for small-scale use, just spin off a thread for each call. A really quick-and-dirty (read: bad) implementation is:
import socket
import threading

def getaddrinfo_async(*args, **kwargs):
    timeout = kwargs.pop('timeout', 3)        # seconds to wait for the lookup
    result = [None]                           # mutable cell the worker thread can write to
    def worker():
        result[0] = socket.getaddrinfo(*args)
    t = threading.Thread(target=worker)
    t.daemon = True                           # a stuck lookup won't keep the process alive
    t.start()
    t.join(timeout)
    if t.isAlive():
        raise socket.timeout('getaddrinfo timed out')
    return result[0]
Next, you have to get all the libraries you care about to use this. Depending on how ubiquitous (and dangerous) you want your patch to be, you can replace socket.getaddrinfo itself, or just socket.create_connection, or just the code in httplib, or even urllib2.
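As a rough sketch, patching socket.getaddrinfo itself could look like this; note that the replacement has to keep a reference to the original function so the worker thread doesn't recurse into the patch:

import socket
import threading
import urllib2

_real_getaddrinfo = socket.getaddrinfo        # keep the original for the worker thread

def _patched_getaddrinfo(*args):
    result = [None]
    def worker():
        result[0] = _real_getaddrinfo(*args)  # call the saved original, not the patch
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()
    t.join(3)                                 # assumed 3-second DNS budget
    if t.isAlive():
        raise socket.timeout('DNS lookup timed out')
    return result[0]

socket.getaddrinfo = _patched_getaddrinfo     # urllib2 -> httplib -> socket now use it

# With the patch in place, the urlopen call from the question fails within
# a few seconds instead of hanging on the name lookup.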
A final option is to fix this at a higher level. If your networking stuff is happening on a background thread, you can throw a higher-level timeout on the whole thing, and if it took more than timeout seconds to figure out whether it's timed out or not, you know it has.
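A minimal sketch of that idea, assuming a 3-second budget and a throwaway daemon thread per check:

import threading
import urllib2

def net_access(url='http://www.google.com', timeout=3):
    ok = [False]
    def worker():
        try:
            urllib2.urlopen(url, timeout=timeout)
            ok[0] = True
        except urllib2.URLError:
            pass
    t = threading.Thread(target=worker)
    t.daemon = True          # a stuck DNS lookup won't keep the process alive
    t.start()
    t.join(timeout)          # give up after `timeout` seconds overall
    return ok[0]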
Perhaps try this:
import urllib2

def get_header(url):
    req = urllib2.Request(url)
    req.get_method = lambda: 'HEAD'
    try:
        response = urllib2.urlopen(req)
    except urllib2.URLError:
        # urllib2.URLError: <urlopen error [Errno -2] Name or service not known>
        return False
    return True

url = 'http://www.kernel.org/pub/linux/kernel/v3.0/linux-3.7.1.tar.bz2'
print(get_header(url))
When I unplug my network adapter, this prints False almost immediately, while under normal conditions this prints True.
I'm not sure why this works so quickly compared to your original code (even without needing to set the timeout parameter), but perhaps it will work for you too.
I did an experiment this morning which resulted in get_header not returning immediately. I booted the computer with the router off. Then the router was turned on. Then networking and wireless were enabled through the Ubuntu GUI. This failed to establish a working connection. At this stage, get_header failed to return immediately.
So, here is a heavier-weight solution which calls get_header in a subprocess using multiprocessing.Pool. The object returned by pool.apply_async has a get method with a timeout parameter. If a result is not returned from get_header within the duration specified by timeout, the subprocess is terminated.
Thus, check_http should return a result within about 1 second, under all circumstances.
import multiprocessing as mp
import urllib2

def timeout_function(cmd, timeout=None, args=(), kwds={}):
    pool = mp.Pool(processes=1)
    result = pool.apply_async(cmd, args=args, kwds=kwds)
    try:
        retval = result.get(timeout=timeout)
    except mp.TimeoutError as err:
        pool.terminate()
        pool.join()
        raise
    else:
        return retval

def get_header(url):
    req = urllib2.Request(url)
    req.get_method = lambda: 'HEAD'
    try:
        response = urllib2.urlopen(req)
    except urllib2.URLError:
        return False
    return True

def check_http(url):
    try:
        response = timeout_function(
            get_header,
            args=(url,),
            timeout=1)
        return response
    except mp.TimeoutError:
        return False

print(check_http('http://www.google.com'))