I'm investigating a problem with a Python app running on an Ubuntu machine with 4G of RAM. The tool will be used to audit servers (we prefer to roll our own tools). It uses threads to connect to lots of servers and many of the TCP connections fail. However, if I add a delay of 1 second between kicking off each thread then most connections succeed. I have used this simple script to investigate what may be happening:
#!/usr/bin/python
import sys
import socket
import threading
import time

class Scanner(threading.Thread):
    def __init__(self, host, port):
        threading.Thread.__init__(self)
        self.host = host
        self.port = port
        self.status = ""

    def run(self):
        self.sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sk.settimeout(20)
        try:
            self.sk.connect((self.host, self.port))
        except Exception, err:
            self.status = str(err)
        else:
            self.status = "connected"
        finally:
            self.sk.close()

def get_hostnames_list(filename):
    return open(filename).read().splitlines()

if __name__ == "__main__":
    hostnames_file = sys.argv[1]
    hosts_list = get_hostnames_list(hostnames_file)
    threads = []
    for host in hosts_list:
        #time.sleep(1)
        thread = Scanner(host, 443)
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
        print "Host: ", thread.host, " : ", thread.status
If I run this with the time.sleep(1) commented out against, say, 300 hosts, many of the connections fail with a timeout error, whereas they don't time out if I put the one-second delay in. I tried the app on another Linux distro running on a more powerful machine and there weren't as many connect errors. Is it due to a kernel limitation? Is there anything I can do to get the connections to work without putting in the delay?
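Before blaming the kernel outright, two per-machine limits are worth ruling out: the per-process file-descriptor limit (each socket consumes one FD) and the ephemeral port range. A minimal check (Linux paths assumed; not part of the original script):

```python
import os
import resource

# Each TCP socket consumes one file descriptor; RLIMIT_NOFILE caps how
# many a single process may have open at once.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("fd limit (soft/hard): %d/%d" % (soft, hard))

# The ephemeral port range bounds how many outbound connections can be
# in flight at once (sockets lingering in TIME_WAIT count against it).
port_range = "/proc/sys/net/ipv4/ip_local_port_range"
if os.path.exists(port_range):  # Linux only
    with open(port_range) as f:
        low, high = map(int, f.read().split())
    print("ephemeral ports available: %d" % (high - low + 1))
```

If the soft FD limit is near the number of threads started, raising it with ulimit -n (or limits.conf) is a quick experiment.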
UPDATE
I have also tried a program that limited the number of threads available in a pool. By reducing this to 20 I can get all connects to work, but it only checks about one host per second. So whatever I try (putting in a sleep(1) or limiting the number of concurrent threads), I don't seem to be able to check more than one host every second.
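One way to cap concurrency without a global sleep is to gate the connection attempts themselves with a semaphore: all threads start immediately, but at most N attempts are in flight at once, so throughput scales with N rather than one host per second. A minimal sketch of that idea (the BoundedScanner name and the cap of 20 are illustrative, not from the original app):

```python
import socket
import threading

MAX_CONCURRENT = 20  # cap on simultaneous connection attempts
sem = threading.Semaphore(MAX_CONCURRENT)

class BoundedScanner(threading.Thread):
    def __init__(self, host, port):
        threading.Thread.__init__(self)
        self.host = host
        self.port = port
        self.status = ""

    def run(self):
        # Threads queue here; at most MAX_CONCURRENT connect at once.
        with sem:
            sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sk.settimeout(20)
            try:
                sk.connect((self.host, self.port))
            except Exception as err:
                self.status = str(err)
            else:
                self.status = "connected"
            finally:
                sk.close()
```

With this, starting 300 threads at once should keep roughly 20 connection attempts running in parallel instead of serializing to one per second.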
UPDATE
I just found this question which seems similar to what I am seeing.
UPDATE
I wonder if writing this using twisted might help? Could anyone show what my example would look like written using twisted?
Python 3.4 introduces a new provisional API for asynchronous I/O: the asyncio module. This approach is similar to the twisted-based answer:
#!/usr/bin/env python3.4
import asyncio
import logging
from contextlib import closing

class NoopProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        transport.close()

info = logging.getLogger().info

@asyncio.coroutine
def connect(loop, semaphore, host, port=443, ssl=True, timeout=15):
    try:
        with (yield from semaphore):
            info("connecting %s" % host)
            done, pending = yield from asyncio.wait(
                [loop.create_connection(NoopProtocol, host, port, ssl=ssl)],
                loop=loop, timeout=timeout)
            if done:
                next(iter(done)).result()
    except Exception as e:
        info("error %s reason: %s" % (host, e))
    else:
        if pending:
            info("error %s reason: timeout" % (host,))
            for ft in pending:
                ft.cancel()
        else:
            info("done %s" % host)

@asyncio.coroutine
def main(loop):
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    limit, timeout, hosts = parse_cmdline()

    # connect `limit` concurrent connections
    sem = asyncio.BoundedSemaphore(limit)
    coros = [connect(loop, sem, host, timeout=timeout) for host in hosts]
    if coros:
        yield from asyncio.wait(coros, loop=loop)

if __name__ == "__main__":
    with closing(asyncio.get_event_loop()) as loop:
        loop.run_until_complete(main(loop))
Like the twisted variant, it uses a NoopProtocol that does nothing but disconnect immediately on a successful connection. The number of concurrent connections is limited using a semaphore. The code is coroutine-based.
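On Python 3.7+, the same pattern can be written with native async/await syntax and the stream API. The sketch below is an illustrative rewrite, not part of the original answer: probe and scan are hypothetical names, and plain TCP is used unless ssl is passed explicitly.

```python
import asyncio

async def probe(sem, host, port=443, ssl=None, timeout=15):
    # The semaphore caps the number of in-flight connection attempts.
    async with sem:
        try:
            _, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port, ssl=ssl), timeout)
        except Exception as e:
            return host, "error: %s" % e
        writer.close()
        return host, "connected"

async def scan(hosts, limit=20, **kw):
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(probe(sem, h, **kw) for h in hosts))

# e.g. results = asyncio.run(scan(["example.com"], ssl=True))
```

The structure is the same: a semaphore bounds concurrency, each coroutine connects, records the outcome, and disconnects immediately.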
To find out how many successful ssl connections we can make to the first 1000 hosts from the Alexa top-million list:
$ curl -O http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
$ unzip *.zip
$ /usr/bin/time perl -nE'say $1 if /\d+,([^\s,]+)$/' top-1m.csv | head -1000 |\
python3.4 asyncio_ssl.py - --timeout 60 |& tee asyncio.log
The result: fewer than half of all connections are successful. On average, it checks ~20 hosts per second. Many sites timed out after a minute. If the host doesn't match the hostnames from the server's certificate, the connection also fails; that includes example.com vs. www.example.com-style mismatches.
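Those failures come from default certificate verification: ssl.create_default_context() (also added in Python 3.4) both verifies the certificate chain and requires the certificate to cover the exact hostname being connected to (or a matching wildcard). A quick way to confirm those defaults:

```python
import ssl

# create_default_context() enables hostname checking and mandatory
# certificate verification out of the box, so a certificate that does
# not list the hostname we dialed fails the handshake.
ctx = ssl.create_default_context()
print(ctx.check_hostname)                     # hostname must match the cert
print(ctx.verify_mode == ssl.CERT_REQUIRED)   # peer cert is mandatory
```

Disabling these checks would inflate the success count, but for an audit tool the mismatches are arguably findings in themselves.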