Problem with multi threaded Python app and socket connections

I'm investigating a problem with a Python app running on an Ubuntu machine with 4G of RAM. The tool will be used to audit servers (we prefer to roll our own tools). It uses threads to connect to lots of servers and many of the TCP connections fail. However, if I add a delay of 1 second between kicking off each thread then most connections succeed. I have used this simple script to investigate what may be happening:

#!/usr/bin/python

import sys
import socket
import threading
import time

class Scanner(threading.Thread):
    def __init__(self, host, port):
        threading.Thread.__init__(self)
        self.host = host
        self.port = port
        self.status = ""

    def run(self):
        self.sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.sk.settimeout(20)
        try:
            self.sk.connect((self.host, self.port))
        except Exception, err:
            self.status = str(err)
        else:
            self.status = "connected"
        finally:
            self.sk.close()


def get_hostnames_list(filename):
    return open(filename).read().splitlines()

if __name__ == "__main__":
    hostnames_file = sys.argv[1]
    hosts_list = get_hostnames_list(hostnames_file)
    threads = []
    for host in hosts_list:
        #time.sleep(1)
        thread = Scanner(host, 443)
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()
        print "Host: ", thread.host, " : ", thread.status

If I run this with the time.sleep(1) commented out against, say, 300 hosts, many of the connections fail with a timeout error, whereas they don't time out if I put the one-second delay in. I tried the app on another Linux distro running on a more powerful machine and there weren't as many connect errors. Is it due to a kernel limitation? Is there anything I can do to get the connections to work without putting in the delay?
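Since the question asks whether a kernel limitation could be involved, one quick sanity check is the per-process open-file limit and the ephemeral port range, both of which bound how many sockets can be in flight at once. This is a hedged sketch (the `/proc` path is Linux-specific, and the script itself is my illustration, not part of the original question):

```python
import os
import resource

# Each thread holds one socket, so the soft RLIMIT_NOFILE
# (commonly 1024 by default) caps concurrent connections.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit:", soft, "/", hard)

# Linux-specific: the port range used for outgoing connections.
path = "/proc/sys/net/ipv4/ip_local_port_range"
if os.path.exists(path):
    with open(path) as f:
        lo, hi = map(int, f.read().split())
    print("ephemeral ports available:", hi - lo + 1)
```

300 concurrent connects is usually well under both limits, so ruling them out this way points the investigation elsewhere (e.g. DNS resolution or SYN backlog behaviour).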

UPDATE

I have also tried a program that limits the number of threads available in a pool. By reducing this to 20 I can get all connects to work, but it only checks about one host a second. So whatever I try (putting in a sleep(1) or limiting the number of concurrent threads), I don't seem to be able to check more than one host every second.
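For reference, the pooled variant described above can be sketched with `concurrent.futures` (Python 3; the `check_host`/`scan` names and the pool size are my illustration, not the asker's code):

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed

def check_host(host, port=443, timeout=20):
    """Attempt a TCP connect; return a status string."""
    sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sk.settimeout(timeout)
    try:
        sk.connect((host, port))
    except Exception as err:
        return str(err)
    else:
        return "connected"
    finally:
        sk.close()

def scan(hosts, workers=20):
    # The executor caps concurrency at `workers` threads,
    # replacing the manual sleep(1) throttle.
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(check_host, h): h for h in hosts}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

With `workers=20` and a 20-second timeout, throughput is bounded by the slowest hosts in the pool, which matches the ~1 host/second the asker observed.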

UPDATE

I just found this question which seems similar to what I am seeing.

UPDATE

I wonder if writing this using Twisted might help? Could anyone show what my example would look like written using Twisted?

Asked Jan 24 '11 by VacuumTube

1 Answer

Python 3.4 introduced a new provisional API for asynchronous I/O: the asyncio module.

The approach is similar to a Twisted-based one:

#!/usr/bin/env python3.4
import asyncio
import logging
from contextlib import closing

class NoopProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        transport.close()

info = logging.getLogger().info

@asyncio.coroutine
def connect(loop, semaphor, host, port=443, ssl=True, timeout=15):
    try:
        with (yield from semaphor):
            info("connecting %s" % host)
            done, pending = yield from asyncio.wait(
                [loop.create_connection(NoopProtocol, host, port, ssl=ssl)],
                loop=loop, timeout=timeout)
            if done:
                next(iter(done)).result()
    except Exception as e:
        info("error %s reason: %s" % (host, e))
    else:
        if pending:
            info("error %s reason: timeout" % (host,))
            for ft in pending:
                ft.cancel()
        else:
            info("done %s" % host)

@asyncio.coroutine
def main(loop):
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    # parse_cmdline() is elided in the original answer; it reads the
    # concurrency limit, timeout, and host list from the command line.
    limit, timeout, hosts = parse_cmdline()

    # connect `limit` concurrent connections
    sem = asyncio.BoundedSemaphore(limit)
    coros = [connect(loop, sem, host, timeout=timeout) for host in hosts]
    if coros:
        yield from asyncio.wait(coros, loop=loop)

if __name__=="__main__":
    with closing(asyncio.get_event_loop()) as loop:
        loop.run_until_complete(main(loop))

Like the Twisted variant, it uses a NoopProtocol that does nothing but disconnect immediately on a successful connection.

The number of concurrent connections is limited using a semaphore.

The code is coroutine-based.
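On newer Python versions (3.7+), the same pattern can be sketched with async/await and asyncio.open_connection instead of the provisional generator-based API; the names here (check, scan, LIMIT) are my illustration, not the answer's code:

```python
import asyncio

LIMIT = 20      # max concurrent connection attempts (assumed value)
TIMEOUT = 15    # seconds allowed per connect

async def check(sem, host, port=443):
    # The semaphore caps how many connects are in flight at once.
    async with sem:
        try:
            reader, writer = await asyncio.wait_for(
                asyncio.open_connection(host, port), TIMEOUT)
        except Exception as e:
            return host, str(e)
        writer.close()  # like NoopProtocol: connect, then hang up
        await writer.wait_closed()
        return host, "connected"

async def scan(hosts):
    sem = asyncio.Semaphore(LIMIT)
    return await asyncio.gather(*(check(sem, h) for h in hosts))
```

Typical usage would be `results = asyncio.run(scan(hosts))`, which returns a list of `(host, status)` pairs.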

Example

To find out how many successful SSL connections we can make to the first 1000 hosts from the Alexa top-million list:

$ curl -O http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
$ unzip *.zip
$ /usr/bin/time perl -nE'say $1 if /\d+,([^\s,]+)$/' top-1m.csv | head -1000 |\
    python3.4 asyncio_ssl.py - --timeout 60 |& tee asyncio.log

The result: fewer than half of all connections are successful. On average, it checks ~20 hosts per second. Many sites timed out after a minute. If the host doesn't match the hostnames from the server's certificate, the connection also fails; this includes example.com vs. www.example.com-style mismatches.

Answered Nov 14 '22 by jfs