I need a Python TCP server that can handle at least tens of thousands of concurrent socket connections. I was trying to test the capabilities of the Python SocketServer package in both forking and threading modes, but both fell far short of the desired performance.
First, I'll describe the client, because it's common to both cases.
client.py
import socket
import sys
import threading
import time

SOCKET_AMOUNT = 10000
HOST, PORT = "localhost", 9999

data = " ".join(sys.argv[1:])

def client(ip, port, message):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((ip, port))
    while 1:
        sock.sendall(message)
        time.sleep(1)
    sock.close()

for i in range(SOCKET_AMOUNT):
    msg = "test message"
    client_thread = threading.Thread(target=client, args=(HOST, PORT, msg))
    client_thread.start()
Multiprocessor server:
forked_server.py
import os
import SocketServer

class ForkedTCPRequestHandler(SocketServer.BaseRequestHandler):

    def handle(self):
        cur_process = os.getpid()
        print "launching a new socket handler, pid = {}".format(cur_process)
        while 1:
            self.request.recv(4096)

class ForkedTCPServer(SocketServer.ForkingMixIn, SocketServer.TCPServer):
    pass

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999
    server = ForkedTCPServer((HOST, PORT), ForkedTCPRequestHandler)
    print "Starting Forked Server"
    server.serve_forever()
Multithreaded server:
threaded_server.py
import threading
import SocketServer

class ThreadedTCPRequestHandler(SocketServer.BaseRequestHandler):

    def handle(self):
        cur_thread = threading.current_thread()
        print "launching a new socket handler, thread = {}".format(cur_thread)
        while 1:
            self.request.recv(4096)

class ThreadedTCPServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    pass

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999
    server = ThreadedTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
    print "Starting Threaded Server"
    server.serve_forever()
In the first case, with forked_server.py, only 40 processes get created, and roughly 20 of them begin failing after a while with the following error on the client side:
error: [Errno 104] Connection reset by peer
The threaded version is much more durable and holds more than 4000 connections, but eventually starts showing
gaierror: [Errno -5] No address associated with hostname
The tests were run on my local machine, Kubuntu 14.04 x64 with kernel 3.13.0-32. These are the steps I've taken to increase the general performance of the system:
sysctl -w fs.file-max=10000000
sysctl -w net.core.netdev_max_backlog=2500
sysctl -w net.core.somaxconn=250000
So, the question is: what am I doing wrong, and how can I handle tens of thousands of concurrent connections in Python?
Unfortunately, a socket shared by multiple threads is not thread-safe: think about two threads operating on the same buffer with no lock. The usual way around this is to use two sockets, just as FTP does.
socketserver is not going to handle anywhere near 10k connections. No threaded or forked server will on current hardware and OS's. Thousands of threads mean you spend more time context-switching and scheduling than actually working. Modern Linux is getting very good at scheduling threads and processes, and Windows is pretty good with threads (but horrible with processes), but there's a limit to what either can do. And socketserver doesn't even try to be high-performance.
And of course CPython's GIL makes things worse. If you're not using 3.2+, any thread doing even a trivial amount of CPU-bound work will choke all the other threads and block your I/O. With the new GIL, avoiding non-trivial CPU work keeps the damage small, but it still makes context switches more expensive than raw pthreads or Windows threads.
So, what do you want?
You want a single-threaded "reactor" that services events in a loop and kicks off handlers. (On Windows and Solaris, there are advantages to instead using a "proactor", a pool of threads that all service the same event queue, but since you're on Linux, let's not worry about that.) Modern OS's have very good multiplexing APIs to build on: kqueue on BSD/Mac, epoll on Linux, /dev/poll on Solaris, IOCP on Windows. These can easily handle 10K connections even on hardware from years ago.
socketserver isn't a terrible reactor; it just doesn't provide any good way to dispatch asynchronous work, only threads or processes. In theory, you could build a GreenletMixIn (with the greenlet extension module) or a CoroutineMixIn (assuming you either have or know how to write a trampoline and scheduler) without too much work on top of socketserver, and that might not be too heavyweight. But I'm not sure how much benefit you'd get out of socketserver at that point.
Parallelism can help, but only to dispatch any slow jobs off the main work thread. First get your 10K connections up, doing minimal work. Then, if the real work you want to add is I/O-bound (e.g., reading files, or making requests to other services), add a pool of threads to dispatch to; if you need to add a lot of CPU-bound work, add a pool of processes instead (or, in some cases, even one of each).
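To make the "pool for the slow jobs" idea concrete, here is one hedged sketch using the stdlib concurrent.futures module. The names slow_lookup and handle_request are hypothetical placeholders for your real blocking work and your event-loop handler; the point is that the loop thread only submits work and attaches a callback, it never blocks:

```python
from concurrent.futures import ThreadPoolExecutor

# a thread pool for blocking I/O (files, requests to other services);
# for CPU-bound work you'd use ProcessPoolExecutor instead to sidestep the GIL
io_pool = ThreadPoolExecutor(max_workers=8)

def slow_lookup(key):
    # stand-in for a blocking call (file read, HTTP request, DB query)
    return key.upper()

def handle_request(key, reply):
    # called from the event loop: hand the slow part to the pool and
    # attach a completion callback instead of blocking the loop
    future = io_pool.submit(slow_lookup, key)
    future.add_done_callback(lambda f: reply(f.result()))
```

Note that add_done_callback runs in the worker thread, so the reply function must be safe to call from there (most event-loop frameworks provide a call_soon_threadsafe-style hook for exactly this).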
If you can use Python 3.4, the stdlib has an answer in asyncio (there's a backport on PyPI for 3.3, but it's inherently impossible to backport to earlier versions).
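For a sense of what the asyncio version of this server looks like, here is a minimal echo server. It's written with the newer async/await syntax; on Python 3.4 itself you'd spell the same thing with @asyncio.coroutine and yield from. asyncio.start_server runs the accept loop for you and spawns one cheap task per connection:

```python
import asyncio

async def handle_echo(reader, writer):
    # one lightweight task per connection instead of a thread or process
    while True:
        data = await reader.read(4096)
        if not data:                 # peer closed the connection
            break
        writer.write(data)
        await writer.drain()
    writer.close()

async def main(host="localhost", port=9999):
    server = await asyncio.start_server(handle_echo, host, port)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```

Under the hood this sits on the same epoll/kqueue multiplexing described above, so one process can comfortably hold tens of thousands of idle connections.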
If not… well, you can build something yourself on top of selectors in 3.4+ if you don't care about Windows, or select in 2.6+ if you only care about Linux, *BSD, and Mac and are willing to write two versions of your code, but it's going to be a lot of work. Or you can write your core event loop in C (or just use an existing one like libev, libuv, or libevent) and wrap it in an extension module.
But really, you probably want to turn to third-party libraries. There are many of them, with very different APIs, from gevent (which tries to make your code look like preemptively threaded code but actually runs greenlets on a single-threaded event loop) to Twisted (which is based around explicit callbacks and futures, similar to many modern JavaScript frameworks).
StackOverflow isn't a good place to get recommendations for specific libraries, but I can give you a general recommendation: look them over, pick the one whose API sounds best for your application, test whether it's good enough, and only fall back to another one if the one you like can't cut it (or if you turned out to be wrong about liking the API). Fans of some of these libraries (especially gevent and tornado) will tell you that their favorite is the "fastest", but who cares about that? What matters is whether they're fast enough and usable enough to write your app.
Off the top of my head, I'd search for gevent, eventlet, concurrence, cogen, twisted, tornado, monocle, diesel, and circuits. That probably isn't a great list, but if you google all those terms together, I'll bet you'll find an up-to-date comparison, or an appropriate forum to ask on.