I need a Python TCP server that can handle at least tens of thousands of concurrent socket connections. I was trying to test the capabilities of the Python SocketServer package in both forking and threading modes, but both fell far short of the desired performance.
First, I'll describe the client, because it's common to both cases.
client.py
import socket
import sys
import threading
import time

SOCKET_AMOUNT = 10000
HOST, PORT = "localhost", 9999

data = " ".join(sys.argv[1:])

def client(ip, port, message):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((ip, port))
    while 1:
        sock.sendall(message)
        time.sleep(1)
    sock.close()

for i in range(SOCKET_AMOUNT):
    msg = "test message"
    client_thread = threading.Thread(target=client, args=(HOST, PORT, msg))
    client_thread.start()
Multiprocessor server:
forked_server.py
import os
import SocketServer

class ForkedTCPRequestHandler(SocketServer.BaseRequestHandler):

    def handle(self):
        cur_process = os.getpid()
        print "launching a new socket handler, pid = {}".format(cur_process)
        while 1:
            self.request.recv(4096)

class ForkedTCPServer(SocketServer.ForkingMixIn, SocketServer.TCPServer):
    pass

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999
    server = ForkedTCPServer((HOST, PORT), ForkedTCPRequestHandler)
    print "Starting Forked Server"
    server.serve_forever()
Multithreaded server:
threaded_server.py
import threading
import SocketServer

class ThreadedTCPRequestHandler(SocketServer.BaseRequestHandler):

    def handle(self):
        cur_thread = threading.current_thread()
        print "launching a new socket handler, thread = {}".format(cur_thread)
        while 1:
            self.request.recv(4096)

class ThreadedTCPServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    pass

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999
    server = ThreadedTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
    print "Starting Threaded Server"
    server.serve_forever()
In the first case, with forked_server.py, only 40 processes get created, and roughly 20 of them begin failing after a while with the following error on the client side:
error: [Errno 104] Connection reset by peer
The threaded version is much more durable and holds more than 4000 connections, but eventually starts showing
gaierror: [Errno -5] No address associated with hostname
The tests were run on my local machine, Kubuntu 14.04 x64 with kernel 3.13.0-32. These are the steps I've taken to increase the general performance of the system:
sysctl -w fs.file-max=10000000
sysctl -w net.core.netdev_max_backlog=2500
sysctl -w net.core.somaxconn=250000
So, the question is: what am I doing wrong, and how can I handle tens of thousands of concurrent connections in Python?
Unfortunately, a socket shared by multiple threads is not thread-safe: think about two threads operating on the same buffer with no lock. The usual way around this is to use two sockets, just as FTP does.
socketserver is not going to handle anywhere near 10k connections. No threaded or forked server will on current hardware and OS's. Thousands of threads mean you spend more time context-switching and scheduling than actually working. Modern Linux is getting very good at scheduling threads and processes, and Windows is pretty good with threads (but horrible with processes), but there's a limit to what either can do. And socketserver doesn't even try to be high-performance.
And of course CPython's GIL makes things worse. If you're not using 3.2+, any thread doing even a trivial amount of CPU-bound work will choke all the other threads and block your I/O. With the new GIL, avoiding non-trivial CPU work keeps the damage small, but it still makes context switches more expensive than raw pthreads or Windows threads.
So, what do you want?
You want a single-threaded "reactor" that services events in a loop and kicks off handlers. (On Windows and Solaris, there are advantages to instead using a "proactor", a pool of threads that all service the same event queue, but since you're on Linux, let's not worry about that.) Modern OS's have very good multiplexing APIs to build on: kqueue on BSD/Mac, epoll on Linux, /dev/poll on Solaris, IOCP on Windows. These can easily handle 10K connections even on hardware from years ago.
socketserver isn't a terrible reactor; it just doesn't provide any good way to dispatch asynchronous work, only threads or processes. In theory, you could build a GreenletMixIn (with the greenlet extension module) or a CoroutineMixIn (assuming you either have or know how to write a trampoline and scheduler) without too much work on top of socketserver, and that might not be too heavyweight. But I'm not sure how much benefit you'd get out of socketserver at that point.
Parallelism can help, but only to dispatch any slow jobs off the main work thread. First get your 10K connections up, doing minimal work. Then, if the real work you want to add is I/O-bound (e.g., reading files, or making requests to other services), add a pool of threads to dispatch to; if you need to add a lot of CPU-bound work, add a pool of processes instead (or, in some cases, even one of each).
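To make the "pool for the slow jobs" idea concrete, here is one hedged sketch using the stdlib concurrent.futures module. The names slow_lookup and handle_request are hypothetical placeholders for your real blocking work and your event-loop handler; the point is that the loop thread only submits work and attaches a callback, it never blocks:

```python
from concurrent.futures import ThreadPoolExecutor

# a thread pool for blocking I/O (files, requests to other services);
# for CPU-bound work you'd use ProcessPoolExecutor instead to sidestep the GIL
io_pool = ThreadPoolExecutor(max_workers=8)

def slow_lookup(key):
    # stand-in for a blocking call (file read, HTTP request, DB query)
    return key.upper()

def handle_request(key, reply):
    # called from the event loop: hand the slow part to the pool and
    # attach a completion callback instead of blocking the loop
    future = io_pool.submit(slow_lookup, key)
    future.add_done_callback(lambda f: reply(f.result()))
```

Note that add_done_callback runs in the worker thread, so the reply function must be safe to call from there (most event-loop frameworks provide a call_soon_threadsafe-style hook for exactly this).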
If you can use Python 3.4, the stdlib has an answer in asyncio (there's a backport on PyPI for 3.3, but it's inherently impossible to backport to earlier versions).
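For a sense of what the asyncio version of this server looks like, here is a minimal echo server. It's written with the newer async/await syntax; on Python 3.4 itself you'd spell the same thing with @asyncio.coroutine and yield from. asyncio.start_server runs the accept loop for you and spawns one cheap task per connection:

```python
import asyncio

async def handle_echo(reader, writer):
    # one lightweight task per connection instead of a thread or process
    while True:
        data = await reader.read(4096)
        if not data:                 # peer closed the connection
            break
        writer.write(data)
        await writer.drain()
    writer.close()

async def main(host="localhost", port=9999):
    server = await asyncio.start_server(handle_echo, host, port)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```

Under the hood this sits on the same epoll/kqueue multiplexing described above, so one process can comfortably hold tens of thousands of idle connections.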
If not… well, you can build something yourself on top of selectors in 3.4+ if you don't care about Windows, or select in 2.6+ if you only care about Linux, *BSD, and Mac and are willing to write two versions of your code, but it's going to be a lot of work. Or you can write your core event loop in C (or just use an existing one like libev, libuv, or libevent) and wrap it in an extension module.
But really, you probably want to turn to third-party libraries. There are many of them, with very different APIs, from gevent (which tries to make your code look like preemptively threaded code but actually runs greenlets on a single-threaded event loop) to Twisted (which is based around explicit callbacks and futures, similar to many modern JavaScript frameworks).
StackOverflow isn't a good place to get recommendations for specific libraries, but I can give you a general recommendation: look them over, pick the one whose API sounds best for your application, test whether it's good enough, and only fall back to another one if the one you like can't cut it (or if you turned out to be wrong about liking the API). Fans of some of these libraries (especially gevent and tornado) will tell you that their favorite is the "fastest", but who cares about that? What matters is whether they're fast enough and usable enough to write your app.
Off the top of my head, I'd search for gevent, eventlet, concurrence, cogen, twisted, tornado, monocle, diesel, and circuits. That probably isn't a great list, but if you google all those terms together, I'll bet you'll find an up-to-date comparison, or an appropriate forum to ask on.