Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Writing a socket-based server in Python, recommended strategies?

I was recently reading this document which lists a number of strategies that could be employed to implement a socket server. Namely, they are:

  1. Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification
  2. Serve many clients with each thread, and use nonblocking I/O and readiness change notification
  3. Serve many clients with each server thread, and use asynchronous I/O
  4. serve one client with each server thread, and use blocking I/O
  5. Build the server code into the kernel

Now, I would appreciate a hint on which should be used in CPython, which we know has some good points, and some bad points. I am mostly interested in performance under high concurrency, and yes a number of the current implementations are too slow.

So if I may start with the easy one, "5" is out, as I am not going to be hacking anything into the kernel.

"4" Also looks like it must be out because of the GIL. Of course, you could use multiprocessing in place of threads here, and that does give a significant boost. Blocking IO also has the advantage of being easier to understand.

And here my knowledge wanes a bit:

"1" is traditional select or poll which could be trivially combined with multiprocessing.

"2" is the readiness-change notification, used by the newer epoll and kqueue

"3" I am not sure there are any kernel implementations for this that have Python wrappers.

So, in Python we have a bag of great tools like Twisted. Perhaps they are a better approach, though I have benchmarked Twisted and found it too slow on a multiple processor machine. Perhaps having 4 twisteds with a load balancer might do it, I don't know. Any advice would be appreciated.

like image 424
Ali Afshar Avatar asked Mar 11 '09 11:03

Ali Afshar


People also ask

How do I create a socket server in Python?

Python Socket Server To use python socket connection, we need to import socket module. Then, sequentially we need to perform some task to establish connection between server and client. We can obtain host address by using socket. gethostname() function.

Which method is used to create server side socket in Python?

socket() method is used to create a server-side socket. NOTE: AF_INET refers to Address from the Internet and it requires a pair of (host, port) where the host can either be a URL of some particular website or its address and the port number is an integer. SOCK_STREAM is used to create TCP Protocols.

Is Python good for socket?

Python provides two levels of access to network services. At a low level, you can access the basic socket support in the underlying operating system, which allows you to implement clients and servers for both connection-oriented and connectionless protocols.


2 Answers

asyncore is basically "1" - It uses select internally, and you just have one thread handling all requests. According to the docs it can also use poll. (EDIT: Removed Twisted reference, I thought it used asyncore, but I was wrong).

"2" might be implemented with python-epoll (Just googled it - never seen it before). EDIT: (from the comments) In python 2.6 the select module has epoll, kqueue and kevent build-in (on supported platforms). So you don't need any external libraries to do edge-triggered serving.

Don't rule out "4", as the GIL will be dropped when a thread is actually doing or waiting for IO-operations (most of the time probably). It doesn't make sense if you've got huge numbers of connections of course. If you've got lots of processing to do, then python may not make sense with any of these schemes.

For flexibility maybe look at Twisted?

In practice your problem boils down to how much processing you are going to do for requests. If you've got a lot of processing, and need to take advantage of multi-core parallel operation, then you'll probably need multiple processes. On the other hand if you just need to listen on lots of connections, then select or epoll, with a small number of threads should work.

like image 185
Douglas Leeder Avatar answered Oct 14 '22 04:10

Douglas Leeder


How about "fork"? (I assume that is what the ForkingMixIn does) If the requests are handled in a "shared nothing" (other than DB or file system) architecture, fork() starts pretty quickly on most *nixes, and you don't have to worry about all the silly bugs and complications from threading.

Threads are a design illness forced on us by OSes with too-heavy-weight processes, IMHO. Cloning a page table with copy-on-write attributes seems a small price, especially if you are running an interpreter anyway.

Sorry I can't be more specific, but I'm more of a Perl-transitioning-to-Ruby programmer (when I'm not slaving over masses of Java at work)


Update: I finally did some timings on thread vs fork in my "spare time". Check it out:

http://roboprogs.com/devel/2009.04.html

Expanded: http://roboprogs.com/devel/2009.12.html

like image 35
Roboprog Avatar answered Oct 14 '22 04:10

Roboprog