
Is 'epoll' the essential reason that Tornadoweb (or Nginx) is so fast?

Tornadoweb and Nginx are popular web servers at the moment, and many benchmarks show that they perform better than Apache under certain circumstances. So my question is:

Is 'epoll' the most essential reason that makes them so fast? And what can I learn from that if I want to write a good socket server?

asked Apr 06 '10 07:04 by Mickey Shine




2 Answers

If you're looking to write a socket server, a good starting point is Dan Kegel's C10k article from a few years back:

http://www.kegel.com/c10k.html

I also found Beej's Guide to Network Programming to be pretty handy:

http://beej.us/guide/bgnet/

Finally, if you need a great reference, there's UNIX Network Programming by W. Richard Stevens et al.:

http://www.amazon.com/Unix-Network-Programming-Sockets-Networking/dp/0131411551/ref=dp_ob_title_bk

Anyway, to answer your question, the main difference between Apache and Nginx is that Apache traditionally dedicates one process or thread to each client and uses blocking I/O, whereas each Nginx worker is single-threaded and uses non-blocking I/O. Apache's worker pool does reduce the overhead of starting and destroying processes, but it still makes the CPU switch between several threads when serving multiple clients. Nginx, on the other hand, handles all of a worker's requests in one thread. When one request needs to make a network request (say, to a backend), Nginx attaches a callback to the backend request and then works on another active client request. In practice, this means it returns to the event loop (epoll, kqueue, or select) and asks for file descriptors that have something to report. Note that the system call in the main event loop is actually a blocking operation, because there's nothing to do until one of the file descriptors is ready for reading or writing.
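To make the callback-plus-event-loop idea concrete, here is a minimal sketch in Python using the standard `selectors` module. It is not how Nginx is implemented; the `EventLoop` class, `make_upper_echo`, and `demo` are hypothetical names invented for illustration, and `socket.socketpair()` stands in for real client connections so the example runs without a network.

```python
import selectors
import socket


class EventLoop:
    """A tiny single-threaded event loop (illustrative only)."""

    def __init__(self):
        # DefaultSelector picks epoll on Linux, kqueue on BSD/macOS,
        # and falls back to poll or select elsewhere.
        self.selector = selectors.DefaultSelector()

    def add_reader(self, sock, callback):
        sock.setblocking(False)
        self.selector.register(sock, selectors.EVENT_READ, callback)

    def remove_reader(self, sock):
        self.selector.unregister(sock)

    def run(self):
        # Keep going until no sockets are registered any more.
        while self.selector.get_map():
            # select() is the one blocking call: it sleeps until at
            # least one registered socket has something to report.
            for key, _events in self.selector.select():
                key.data(key.fileobj)  # invoke the attached callback


def make_upper_echo(loop):
    """Build a callback that echoes data back in upper case."""
    def on_readable(conn):
        data = conn.recv(4096)
        if data:
            # Fine for tiny payloads; a real server would buffer writes.
            conn.send(data.upper())
        else:
            # Empty read means the peer closed its side.
            loop.remove_reader(conn)
            conn.close()
    return on_readable


def demo(n_clients):
    """Serve n_clients stand-in connections on one thread."""
    loop = EventLoop()
    clients = []
    for i in range(n_clients):
        client, server_side = socket.socketpair()
        loop.add_reader(server_side, make_upper_echo(loop))
        client.sendall(b"hello %d" % i)
        client.shutdown(socket.SHUT_WR)  # signal "no more data"
        clients.append(client)
    loop.run()
    replies = [c.recv(4096) for c in clients]
    for c in clients:
        c.close()
    return replies
```

Calling `demo(3)` serves three connections from a single thread. The thread never waits on any one client; it only waits inside `select()` for whichever socket is ready next.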

So that's the main reason Nginx and Tornado are efficient at serving many simultaneous clients: there's only a small, fixed number of processes, often just one (thus saving RAM), and each has only one thread (thus saving CPU from context switches). As for epoll, it's just a more efficient version of select. If there are N open file descriptors (sockets), select has to scan all N on every call, whereas epoll hands back only the ones that are ready, so the cost of each call no longer grows with N. In fact, Nginx can use select instead of epoll if you compile it with the --with-select_module option, and I bet it will still be more efficient than Apache. I'm not as familiar with Apache internals, but a quick grep shows it does use select and epoll -- probably when the server is listening to multiple ports/interfaces, or if it does simultaneous backend requests for a single client.
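The difference between the two interfaces can be sketched with Python's `select` module. This is an illustration, not a benchmark: the helper names and `compare_ready_sets` are made up for this example, socket pairs stand in for client connections, and the epoll branch only runs where `select.epoll` exists (Linux).

```python
import select
import socket


def ready_with_select(socks):
    # The whole list is handed to the kernel on every call, so the
    # work per call grows with the number of sockets being watched.
    readable, _, _ = select.select(socks, [], [], 0)
    return {s.fileno() for s in readable}


def ready_with_epoll(ep):
    # The interest set was registered once up front; each poll()
    # returns only the descriptors that are actually ready.
    return {fd for fd, _events in ep.poll(0)}


def compare_ready_sets(n=50, ready_index=7):
    pairs = [socket.socketpair() for _ in range(n)]
    readers = [r for r, _w in pairs]
    pairs[ready_index][1].send(b"x")  # make exactly one reader ready
    expected = {readers[ready_index].fileno()}

    via_select = ready_with_select(readers)

    via_epoll = None
    if hasattr(select, "epoll"):  # Linux only
        ep = select.epoll()
        for r in readers:
            ep.register(r.fileno(), select.EPOLLIN)
        via_epoll = ready_with_epoll(ep)
        ep.close()

    for r, w in pairs:
        r.close()
        w.close()
    return via_select, via_epoll, expected
```

Both calls report the same single ready descriptor. The point is what each call costs: `select` receives all `n` sockets every time, while `epoll` keeps the registration in the kernel between calls.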

Incidentally, I got started with this stuff trying to write a basic socket server and wanted to figure out how Nginx was so freaking efficient. After poring through the Nginx source code and reading those guides/books I linked to above, I discovered it'd be easier to write Nginx modules instead of my own server. Thus was born the now-semi-legendary Emiller's Guide to Nginx Module Development:

http://www.evanmiller.org/nginx-modules-guide.html

(Warning: the Guide was written against Nginx 0.5-0.6 and APIs may have changed.) If you're doing anything with HTTP, I'd say give Nginx a shot because it's worked out all the hairy details of dealing with stupid clients. For example, the small socket server that I wrote for fun worked great with all clients -- except Safari, and I never figured out why. Even for other protocols, Nginx might be the right way to go; the eventing is pretty well abstracted from the protocols, which is why it can proxy HTTP as well as IMAP. The Nginx code base is extremely well-organized and very well-written, with one exception that bears mentioning. I wouldn't follow its lead when it comes to hand-rolling a protocol parser; instead, use a parser generator. I've written some stuff about using a parser generator (Ragel) with Nginx here:

http://www.evanmiller.org/nginx-modules-guide-advanced.html#parsing

All of this was probably more information than you wanted, but hopefully you'll find some of it useful.

answered Oct 13 '22 22:10 by Emiller


Yes and no. While they both use epoll, the more fundamental reason is that they both use an event loop to handle requests. You can find more information about what event loops are and how they're used on Wikipedia.

Check out libevent (used by gevent, generally faster & more stable than tornado) or libev for implementations.

answered Oct 13 '22 21:10 by Phillip B Oldham