
How does event driven I/O allow multiprocessing?

I am aware that event-driven I/O mechanisms like select, poll, and epoll let you build, say, a highly scalable web server, but I am confused by the details. If the server has only one process with one thread of execution, then when it runs its "processing" routine for the ready clients, isn't that list processed serially, since the work can't be scheduled across multiple cores or CPUs? Moreover, while this processing is happening, wouldn't the server be unresponsive?

I used to think this was the reason people used thread pools to handle the event I/O on the backend, but I was confused when I heard recently that not everybody uses thread pools for their applications.

Boris Yeltz asked Jul 12 '10 18:07


2 Answers

Hmmm. You (the original poster) and the other answers are, I think, coming at this backwards.

You seem to grasp the event-driven part, but are getting hung up on what happens after an event fires.

The key thing to understand is that a web server generally spends very little time "processing" a request, and a whole lot of time waiting for disk and network I/O.

When a request comes in, there is generally one of two things the server needs to do: either load a file and send it to the client, or pass the request to something else (classically a CGI script; these days FastCGI is more common, for obvious reasons).

In either case, the server's job is computationally minimal: it's just a middleman between the client and the disk or "something else".

That's why these servers use what is called non-blocking I/O.

The exact mechanisms vary from one operating system to another, but the key point is that a read or write request always returns instantly (or near enough). When you try to write, for example, to a socket, the system either immediately accepts what it can into a buffer, or returns something like an EWOULDBLOCK error letting you know it can't take more data right now.

Once the write has been "accepted", the program can make a note of the state of the connection (e.g. "5000 of 10000 bytes sent" or something) and move on to the next connection which is ready for action, coming back to the first after the system is ready to take more data.
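To make that bookkeeping concrete, here is a minimal Python sketch. The names `PendingWrite`, `drain`, and `demo` are made up for illustration and are not from any particular server, and the demo uses a local `socket.socketpair()` in place of a real client connection. It only shows the idea of "send what the kernel will take, remember how far you got, come back later":

```python
import socket


class PendingWrite:
    """Tracks how much of a payload the kernel has accepted so far."""

    def __init__(self, sock, payload):
        self.sock = sock
        self.payload = memoryview(payload)
        self.sent = 0  # e.g. "5000 of 10000 bytes sent"

    def pump(self):
        """Send as much as the kernel accepts right now, without blocking.

        Returns True once the whole payload has been accepted.
        """
        while self.sent < len(self.payload):
            try:
                self.sent += self.sock.send(self.payload[self.sent:])
            except BlockingIOError:  # EWOULDBLOCK / EAGAIN: buffer is full
                return False
        return True


def drain(sock):
    """Read everything currently buffered on a non-blocking socket."""
    total = 0
    while True:
        try:
            chunk = sock.recv(65536)
        except BlockingIOError:  # nothing more to read right now
            return total
        if not chunk:  # peer closed the connection
            return total
        total += len(chunk)


def demo(size=8 * 1024 * 1024):
    """Push `size` bytes through a socket pair using only non-blocking calls."""
    a, b = socket.socketpair()
    a.setblocking(False)
    b.setblocking(False)
    write = PendingWrite(a, bytes(size))
    finished_first_try = write.pump()  # fills the send buffer, then stops
    sent_first_try = write.sent
    received = drain(b)
    while not write.pump():  # come back once there is room again
        received += drain(b)
    received += drain(b)
    a.close()
    b.close()
    return finished_first_try, sent_first_try, received
```

In a real server you would not loop like `demo` does. You would register the socket for "writable" events and call `pump()` again only when the OS says there is room, serving other connections in the meantime.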

This is unlike a normal blocking socket where a big write request could block for quite a while as the OS tries to send data over the network to the client.

In a sense, this isn't really different from what you might do with threaded I/O, but it has much reduced overhead in the form of memory, context switching, and general "housekeeping", and takes maximum advantage of what operating systems do best (or are supposed to, anyway): handle I/O quickly.
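For what it's worth, the single-threaded pattern can be sketched with Python's standard `selectors` module (which wraps epoll, kqueue, or select depending on the platform). The helper names `make_server` and `run_once` are my own, the "processing" is just upper-casing the bytes, and it assumes replies are small enough to fit in the socket buffer, so treat it as an illustration rather than a real server:

```python
import selectors
import socket


def make_server(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket and register it with a selector."""
    sel = selectors.DefaultSelector()
    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)
    return sel, listener


def run_once(sel, listener, timeout=0.1):
    """One turn of the loop: handle every socket the OS reports as ready."""
    for key, _ in sel.select(timeout):
        sock = key.fileobj
        if sock is listener:
            # A new client is waiting; accept it and start watching it too.
            conn, _ = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(4096)
            if data:
                # The "processing" step: tiny compared to the time spent waiting.
                sock.send(data.upper())
            else:
                # Empty read means the client hung up.
                sel.unregister(sock)
                sock.close()
```

A real server would just call `run_once` in a `while True:` loop. Yes, the ready clients are handled one after another, but each one only costs a few microseconds of CPU, so nobody waits long.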

As for multi-processor/multi-core systems, the same principles apply. This style of server is still very efficient on each individual CPU. You just need a server that forks multiple instances of itself to take advantage of the additional processors.
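A rough sketch of that "fork multiple instances" idea, assuming a Unix-like system where `os.fork()` is available. The function names `prefork`, `serve_forever`, and `stop` are hypothetical, and each worker here just blocks in `accept()` and replies with its PID to keep things short. In practice each worker would run its own event loop on the shared listening socket:

```python
import os
import signal
import socket


def serve_forever(listener):
    """Worker body: accept on the shared listener and reply with our PID."""
    while True:
        conn, _ = listener.accept()
        with conn:
            conn.sendall(str(os.getpid()).encode())


def prefork(listener, workers):
    """Fork `workers` children that all inherit the same listening socket."""
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:  # child process
            try:
                serve_forever(listener)
            finally:
                os._exit(0)  # never fall back into the parent's code
        pids.append(pid)  # parent remembers the child
    return pids


def stop(pids):
    """Terminate the workers and reap them."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

You would typically call something like `prefork(listener, os.cpu_count())` once at startup. The kernel then hands each incoming connection to whichever worker is waiting, so the load spreads across cores without any threads in your own code.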

Nicholas Knight answered Sep 30 '22 07:09


Some of that wisdom predates the general availability of multi-core systems. In a pure multitasking environment, it still holds. But apart from your portable electronics, most of the machines you touch these days are multiprocessing, and even that exception may not hold for long.

In a pure multi-tasking system, all the OS does is hop from one job to another as they become runnable (unblocked). Event driven and non-blocking IO just do the same thing in userspace.

For certain tasks, it can still aid multiprocessing. Reducing thread yields and mutually exclusive code lets more processors run the application for more clock cycles.

For instance, in an IDE you don't want it constantly scanning the filesystem for external changes. If you've been around long enough, you've probably run into that before, and it's irritating and unproductive. It wastes resources and causes global data models to become locked or unresponsive during updates. Setting an I/O event listener (a 'watch' on the directory) frees the application to do other things, like helping you write code.

Jason answered Sep 30 '22 05:09