 

Why are threads so expensive that event-driven non-blocking IO is better in benchmarks?

I recently started learning about Node.js, a JavaScript runtime built on top of V8 that is known for its non-blocking IO and incredible speed.

To my understanding, Node does not wait for IO to respond; instead it runs an event loop (similar to a game loop) that keeps checking unfinished operations and continues/completes them as soon as the IO responds. Node's performance was compared to Apache HTTPD, with Node being significantly faster while using less memory.
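For illustration (my own sketch, not part of the original question), here is a minimal Node.js snippet contrasting blocking and non-blocking file reads; the file name data.txt is just a placeholder:

    const fs = require('fs');

    // Blocking: the calling thread sits idle until the whole file is read.
    const text = fs.readFileSync('data.txt', 'utf8');
    console.log('blocking read done:', text.length);

    // Non-blocking: the read is handed off, readFile returns immediately,
    // and the callback is run from the event loop once the data is ready.
    fs.readFile('data.txt', 'utf8', (err, data) => {
      if (err) throw err;
      console.log('non-blocking read done:', data.length);
    });

    console.log('this line runs before the non-blocking read completes');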

Now if you read about Apache, you learn it uses one thread per connection, which supposedly slows it down significantly, and this is where my question comes in:

If you compare threads to what Node does internally in its event loop, you start seeing the similarities: both are abstractions of an unfinished operation that waits for a resource to respond, both regularly check whether the operation has made progress, and both then stay off the CPU for a certain amount of time (at least I think a good blocking API sleeps for a few milliseconds before rechecking).

Now where is that striking, critical difference that makes threads so much worse?

Asked Oct 28 '12 by Aurelia

1 Answer

The difference here is context switching. Swapping threads by the OS requires:

  • saving the instruction pointer (done by the CPU)
  • saving the CPU registers (may not be necessary if the thread has made a blocking call, but is necessary if it is preempted)
  • swapping the call stacks. Even if the stacks reside in the same virtual memory space, this is at least one write and some reads, and it even applies to micro-threads (fibers).
  • in the case of swapping to a different process: switching to kernel mode, updating the virtual memory table and returning to user mode.

In the case of an event queue:

  • the state is updated. This needs to happen in any case.
  • the event handler returns. Instead of swapping the call stacks, the current call stack is popped.
  • the event queue is checked for pending requests. Only if there is no pending request does the application wait. This can be done by sleeping repeatedly (as suggested by the OP) or, better, by making a blocking call to the event queue. If the event queue (e.g. a set of TCP sockets) is managed by the OS, then the OS is responsible for notifying the application of a new event (e.g. a socket has received data or can accept more data); a rough sketch of such a loop follows this list.
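As a rough illustration (my own sketch under assumptions, not taken from the answer), such a loop could look like the following in JavaScript; waitForOsEvents stands in for a blocking OS call such as epoll or kqueue:

    // Hypothetical single-threaded event loop: each handler runs to
    // completion on the same call stack, and the loop only blocks when
    // there is nothing left to do.
    const queue = [];

    function runLoop(handlers, waitForOsEvents) {
      for (;;) {
        if (queue.length === 0) {
          // No pending requests: one blocking call to the OS, which
          // wakes us up when e.g. a socket becomes readable.
          queue.push(waitForOsEvents());
        }
        const event = queue.shift();   // read the queue
        handlers[event.type](event);   // call the handler...
        // ...which simply returns; no call stacks are swapped.
      }
    }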

If the server is highly loaded, the only overhead of an event queue is a handler return, a read of the queue and a handler call. In the threaded approach, there is additional overhead from swapping the threads.

Also, as mentioned by PST, the threaded approach introduces the need for locking. Locking itself is cheap, but waiting for a resource to be released by some other thread requires additional context switching, as the waiting thread cannot continue. It is even possible for a thread to be swapped in to acquire a lock, only to be swapped out a few clock cycles later because it needs to lock another resource as well. Compare how much is done by the OS (reading the thread queue and swapping call stacks, at the very least) to how much is done in the event-driven approach (returning from one handler call and making another).
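To make the locking point concrete, here is a hedged sketch (my own, not from the answer) using Node.js worker threads: Atomics.wait blocks the losing thread, which is exactly the kind of wait that forces the OS to deschedule it and context-switch it back in later. The lock layout and single-file structure are assumptions for brevity:

    // Sketch: two worker threads contend for one shared lock word.
    // When the lock is taken, the loser blocks in Atomics.wait and the
    // OS must deschedule it (one context switch) and wake it again
    // (another context switch) once the holder releases the lock.
    const { Worker, isMainThread, workerData } = require('worker_threads');

    if (isMainThread) {
      const shared = new SharedArrayBuffer(4);        // one Int32 used as a lock
      new Worker(__filename, { workerData: shared });
      new Worker(__filename, { workerData: shared });
    } else {
      const lock = new Int32Array(workerData);        // 0 = free, 1 = held
      while (Atomics.compareExchange(lock, 0, 0, 1) !== 0) {
        Atomics.wait(lock, 0, 1);                     // block until notified
      }
      // ... critical section ...
      Atomics.store(lock, 0, 0);                      // release the lock
      Atomics.notify(lock, 0, 1);                     // wake one waiting thread
    }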

Answered Sep 18 '22 by John Dvorak