Possible Duplicate:
Technically why is processes in Erlang more efficient than OS threads?
Any time Erlang processes or green threads or coroutines are mentioned, they are always described as "lightweight" compared to kernel threads. The reason usually given is that kernel threads involve context switching, which is slow.
But what exactly about context switching is slow? And how much slower is it than switching between green threads in userland?
Is context switching the main (only?) factor that accounts for the difference in performance and memory consumption between an event-driven program such as Nginx and a multi-processing program such as Apache?
Context switching on preemptive, monolithic, multitasking operating systems takes one of two paths: either an implicit yield to the scheduler via some system service call (sleep, mutex acquire, waiting for an event, blocking I/O), or an interrupt after which the scheduler decides to swap running tasks.
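You can actually watch both paths on Linux with getrusage(), which counts how often a process has been switched out voluntarily (it blocked) versus involuntarily (the scheduler preempted it). A minimal sketch, assuming Linux/glibc:

```c
/* Minimal illustration (assumes Linux/glibc): getrusage() reports voluntary
 * context switches (the task blocked and yielded) and involuntary ones
 * (the scheduler preempted it). A blocking call like nanosleep() bumps the
 * voluntary counter. */
#include <stdio.h>
#include <time.h>
#include <sys/resource.h>

static void report(const char *when)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    printf("%s: voluntary=%ld involuntary=%ld\n",
           when, ru.ru_nvcsw, ru.ru_nivcsw);
}

int main(void)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 }; /* 10 ms */

    report("before sleep");
    nanosleep(&ts, NULL);   /* blocking call: an implicit yield to the scheduler */
    report("after sleep");
    return 0;
}
```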
When a task is swapped by the scheduler, a few heavyweight things happen:
- All the action happens as part of the operating system kernel, running in a high level of privilege. Every action is checked (or it should be) to ensure that decisions made by the scheduler don't grant a task any additional privileges (think local root exploit)
- User-mode process address spaces are swapped. This results in the memory manager dorking around with the page table layouts and loading a new page directory base into a control register (CR3 on x86).
- This also means that TLB entries get invalidated and data kept in the CPU caches can be evicted by the new task's working set. This would suck if your task had just accessed a bunch of frequently used stuff, then context switched and "lost" it (on the next access it would [probably] have to be fetched from main memory again)
- Depending on how you trap into the kernel, you then need to trap OUT of the kernel. If you make a system call, for example, the CPU will go through a very precise set of steps to transition to code running in the kernel, and then on exit, unwind those steps. These steps are more complicated than making a function call to another module in your program, so they take more time.
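To get a feel for what the user/kernel round trip alone costs, here is a rough micro-benchmark sketch (my own illustration, assuming Linux with GCC or Clang): it compares a plain function call with a trivial system call. Absolute numbers vary a lot with hardware and kernel mitigations, but the gap is typically one to two orders of magnitude:

```c
/* Rough micro-benchmark sketch (assumes Linux, GCC/Clang): compares an
 * ordinary function call with a round trip into the kernel via a trivial
 * system call. syscall(SYS_getpid) is used instead of getpid() so libc
 * can't serve the result from a cache. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <sys/syscall.h>
#include <unistd.h>

#define ITERS 1000000

__attribute__((noinline))                      /* keep the call from being optimized away */
static long dummy(long x) { return x + 1; }    /* stays entirely in user mode */

static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    struct timespec t0, t1;
    volatile long sink = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < ITERS; i++)
        sink += dummy(i);                      /* plain call, no kernel entry */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("function call:  %.1f ns/iter\n", elapsed_ns(t0, t1) / ITERS);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < ITERS; i++)
        sink += syscall(SYS_getpid);           /* user->kernel->user transition each time */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("getpid syscall: %.1f ns/iter\n", elapsed_ns(t0, t1) / ITERS);

    return (int)(sink & 1);
}
```

Note that this only measures the trap into and out of the kernel; a full context switch to another process additionally pays for the address space swap and the cache/TLB effects described above.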
A green thread task switch is pretty straightforward, as I understand it. A user-mode dispatcher directs a coroutine to run until the coroutine yields. A few differences from the above:
- None of the dispatching of coroutines happens in kernel mode; indeed, dispatching green threads generally does not need to involve any operating system services at all, let alone any blocking ones. So all of this can happen without any kernel context switches or any user/kernel transitions.
- A green thread isn't preemptively scheduled; it isn't preempted by the green thread manager at all, but scheduled co-operatively. This is good and bad, but with well written routines, generally good. Each task does precisely what it needs to do and then hands control back to the dispatcher, without any context swap overhead.
- Green threads share their address space (as far as I know), so no swapping of the address space happens on a switch. Stacks are (as far as I know) swapped, but those stacks are managed by the dispatcher, and swapping a stack is just a write to the stack pointer register. Swapping a stack is also an unprivileged operation.
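To make that concrete, here is a toy cooperative scheduler sketch using the POSIX ucontext functions (getcontext/makecontext/swapcontext), available on Linux and the BSDs. The names task_yield, task_body and the round-robin dispatcher are made up for this illustration, not any particular green thread library's API. Everything runs in one process on one kernel thread, and a "switch" is just swapcontext() saving and restoring registers and the stack pointer in user mode:

```c
/* Toy cooperative "green thread" sketch using POSIX ucontext.
 * task_yield/task_body and the dispatcher are invented names for this
 * example. No kernel scheduling is involved in switching tasks. */
#include <stdio.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

static ucontext_t dispatcher_ctx;     /* where tasks return when they yield */
static ucontext_t task_ctx[2];
static int finished[2];
static int current;                   /* index of the task currently running */

/* Cooperative yield: save this task's context, resume the dispatcher. */
static void task_yield(void)
{
    swapcontext(&task_ctx[current], &dispatcher_ctx);
}

static void task_body(int id)
{
    for (int i = 0; i < 3; i++) {
        printf("task %d, step %d\n", id, i);
        task_yield();                 /* give the other task a turn */
    }
    finished[id] = 1;
}

int main(void)
{
    for (int i = 0; i < 2; i++) {
        getcontext(&task_ctx[i]);
        task_ctx[i].uc_stack.ss_sp = malloc(STACK_SIZE);   /* dispatcher-managed stack */
        task_ctx[i].uc_stack.ss_size = STACK_SIZE;
        task_ctx[i].uc_link = &dispatcher_ctx;             /* resume dispatcher when the task ends */
        makecontext(&task_ctx[i], (void (*)(void))task_body, 1, i);
    }

    /* Round-robin dispatcher: run each unfinished task until it yields. */
    while (!finished[0] || !finished[1]) {
        for (current = 0; current < 2; current++) {
            if (!finished[current])
                swapcontext(&dispatcher_ctx, &task_ctx[current]);
        }
    }

    for (int i = 0; i < 2; i++)
        free(task_ctx[i].uc_stack.ss_sp);
    return 0;
}
```

Real green thread runtimes add a run queue, I/O integration and often growable stacks, but the switch itself stays a user-mode register and stack swap just like this.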
In short, a context switch in user mode amounts to a few library calls and a write to the stack pointer register. A context switch in kernel mode involves interrupts or traps, a user/kernel transition, and system-level behavior like address space changes and TLB/cache effects.