According to C10k and this paper, throughput of 1-thread-per-connection servers degrade as more and more clients connect and more and more threads are created. According to those two sources, this is because the more threads exist, the more time is spent on context switching compared to actual work done by those threads. Evented servers don't seem to suffer as much from performance degredation at high connection counts.
However, evented servers also do context switches between clients, they just do it in userspace.
I'm mostly interested in how the Linux kernel handles context switching but information about other OSes is welcome too.
In general, the indirect cost of context switch ranges from several microseconds to more than one thousand microseconds for our workload. When the overall data size is larger than cache size, the overhead of refilling of L2 cache have substantial impact on the cost of context switch.
Context Switch is very expensive. Not because of the CPU operation itself, but because of cache invalidation. If you have an intensive task running, it will fill the CPU cache, both for instructions and data, also the memory prefetch, TLB and RAM will optimize the work toward some areas of ram.
Context switching itself has a cost in performance, due to running the task scheduler, TLB flushes, and indirectly due to sharing the CPU cache between multiple tasks.
Thread switching is a type of context switching from one thread to another thread in the same process. Thread switching is very efficient and much cheaper because it involves switching out only identities and resources such as the program counter, registers and stack pointers.
Because the CPU does not need to switch to kernel mode and back to user mode.
Mostly the switch to kernel mode. IIRC, the page tables are the same in kernel mode and user mode in Linux, so at least there is no TLB invalidation penalty.
Needs to be measured and can vary from machine to machine. I guess that a typical desktop/server machine these days can do a few hundred thousands of context switches per second, probably a few million.
Depends on how the kernel scheduler handles this. AFAIK, in Linux it is pretty efficient, even with large thread counts, but more threads means more memory usage means more cache pressure and thus likely lower performance. I also expect some overhead involved in the handling of thousands of sockets.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With