What's the point in keeping a different kernel stack for each process in linux?
Why not keep just one stack for the kernel to work with?
With separate user and kernel stacks for each process or thread, we have better isolation. Problems in the user stack can't cause a crash in the kernel. This isolation makes the kernel more secure because it only trusts the stack area that is under its control.
Each process has a kernel stack (or more generally, each thread has its own stack) Just like there has to be a separate place for each process to hold its set of saved registers (in its process table entry), each process also needs its own kernel stack, to work as its execution stack when it is executing in the kernel.
There is one "kernel stack" per CPU like ISR Stack and one "kernel stack" per Process. There is one "user stack" for each process, though each thread has its own stack, including both user and kernel threads.
One of the reasons for having a separate kernel stack is that the kernel needs a place to store information where user-mode code can't touch it. That prevents user-mode code running in a different thread/process from accidentally or maliciously affecting execution of the kernel.
What's the point in keeping a different kernel stack for each process in linux?
It simplifies pre-emption of processes in the kernel space.
Why not keep just one stack for the kernel to work with?
It would be a night mare to implement pre-emption without seperates stacks.
Separate kernel stacks are not really mandated. Each architecture is free to do whatever it wants. If there was no per-emption during a system call, then a single kernel stack might make sense.
However, *nix has processes and each process can make a system call. However, Linux allows one task to be pre-empted during a write()
, etc and another task to schedule. The kernel stack is a snapshot of the context of kernel work that is being performed for each process.
Also, the per-process kernel stacks come with little overhead. A thread_info
or some mechanism to get the process information from assembler is needed. This is at least a page allocation. By placing the kernel mode stack in the same location, a simple mask can get the thread_info
from assembler. So, we already need the per-process variable and allocation. Why not use it as a stack to store kernel context and allow preemption during system calls?
The efficiency of preemption can be demonstrated by mentioned write
above. If the write()
is to disk or network, it will take time to complete. A 5k to 8k buffer written to disk or network will take many CPU cycles to complete (if synchronous) and the user process will block until it is finished. This transfer in the driver can be done with DMA. Here, a hardware element will complete the task of transferring the buffer to the device. In the mean time, a lower priority process can have the CPU and be allowed to make system calls when the kernel keeps different stacks per process. These stacks are near zero cost as the kernel already needs to have book keeping information for process state and the two are both keep in an 4k or 8k page.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With