Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Context switch internals

I want to learn and fill gaps in my knowledge with the help of this question.

So, a user is running a thread (kernel-level) and it now calls yield (a system call I presume). The scheduler must now save the context of the current thread in the TCB (which is stored in the kernel somewhere) and choose another thread to run and loads its context and jump to its CS:EIP. To narrow things down, I am working on Linux running on top of x86 architecture. Now, I want to get into the details:

So, first we have a system call:

1) The wrapper function for yield will push the system call arguments onto the stack. Push the return address and raise an interrupt with the system call number pushed onto some register (say EAX).

2) The interrupt changes the CPU mode from user to kernel and jumps to the interrupt vector table and from there to the actual system call in the kernel.

3) I guess the scheduler gets called now and now it must save the current state in the TCB. Here is my dilemma. Since, the scheduler will use the kernel stack and not the user stack for performing its operation (which means the SS and SP have to be changed) how does it store the state of the user without modifying any registers in the process. I have read on forums that there are special hardware instructions for saving state but then how does the scheduler get access to them and who runs these instructions and when?

4) The scheduler now stores the state into the TCB and loads another TCB.

5) When the scheduler runs the original thread, the control gets back to the wrapper function which clears the stack and the thread resumes.

Side questions: Does the scheduler run as a kernel-only thread (i.e. a thread which can run only kernel code)? Is there a separate kernel stack for each kernel-thread or each process?

like image 437
Bruce Avatar asked Sep 27 '12 21:09

Bruce


People also ask

What are the parts of context switching?

Interrupt handling. Context Switching involves storing the context or state of a process so that it can be reloaded when required and execution can be resumed from the same point as earlier.

How do context switches work?

A context switching is a process that involves switching of the CPU from one process or task to another. In this phenomenon, the execution of the process that is present in the running state is suspended by the kernel and another process that is present in the ready state is executed by the CPU.

What happens internally during context switch in Linux kernel?

When a thread context-switches, it calls into the scheduler (the scheduler does not run as a separate thread - it always runs in the context of the current thread). The scheduler code selects a process to run next, and calls the switch_to() function.

What is a context switch CPU?

A context switch is a procedure that a computer's CPU (central processing unit) follows to change from one task (or process) to another while ensuring that the tasks do not conflict. Effective context switching is critical if a computer is to provide user-friendly multitasking.


2 Answers

At a high level, there are two separate mechanisms to understand. The first is the kernel entry/exit mechanism: this switches a single running thread from running usermode code to running kernel code in the context of that thread, and back again. The second is the context switch mechanism itself, which switches in kernel mode from running in the context of one thread to another.

So, when Thread A calls sched_yield() and is replaced by Thread B, what happens is:

  1. Thread A enters the kernel, changing from user mode to kernel mode;
  2. Thread A in the kernel context-switches to Thread B in the kernel;
  3. Thread B exits the kernel, changing from kernel mode back to user mode.

Each user thread has both a user-mode stack and a kernel-mode stack. When a thread enters the kernel, the current value of the user-mode stack (SS:ESP) and instruction pointer (CS:EIP) are saved to the thread's kernel-mode stack, and the CPU switches to the kernel-mode stack - with the int $80 syscall mechanism, this is done by the CPU itself. The remaining register values and flags are then also saved to the kernel stack.

When a thread returns from the kernel to user-mode, the register values and flags are popped from the kernel-mode stack, then the user-mode stack and instruction pointer values are restored from the saved values on the kernel-mode stack.

When a thread context-switches, it calls into the scheduler (the scheduler does not run as a separate thread - it always runs in the context of the current thread). The scheduler code selects a process to run next, and calls the switch_to() function. This function essentially just switches the kernel stacks - it saves the current value of the stack pointer into the TCB for the current thread (called struct task_struct in Linux), and loads a previously-saved stack pointer from the TCB for the next thread. At this point it also saves and restores some other thread state that isn't usually used by the kernel - things like floating point/SSE registers. If the threads being switched don't share the same virtual memory space (ie. they're in different processes), the page tables are also switched.

So you can see that the core user-mode state of a thread isn't saved and restored at context-switch time - it's saved and restored to the thread's kernel stack when you enter and leave the kernel. The context-switch code doesn't have to worry about clobbering the user-mode register values - those are already safely saved away in the kernel stack by that point.

like image 195
caf Avatar answered Sep 28 '22 03:09

caf


What you missed during step 2 is that the stack gets switched from a thread's user-level stack (where you pushed args) to a thread's protected-level stack. The current context of the thread interrupted by the syscall is actually saved on this protected stack. Inside the ISR and just before entering the kernel, this protected-stack is again switched to the kernel stack you are talking about. Once inside the kernel, kernel functions such as scheduler's functions eventually use the kernel-stack. Later on, a thread gets elected by the scheduler and the system returns to the ISR, it switchs back from the kernel stack to the newly elected (or the former if no higher priority thread is active) thread's protected-level stack, wich eventually contains the new thread context. Therefore the context is restored from this stack by code automatically (depending on the underlying architecture). Finally, a special instruction restores the latest touchy resgisters such as the stack pointer and the instruction pointer. Back in the userland...

To sum-up, a thread has (generally) two stacks, and the kernel itself has one. The kernel stack gets wiped at the end of each kernel entering. It's interesting to point out that since 2.6, the kernel itself gets threaded for some processing, therefore a kernel-thread has its own protected-level stack beside the general kernel-stack.

Some ressources:

  • 3.3.3 Performing the Process Switch of Understanding the Linux Kernel, O'Reilly
  • 5.12.1 Exception- or Interrupt-Handler Procedures of the Intel's manual 3A (sysprogramming). Chapter number may vary from edition to other, thus a lookup on "Stack Usage on Transfers to Interrupt and Exception-Handling Routines" should get you to the good one.

Hope this help!

like image 41
Benoit Avatar answered Sep 28 '22 04:09

Benoit