I'm trying to understand how the scheduling process in the Linux kernel actually works. My question is not about the scheduling algorithm. It's about how the functions schedule() and switch_to() work.
I'll try to explain. I saw that:
When a process runs out of its time slice, the flag need_resched is set by scheduler_tick(). The kernel checks the flag, sees that it is set, and calls schedule() (pertinent to question 1) to switch to a new process. This flag is a message that schedule() should be invoked as soon as possible because another process deserves to run. Upon returning to user space or returning from an interrupt, the need_resched flag is checked. If it is set, the kernel invokes the scheduler before continuing.
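The flag mechanism described above can be sketched in user space. This is only a toy model under stated assumptions: struct toy_task, toy_scheduler_tick(), toy_interrupt_return(), toy_schedule() and toy_run() are hypothetical names invented for illustration, there are exactly two tasks, and nothing here is kernel code. It shows that the tick only sets the flag, while the decision to switch is taken later, on the return path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified task; not the kernel's task_struct. */
struct toy_task {
    const char *name;
    int time_slice;     /* remaining ticks */
    bool need_resched;  /* stands in for the need_resched flag */
};

/* Timer-tick side: only records that a reschedule is wanted. */
static void toy_scheduler_tick(struct toy_task *curr)
{
    if (curr->time_slice > 0 && --curr->time_slice == 0)
        curr->need_resched = true;
}

/* Stand-in for schedule(): clear the flag, refill the slice, pick the other task. */
static struct toy_task *toy_schedule(struct toy_task *curr,
                                     struct toy_task *other, int slice)
{
    curr->need_resched = false;
    curr->time_slice = slice;
    return other;
}

/* Return path: the flag is examined here, not inside the tick handler. */
static struct toy_task *toy_interrupt_return(struct toy_task *curr,
                                             struct toy_task *other, int slice)
{
    if (curr->need_resched)
        return toy_schedule(curr, other, slice);
    return curr;
}

/* Deliver `ticks` timer interrupts; return how many task switches happened. */
int toy_run(int ticks, int slice)
{
    struct toy_task a = { "A", slice, false };
    struct toy_task b = { "B", slice, false };
    struct toy_task *curr = &a;
    int switches = 0;

    assert(slice > 0);
    for (int t = 0; t < ticks; t++) {
        struct toy_task *other = (curr == &a) ? &b : &a;
        struct toy_task *next;

        toy_scheduler_tick(curr);
        next = toy_interrupt_return(curr, other, slice);
        if (next != curr) {
            printf("tick %d: %s -> %s\n", t, curr->name, next->name);
            switches++;
        }
        curr = next;
    }
    return switches;
}
```

Calling toy_run(6, 2) from a main() of your own reports three switches (at ticks 1, 3 and 5), because each task is allowed two ticks before the flag is raised.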
Looking into the kernel source (linux-2.6.10, the version that the book "Linux Kernel Development, second edition" is based on), I also saw that some code can call the schedule() function voluntarily, giving another process the right to run. I saw that the function switch_to() is the one that actually does the context switch. I looked into some architecture-dependent code, trying to understand what switch_to() was actually doing.
That behavior raised some questions that I could not find answers to:
1. When switch_to() finishes, what is the currently running process? The process that called schedule()? Or the next process, the one that was picked to run?
2. When schedule() gets called by an interrupt, does the selected process start to run when the interrupt handling finishes (after some kind of RTE)? Or before that?
3. If the schedule() function cannot be called from an interrupt, when is the need_resched flag set?
4. When the timer interrupt handler is working, what stack is being used?
I don't know if I have made myself clear. If not, I hope I can do so after some answers (or questions). I have already looked at several sources trying to understand this process. I have the book "Linux Kernel Development, second edition", and I'm using it too. I know a bit about the MIPS and H8/300 architectures, if that helps to explain.
The scheduler will be invoked if the current thread/process is going to sleep or wait for some event or resource to be released. One such case is a worker thread that executes bottom halves in the form of workqueues: it runs in a while loop and checks whether the workqueue list is empty, and when there is nothing left to do it goes to sleep, which invokes the scheduler.
The part of the kernel responsible for granting CPU time to tasks is called the process scheduler. It chooses the next task to run and maintains the order in which the runnable processes on the system should run.
The scheduler entry point. The main entry point into the process scheduler is the function __schedule() (in older kernels such as 2.6.10, the work is done directly in schedule()). This is the function that the rest of the kernel uses to invoke the process scheduler, deciding which process to run and then running it. Its main goal is to find the next task to be run.
Scheduling classes. One may say that the Linux kernel scheduler consists mainly of two different scheduling algorithms: the real-time scheduler and the Completely Fair Scheduler (CFS). Scheduling classes allow these algorithms to be implemented in a modular way.
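To make the modularity concrete, here is a minimal sketch of the idea, assuming each class can be reduced to one callback and that classes are consulted in fixed priority order. The names toy_rq, toy_sched_class, toy_classes and toy_pick_class are hypothetical and do not correspond to the kernel's actual structures or signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical run queue: just counts of runnable tasks per class. */
struct toy_rq {
    int nr_rt;
    int nr_fair;
};

enum toy_class_id { TOY_NONE = -1, TOY_RT = 0, TOY_FAIR = 1 };

/* One "scheduling class": an id plus the hook the core code calls. */
struct toy_sched_class {
    enum toy_class_id id;
    bool (*has_runnable)(const struct toy_rq *rq);
};

static bool toy_rt_has_runnable(const struct toy_rq *rq)
{
    return rq->nr_rt > 0;
}

static bool toy_fair_has_runnable(const struct toy_rq *rq)
{
    return rq->nr_fair > 0;
}

/* Highest-priority class first; adding a class means adding one entry. */
static const struct toy_sched_class toy_classes[] = {
    { TOY_RT,   toy_rt_has_runnable },
    { TOY_FAIR, toy_fair_has_runnable },
};

/* Core loop: ask each class in turn, stop at the first one with work. */
enum toy_class_id toy_pick_class(const struct toy_rq *rq)
{
    size_t n = sizeof toy_classes / sizeof toy_classes[0];

    assert(rq != NULL);
    for (size_t i = 0; i < n; i++) {
        if (toy_classes[i].has_runnable(rq))
            return toy_classes[i].id;
    }
    return TOY_NONE; /* nothing runnable in any class */
}
```

The point of the design is that the core loop never needs to know how a class makes its decision; it only walks the table in priority order.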
After calling switch_to(), the kernel stack is switched to that of the task named in next. Changing the address space, etc., is handled in e.g. context_switch().

schedule() cannot be called in atomic context, including from an interrupt (see the check in schedule_debug()). If a reschedule is needed, the TIF_NEED_RESCHED task flag is set, which is checked in the interrupt return path.

To be a bit more detailed, here's a practical example:
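When the interrupt handler finishes, the exit path looks at the task flags. If a reschedule is pending, the exit path ends up performing a schedule(). If the interrupt arrived from user space, the exit path checks for a pending reschedule (directly invoking schedule() if needed) as well as checking for pending signals, then goes back for another round at retint_check until there are no more important flags set.

As for switch_to(), what it does (on x86-32) is:

1. Save the outgoing task's instruction pointer and stack pointer, for when we return to this task at some point later.
2. Switch the value of current_task. At this point, current now points to the new task.
3. Switch to the new task's stack and push the new task's saved return address onto it. A later return uses this address to jump back to the code that previously called switch_to().
4. Call __switch_to(). At this point, current points to the new task, and we're on the new task's stack, but various other CPU state hasn't been updated. __switch_to() handles switching the state of things like the FPU, segment descriptors, debug registers, etc.
5. Upon return from __switch_to(), the return address that switch_to() manually pushed onto the stack is returned to, placing execution back where it was prior to the switch_to() in the new task. Execution has now fully resumed on the switched-to task.

x86-64 is very similar, but has to do slightly more saving and restoring of state due to the different ABI.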
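The effect of the stack switch can be observed with a user-space analogy. The sketch below is not kernel code: it assumes a glibc-style system that still provides the POSIX <ucontext.h> functions, and toy_switch_to(), toy_current, toy_trace and toy_demo() are hypothetical names chosen for illustration. swapcontext() plays the role of saving one stack pointer and instruction pointer and loading another.

```c
#include <assert.h>
#include <stdio.h>
#include <ucontext.h>

enum { TOY_PREV = 0, TOY_NEXT = 1 };

static ucontext_t ctx_prev, ctx_next;
static char next_stack[64 * 1024];   /* private stack for the "next" task */
static int toy_current = TOY_PREV;   /* stands in for `current` */

int toy_trace[8];                    /* which task was current at each step */
int toy_trace_len;

static void record(const char *where)
{
    assert(toy_trace_len < 8);
    toy_trace[toy_trace_len++] = toy_current;
    printf("%s: current=%d\n", where, toy_current);
}

/* Analogy of switch_to(): update "current", then swap stack and pc. */
static void toy_switch_to(ucontext_t *from, ucontext_t *to, int to_id)
{
    toy_current = to_id;
    swapcontext(from, to);
    /* Execution reaches this point only when `from` is resumed later. */
}

static void next_body(void)
{
    record("in next task");          /* running on next_stack */
    toy_switch_to(&ctx_next, &ctx_prev, TOY_PREV);
}

int toy_demo(void)
{
    toy_trace_len = 0;
    toy_current = TOY_PREV;

    getcontext(&ctx_next);
    ctx_next.uc_stack.ss_sp = next_stack;
    ctx_next.uc_stack.ss_size = sizeof next_stack;
    ctx_next.uc_link = &ctx_prev;
    makecontext(&ctx_next, next_body, 0);

    record("before switch");
    toy_switch_to(&ctx_prev, &ctx_next, TOY_NEXT);
    record("after switch returns");  /* prev has been switched back in */
    return toy_trace_len;
}
```

Calling toy_demo() records current as 0, then 1, then 0. The code that follows the switch in the first task does not run until that task is switched back in, so from the first task's point of view the call simply returns later, while the instructions executed immediately after the switch belong to the other task.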