In linux/arch/x86/include/asm/switch_to.h there is the definition of the macro switch_to. The key lines that do the real thread-switch miracle read like this (until Linux 4.7, when it changed):
asm volatile("pushfl\n\t" /* save flags */ \
"pushl %%ebp\n\t" /* save EBP */ \
"movl %%esp,%[prev_sp]\n\t" /* save ESP */ \
"movl %[next_sp],%%esp\n\t" /* restore ESP */ \
"movl $1f,%[prev_ip]\n\t" /* save EIP */ \
"pushl %[next_ip]\n\t" /* restore EIP */ \
__switch_canary \
"jmp __switch_to\n" /* regparm call */ \
"1:\t" \
"popl %%ebp\n\t" /* restore EBP */ \
"popfl\n" /* restore flags */ \
The named operands have memory constraints like [prev_sp] "=m" (prev->thread.sp). __switch_canary is defined to nothing unless CONFIG_CC_STACKPROTECTOR is defined (then it's a load and store using %ebx).
I understand how it works: the kernel stack pointer is backed up and restored, and the push of next->eip followed by jmp __switch_to is effectively a "fake" call instruction. It is matched by the real ret instruction at the end of __switch_to, which makes next->eip the return point in the next thread.
What I don't understand is: why the hack? Why not just call __switch_to and then, after it returns, jmp to next->eip, which would be cleaner and more reader-friendly?
There are two reasons for doing it this way.
One is to allow complete flexibility of operand/register allocation for [next_ip]. If you want to be able to do jmp *%[next_ip] after the call __switch_to, then %[next_ip] must be allocated to a nonvolatile register (i.e. one that, by the ABI definitions, retains its value across a function call).
That restricts the compiler's ability to optimize, and the resulting code for context_switch() (the 'caller', where switch_to() is used) might not be as good as it could be. But for what benefit? That's where the second reason comes in: none, really, because call __switch_to would be equivalent to:
pushl $1f
jmp __switch_to
1: jmp *%[next_ip]
i.e. it pushes the return address, so you'd end up with the sequence push/jmp (== call)/ret/jmp. If you do not want to return to this place (and this code doesn't), you save a branch by "faking" the call, because then you only have to do push/jmp/ret. The code effectively turns the call into a tail call here.
Yes, it's a small optimization, but avoiding a branch reduces latency and latency is critical for context switches.
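To make the branch counting concrete, here is a minimal sketch of a toy stack machine in C. It is not kernel code and not real x86: the instruction set, the run() interpreter, and the two program layouts are all assumptions made up for illustration. Index 2 stands in for __switch_to (reduced to a bare ret) and index 3 stands in for next_ip.

```c
#include <assert.h>

/* Toy instruction set; purely illustrative, not real x86. */
enum opcode { OP_PUSH, OP_JMP, OP_CALL, OP_RET, OP_HALT };

struct insn {
    enum opcode op;
    int arg;        /* value for OP_PUSH, target index for OP_JMP/OP_CALL */
};

struct result {
    int final_pc;   /* index of the OP_HALT that ended the run */
    int transfers;  /* number of control transfers executed */
};

enum { STACK_SLOTS = 8 };

/* Execute a toy program from index 0 and count control transfers. */
static struct result run(const struct insn *prog)
{
    int stack[STACK_SLOTS];
    int sp = 0, pc = 0, transfers = 0;

    for (;;) {
        struct insn i = prog[pc];

        switch (i.op) {
        case OP_PUSH:               /* no control transfer */
            assert(sp < STACK_SLOTS);
            stack[sp++] = i.arg;
            pc++;
            break;
        case OP_JMP:
            pc = i.arg;
            transfers++;
            break;
        case OP_CALL:               /* push return index, then jump */
            assert(sp < STACK_SLOTS);
            stack[sp++] = pc + 1;
            pc = i.arg;
            transfers++;
            break;
        case OP_RET:                /* pop an index and jump to it */
            assert(sp > 0);
            pc = stack[--sp];
            transfers++;
            break;
        case OP_HALT:
            return (struct result){ pc, transfers };
        }
    }
}

/* The "clean" variant: call __switch_to, then jmp to next_ip. */
static struct result run_call_then_jmp(void)
{
    static const struct insn prog[] = {
        { OP_CALL, 2 },   /* 0: call __switch_to       */
        { OP_JMP,  3 },   /* 1: jmp next_ip            */
        { OP_RET,  0 },   /* 2: __switch_to: ret       */
        { OP_HALT, 0 },   /* 3: next_ip                */
    };
    return run(prog);
}

/* The variant from the macro: push next_ip, jmp __switch_to. */
static struct result run_fake_call(void)
{
    static const struct insn prog[] = {
        { OP_PUSH, 3 },   /* 0: push next_ip           */
        { OP_JMP,  2 },   /* 1: jmp __switch_to        */
        { OP_RET,  0 },   /* 2: __switch_to: ret       */
        { OP_HALT, 0 },   /* 3: next_ip                */
    };
    return run(prog);
}
```

Calling run_call_then_jmp() and run_fake_call() from a main() of your own shows both variants ending at the same index, with the fake call needing one fewer control transfer. The model deliberately ignores everything else the real macro does (flags, EBP, the stack switch itself).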