In linux/arch/x86/include/asm/switch_to.h there is the definition of the macro switch_to. The key lines that do the actual thread switch read like this (until Linux 4.7, when this changed):
asm volatile("pushfl\n\t" /* save flags */ \
"pushl %%ebp\n\t" /* save EBP */ \
"movl %%esp,%[prev_sp]\n\t" /* save ESP */ \
"movl %[next_sp],%%esp\n\t" /* restore ESP */ \
"movl $1f,%[prev_ip]\n\t" /* save EIP */ \
"pushl %[next_ip]\n\t" /* restore EIP */ \
__switch_canary \
"jmp __switch_to\n" /* regparm call */ \
"1:\t" \
"popl %%ebp\n\t" /* restore EBP */ \
"popfl\n" /* restore flags */ \
The named operands have memory constraints like [prev_sp] "=m" (prev->thread.sp). __switch_canary is defined to nothing unless CONFIG_CC_STACKPROTECTOR is defined (then it's a load and store using %ebx).
I understand how it works: the kernel stack pointer is saved and restored, and pushing next->eip followed by jmp __switch_to (which ends with a ret instruction) is a "fake" call matched with a real ret, which effectively makes next->eip the return point of the next thread.
What I don't understand is: why the hack? Why not just call __switch_to and then, after it returns, jmp to next->eip, which would be cleaner and more reader-friendly?
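The "fake call" described above can be modeled with a toy stack. This is only an illustrative sketch, not kernel code: the integer "addresses" and the names toy_stack, toy_push, toy_ret, resume_after_real_call and resume_after_fake_call are all made up for the example. It shows that ret simply resumes at whatever address is on top of the stack, so pushing next_ip by hand before the jmp makes the callee's ret land there.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: "addresses" are small integers, not real code pointers. */
enum { ADDR_AFTER_CALL = 100, ADDR_NEXT_IP = 200 };

struct toy_stack {
    int slots[8];
    size_t depth;
};

/* Models a push: store a value on top of the stack. */
static void toy_push(struct toy_stack *s, int addr)
{
    assert(s->depth < 8);
    s->slots[s->depth++] = addr;
}

/* Models ret: pop the top of the stack and resume execution there. */
static int toy_ret(struct toy_stack *s)
{
    assert(s->depth > 0);
    return s->slots[--s->depth];
}

/* Real call: the CPU pushes the address of the instruction after the call. */
int resume_after_real_call(void)
{
    struct toy_stack s = { {0}, 0 };
    toy_push(&s, ADDR_AFTER_CALL); /* implicit push done by call */
    return toy_ret(&s);            /* ret at the end of the callee */
}

/* Fake call: push next_ip by hand, then jmp (which pushes nothing). */
int resume_after_fake_call(void)
{
    struct toy_stack s = { {0}, 0 };
    toy_push(&s, ADDR_NEXT_IP);    /* models pushl %[next_ip] */
    /* models jmp __switch_to: no return address is pushed */
    return toy_ret(&s);            /* ret at the end of the callee */
}
```

In this model resume_after_real_call() yields ADDR_AFTER_CALL, while resume_after_fake_call() yields ADDR_NEXT_IP, which is the effect the macro relies on.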
There are two reasons for doing it this way.
One is to allow complete flexibility of operand/register allocation for [next_ip]. To be able to do jmp *%[next_ip] after call __switch_to, %[next_ip] would have to be allocated to a nonvolatile (callee-saved) register, i.e. one that, by the ABI definition, retains its value across a function call.
That restricts the compiler's ability to optimize, and the resulting code for context_switch() (the 'caller', where switch_to() is used) might not be as good as it could be. And for what benefit?
That is where the second reason comes in: none, really, because call __switch_to would be equivalent to:
pushl $1f
jmp __switch_to
1: jmp *%[next_ip]
That is, call pushes the return address. You'd end up with the sequence push/jmp (== call)/ret/jmp, whereas if you do not want to return to this place (and this code doesn't), you save a branch by "faking" the call, because you only need push/jmp/ret. The code effectively turns the call into a tail call here.
Yes, it's a small optimization, but avoiding a branch reduces latency and latency is critical for context switches.
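The branch-count argument can be made concrete with a small sketch. This is a hypothetical model, not a measurement: the enum op, count_branches, and the two sequence arrays are names invented for the example, and it only counts instructions that redirect the instruction pointer.

```c
#include <assert.h>

/* Instruction kinds that appear in the two sequences being compared. */
enum op { OP_PUSH, OP_CALL, OP_JMP, OP_RET };

/* call, jmp and ret each redirect the instruction pointer; push does not. */
static int is_branch(enum op o)
{
    return o != OP_PUSH;
}

/* Count the control transfers in a sequence of n instructions. */
int count_branches(const enum op *seq, int n)
{
    int i, branches = 0;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        branches += is_branch(seq[i]);
    return branches;
}

/* "Clean" variant: call __switch_to, ret inside it, then jmp to next_ip. */
const enum op call_then_jmp[] = { OP_CALL, OP_RET, OP_JMP };

/* Variant used by the macro: push next_ip, jmp __switch_to, ret inside it. */
const enum op fake_call[] = { OP_PUSH, OP_JMP, OP_RET };
```

Under this model count_branches(call_then_jmp, 3) is 3 and count_branches(fake_call, 3) is 2, which is the one saved branch the answer refers to.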