Why does switch_to use push+jmp+ret to change EIP, instead of jmp directly?

Question

In linux/arch/x86/include/asm/switch_to.h there is the definition of the macro switch_to. Until Linux 4.7, when it changed, the key lines that do the real thread-switch magic read like this:

asm volatile("pushfl\n\t"                 /* save    flags */ \
             "pushl %%ebp\n\t"            /* save    EBP   */ \
             "movl %%esp,%[prev_sp]\n\t"  /* save    ESP   */ \
             "movl %[next_sp],%%esp\n\t"  /* restore ESP   */ \
             "movl $1f,%[prev_ip]\n\t"    /* save    EIP   */ \
             "pushl %[next_ip]\n\t"       /* restore EIP   */ \
             __switch_canary                                  \
             "jmp __switch_to\n"          /* regparm call  */ \
             "1:\t"                                           \
             "popl %%ebp\n\t"             /* restore EBP   */ \
             "popfl\n"                    /* restore flags */ \

The named operands have memory constraints like [prev_sp] "=m" (prev->thread.sp). __switch_canary is defined to nothing unless CONFIG_CC_STACKPROTECTOR is defined (then it's a load and store using %ebx).

I understand how it works: the kernel stack pointer is saved and restored, and pushing next->eip followed by jmp __switch_to acts as a "fake" call instruction that is matched by the real ret instruction at the end of that function, which effectively makes next->eip the return point into the next thread.
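
To make that mechanism concrete, here is a toy model in plain C, not real kernel code: "addresses" are small integers, each thread owns a small array as its stack, and a global pointer stands in for ESP. The names toy_thread, toy_switch and toy_switch_to_body are hypothetical, and the saving of flags and EBP is left out. It shows that once ESP points at next's stack and next's saved IP has been pushed, the ret at the end of the stand-in for __switch_to lands at next's resume point instead of returning to the code that jumped there.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: addresses are small integers, the stack is an array,
 * and esp is an ordinary pointer. No real registers are touched. */
enum { STACK_WORDS = 16 };

struct toy_thread {
    uint32_t stack[STACK_WORDS];
    uint32_t *sp;  /* saved stack pointer, like thread.sp */
    uint32_t ip;   /* saved resume address, like thread.ip */
};

static uint32_t *esp;  /* the toy "CPU" stack pointer */

static void push(uint32_t v) { *--esp = v; }   /* stack grows down, as on x86 */
static uint32_t pop(void)    { return *esp++; }

static void toy_thread_init(struct toy_thread *t, uint32_t resume_ip) {
    t->sp = t->stack + STACK_WORDS;  /* empty stack */
    t->ip = resume_ip;
}

/* Stand-in for __switch_to: whatever it does, it ends in "ret",
 * which pops the top word of the stack it is currently running on. */
static uint32_t toy_switch_to_body(void) {
    return pop();
}

/* Mirrors the asm sequence (flags/EBP saving omitted). Returns the
 * address execution continues at once the ret has executed. */
static uint32_t toy_switch(struct toy_thread *prev, struct toy_thread *next,
                           uint32_t prev_resume_ip) {
    assert(prev != next);          /* the scheduler only switches when they differ */
    prev->sp = esp;                /* movl %%esp,%[prev_sp]   */
    esp = next->sp;                /* movl %[next_sp],%%esp   */
    prev->ip = prev_resume_ip;     /* movl $1f,%[prev_ip]     */
    push(next->ip);                /* pushl %[next_ip]        */
    return toy_switch_to_body();   /* jmp __switch_to ... ret */
}
```

For example, with thread a running and thread b saved with IP 200, toy_switch(&a, &b, 100) returns 200 and records a.ip == 100, so a later toy_switch(&b, &a, ...) resumes a at 100.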

What I don't understand is: why the hack? Why not just call __switch_to and, after it returns, jmp to next->eip? That would be cleaner and more reader-friendly.

FrankH. · Accepted Answer

There are two reasons for doing it this way.

One is to allow complete flexibility of operand/register allocation for [next_ip]. If you wanted to do the jmp *%[next_ip] after the call __switch_to, %[next_ip] would have to be allocated to a nonvolatile (callee-saved) register, i.e. one that, by the ABI definitions, retains its value across a function call.
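
The register-allocation argument can be illustrated with another hypothetical toy model (not kernel code). It assumes the usual i386 split, where EAX, ECX and EDX are caller-saved and EBX, ESI and EDI are callee-saved, and models the callee as something that may overwrite the caller-saved registers. With call-then-jmp, the jump target survives only if it was held in a callee-saved register; with push-then-jmp, it is copied to the stack before the callee runs, so in this model any register works. The names toy_reg, toy_call_then_jmp and toy_push_then_jmp are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file. Assumed i386 convention: EAX/ECX/EDX are
 * caller-saved ("volatile"), EBX/ESI/EDI are callee-saved.
 * EBP and ESP are left out of the model. */
enum toy_reg { R_EAX, R_ECX, R_EDX, R_EBX, R_ESI, R_EDI, R_COUNT };

static int toy_is_callee_saved(enum toy_reg r) {
    return r == R_EBX || r == R_ESI || r == R_EDI;
}

/* A called function may overwrite any caller-saved register. */
static void toy_callee_clobbers(uint32_t regs[R_COUNT]) {
    regs[R_EAX] = regs[R_ECX] = regs[R_EDX] = 0xdeadbeefu;
}

/* "call __switch_to" then "jmp *%reg": the target is read from the
 * register only after the callee has run. */
static uint32_t toy_call_then_jmp(uint32_t regs[R_COUNT], enum toy_reg next_ip_reg) {
    assert(next_ip_reg < R_COUNT);
    toy_callee_clobbers(regs);
    return regs[next_ip_reg];
}

/* "pushl %reg; jmp __switch_to": the target is copied to the stack
 * before the callee runs, and the callee's ret consumes it. */
static uint32_t toy_push_then_jmp(uint32_t regs[R_COUNT], enum toy_reg next_ip_reg) {
    assert(next_ip_reg < R_COUNT);
    uint32_t pushed = regs[next_ip_reg];
    toy_callee_clobbers(regs);
    return pushed;
}
```

In this model, toy_call_then_jmp loses the target when it was held in R_EAX but not when it was held in R_EBX, while toy_push_then_jmp returns it either way.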

That restricts the compiler's ability to optimize, and the resulting code for context_switch() (the "caller", where switch_to() is used) might not be as good as it could be. But for what benefit?

Well, that's where the second reason comes in: none, really, because call __switch_to would be equivalent to:

pushl $1f
jmp __switch_to
1: jmp *%[next_ip]

i.e. it pushes the return address. You'd end up with the sequence push/jmp (== call)/ret/jmp, whereas if you do not want to return to this place (and this code doesn't), you save a branch by "faking" a call, because you only have to do push/jmp/ret. The code effectively turns itself into a tail call here.
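
The branch count can be checked with a tiny, hypothetical instruction interpreter (a toy, not x86): OP_PUSH_IMM, OP_JMP, OP_RET and OP_JMP_IND are made-up opcodes, slot SWITCH_TO plays __switch_to, and slot NEXT_IP plays the next thread's resume point. Both programs end up at NEXT_IP, but the call-style sequence takes three control transfers (jmp, ret, indirect jmp) while the faked call takes two (jmp, ret).

```c
#include <assert.h>

/* Made-up opcodes for a toy machine; OP_HALT is 0 so unused slots halt. */
enum op { OP_HALT, OP_PUSH_IMM, OP_JMP, OP_RET, OP_JMP_IND };

struct insn {
    enum op op;
    int arg; /* value to push, or jump target (for OP_JMP_IND: the register's value) */
};

enum { PROG_LEN = 32, STACK_LEN = 8, SWITCH_TO = 10, NEXT_IP = 20 };

struct run_result {
    int final_pc;  /* slot where execution halted */
    int branches;  /* taken control transfers: jmp, ret, indirect jmp */
};

static struct run_result run(const struct insn prog[PROG_LEN]) {
    int stack[STACK_LEN];
    int sp = 0, pc = 0;
    struct run_result r = { 0, 0 };
    for (;;) {
        assert(pc >= 0 && pc < PROG_LEN);
        const struct insn *in = &prog[pc];
        switch (in->op) {
        case OP_PUSH_IMM:
            assert(sp < STACK_LEN);
            stack[sp++] = in->arg;
            pc++;
            break;
        case OP_JMP:
        case OP_JMP_IND:
            pc = in->arg;
            r.branches++;
            break;
        case OP_RET:
            assert(sp > 0);
            pc = stack[--sp];
            r.branches++;
            break;
        case OP_HALT:
            r.final_pc = pc;
            return r;
        }
    }
}

/* call __switch_to, then jmp *%[next_ip] after it returns */
static const struct insn call_style[PROG_LEN] = {
    [0]         = { OP_PUSH_IMM, 2 },       /* pushl $1f (what call pushes) */
    [1]         = { OP_JMP, SWITCH_TO },    /* jmp __switch_to              */
    [2]         = { OP_JMP_IND, NEXT_IP },  /* 1: jmp *%[next_ip]           */
    [SWITCH_TO] = { OP_RET, 0 },            /* end of __switch_to: ret      */
    [NEXT_IP]   = { OP_HALT, 0 },           /* next thread resumes here     */
};

/* the faked call used by switch_to */
static const struct insn fake_call[PROG_LEN] = {
    [0]         = { OP_PUSH_IMM, NEXT_IP }, /* pushl %[next_ip]             */
    [1]         = { OP_JMP, SWITCH_TO },    /* jmp __switch_to              */
    [SWITCH_TO] = { OP_RET, 0 },            /* ret pops next_ip             */
    [NEXT_IP]   = { OP_HALT, 0 },
};
```

Here run(call_style).branches is 3 and run(fake_call).branches is 2, and both report final_pc == NEXT_IP.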

Yes, it's a small optimization, but avoiding a branch reduces latency and latency is critical for context switches.

Tags: x86, assembly, linux-kernel

ed9er
