How do SYSCALL/SYSRET instructions perform across x86 CPUs?

SYSCALL and SYSRET (and their Intel-designed counterparts SYSENTER and SYSEXIT) are usually described as a “generally faster” way to enter and exit supervisor mode in x86 processors than call gates or software interrupts, but the exact figures underlying this claim remain largely undocumented. In particular, none of the Intel or AMD optimization guides I was able to find mention these instructions at all. So:

  • How many cycles (estimated) do SYSCALL and SYSRET take across recent Intel 64 microarchitectures? This is probably measurable by direct experimentation, but there are quite a few different CPUs to test.

Depending on the order of magnitude of this number, more detailed questions may be relevant:

  • Do they incur a complete pipeline stall, or any other kind of stall?
  • How, if at all, do they interact with branch prediction (e.g. the return stack buffer) and fetch logic?
  • What about latencies, data dependencies, serialization?
  • etc.

Assume 64-bit code on the userspace side, no additional address-space switches (writes to CR3) and even matching SYSCALL and SYSRET pairs if it matters.

Alex Shpilkin asked Aug 07 '13 23:08



1 Answer

I was curious too, so I've written some basic bare-metal code to benchmark it: just a loop that executes syscall 1,000,000 times, with the syscall handler running sysret and nothing else. On my Ryzen 7 3700X it averages 78 cycles for the call+return.

Obviously that's an artificial benchmark, because a real system call handler will likely need to do some things like switch stacks and perform Spectre mitigations. But it gives an idea of the order of magnitude, which is less than the cost of a cache miss that goes to main memory.
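
For anyone who wants a number on their own machine without writing bare-metal code, here is a rough user-space analogue. It is not the benchmark described above: it times a complete getpid round trip through a real kernel, so the result includes the kernel's entry and exit work and should come out larger than the bare syscall+sysret cost. It assumes Linux on x86-64 with GCC or Clang; the function name avg_syscall_ticks is made up for this sketch. It also counts TSC ticks, which run at a fixed reference frequency and are not necessarily equal to core clock cycles.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall() */
#include <x86intrin.h>     /* __rdtsc(), GCC/Clang on x86 only */

/*
 * Hypothetical helper: average number of TSC ticks per getpid system call.
 * Each iteration enters and leaves the kernel once; the loop overhead
 * itself is small compared with a kernel round trip and is not subtracted.
 */
uint64_t avg_syscall_ticks(uint64_t iterations)
{
    if (iterations == 0)
        return 0;

    uint64_t start = __rdtsc();
    for (uint64_t i = 0; i < iterations; i++)
        syscall(SYS_getpid);   /* raw system call, bypasses any libc caching */
    uint64_t end = __rdtsc();

    return (end - start) / iterations;
}
```

Calling avg_syscall_ticks(1000000) from main and printing the result gives a per-call average. Pinning the process to one core and repeating the run a few times should reduce noise from migrations and frequency changes.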

Bruce Merry answered Sep 28 '22 13:09