SYSCALL and SYSRET (and their 32-bit-only Intel counterparts SYSENTER and SYSEXIT) are usually described as a “generally faster” way to enter and exit supervisor mode on x86 processors than call gates or software interrupts, but the exact figures underlying this claim remain largely undocumented. In particular, none of the Intel or AMD optimization guides I was able to find mention these instructions at all. So:
How many cycles do SYSCALL and SYSRET take across recent Intel 64 microarchitectures? This is probably measurable by direct experimentation, but there are quite a few different CPUs to test.

Depending on the order of magnitude of this number, more detailed questions may be relevant:
Assume 64-bit code on the userspace side, no additional address-space switches (writes to CR3), and even matching SYSCALL and SYSRET pairs if it matters.
The syscall instruction transfers control to the operating system, which then performs the requested service. Control then (usually) returns to the program. (This description leaves out many details.) On some architectures, such as MIPS, the syscall instruction raises an exception that transfers control to an exception handler; on x86-64, SYSCALL instead jumps directly to the kernel entry point stored in the IA32_LSTAR MSR.
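For x86-64 specifically, the control transfer can be modelled as a handful of register updates. The sketch below is a simplified toy model, not an emulator: the `Cpu` class and its field names are made up for illustration, segment-selector loading from IA32_STAR is left out, and SYSRET's masking of reserved RFLAGS bits is omitted. It does show the point that matters for the timing question: neither instruction touches the stack or memory, and RSP is left for the kernel to switch.

```python
from dataclasses import dataclass

SYSCALL_LEN = 2  # SYSCALL encodes as the two bytes 0F 05


@dataclass
class Cpu:
    """Hypothetical, heavily simplified slice of x86-64 architectural state."""
    rip: int
    rsp: int
    rflags: int
    rcx: int = 0
    r11: int = 0
    cpl: int = 3    # current privilege level: 3 = user, 0 = kernel
    lstar: int = 0  # IA32_LSTAR MSR: 64-bit kernel entry point
    fmask: int = 0  # IA32_FMASK MSR: RFLAGS bits cleared on entry


def syscall(cpu: Cpu) -> None:
    """Model SYSCALL executed from 64-bit user code."""
    cpu.rcx = cpu.rip + SYSCALL_LEN  # return address is saved in RCX
    cpu.r11 = cpu.rflags             # user RFLAGS are saved in R11
    cpu.rip = cpu.lstar              # jump to the kernel entry point
    cpu.rflags &= ~cpu.fmask         # clear the bits selected by IA32_FMASK
    cpu.cpl = 0
    # RSP is deliberately untouched: the kernel must switch stacks itself.


def sysret(cpu: Cpu) -> None:
    """Model SYSRET returning to 64-bit user code (reserved-bit masking omitted)."""
    cpu.rip = cpu.rcx
    cpu.rflags = cpu.r11
    cpu.cpl = 3


if __name__ == "__main__":
    # Example values are arbitrary placeholders.
    cpu = Cpu(rip=0x401000, rsp=0x7FFD0000, rflags=0x246,
              lstar=0xFFFFFFFF81000000, fmask=0x200)
    syscall(cpu)
    print(f"in kernel: rip={cpu.rip:#x} rcx={cpu.rcx:#x} rsp={cpu.rsp:#x}")
    sysret(cpu)
    print(f"back in user mode: rip={cpu.rip:#x} cpl={cpu.cpl}")
```

Because the return address and flags travel in RCX and R11, a kernel entry stub has to save those two registers before it can use them for anything else.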
syscall is an x86-64 instruction that is used as part of the ABI for making system calls. (The 32-bit ABI uses int 80h or sysenter, and is also available in 64-bit mode, but using the 32-bit ABI from 64-bit code is a bad idea, especially for calls with pointer arguments.)
SYSENTER is a companion instruction to SYSEXIT. The instruction is optimized to provide the maximum performance for system calls from user code running at privilege level 3 to operating system or executive procedures running at privilege level 0.
I was curious too, so I've written some basic bare-metal code to benchmark it: just a loop that executes syscall 1000000 times, with the syscall handler just running sysret and nothing else. On my Ryzen 7 3700X it averages 78 cycles for the call+return.
Obviously that's an artificial benchmark, because a real system call handler will likely need to do some things like switch stacks and perform Spectre mitigations. But it gives an idea of the order of magnitude, which is less than a cache miss.
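To put the 78-cycle figure in wall-clock terms, here is a rough back-of-the-envelope sketch. The helper names are made up for illustration. The 3.6 GHz value is the Ryzen 7 3700X's base clock, used here as an assumption because the clock speed during the benchmark is not stated, and the 80 ns DRAM-miss latency is only a placeholder for comparison, not a measured number.

```python
def cycles_per_iteration(tsc_start: int, tsc_end: int, iterations: int) -> float:
    """Average cost of one loop iteration from two timestamp-counter readings."""
    return (tsc_end - tsc_start) / iterations


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds: 1 GHz is one cycle per nanosecond."""
    return cycles / clock_ghz


ASSUMED_CLOCK_GHZ = 3.6      # assumption: 3700X base clock, not the measured clock
ASSUMED_DRAM_MISS_NS = 80.0  # assumption: placeholder main-memory miss latency

if __name__ == "__main__":
    # 78 cycles is the average reported in the answer above.
    round_trip_ns = cycles_to_ns(78, ASSUMED_CLOCK_GHZ)
    print(f"syscall+sysret round trip: about {round_trip_ns:.1f} ns")
    print(f"fraction of assumed DRAM miss: {round_trip_ns / ASSUMED_DRAM_MISS_NS:.2f}")
```

Under these assumptions the bare instruction pair costs a couple of dozen nanoseconds, so in a real kernel the handler's own work (stack switch, mitigations, the service itself) is likely to dominate.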