I quite often hear driver developers say it's good to avoid kernel mode switches as much as possible. I couldn't understand the precise reason. To start with, my understanding is -
Among these operations, syscall pretty much works like a normal function call, though sysenter could behave like a mispredicted branch, which could lead to a ROB flush in the processor pipeline. Even that is not really bad; it's just like any other mispredicted branch.
I heard a few people answering on Stack Overflow that while(1); doesn't guarantee a no-context switch. Where is the actual syscall cost coming from?
A system call differs from a user function in several key ways. A system call has more privilege than a normal subroutine. A system call runs with kernel-mode privilege in the kernel protection domain. System call code and data are located in global kernel memory.
A library call executes in user mode only. A system call executes more slowly than a library call because it involves a transition between user mode and kernel mode (often loosely called context switching). A library call is faster because no such mode transition takes place.
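The speed difference described above can be measured directly. The following is a minimal sketch, assuming Linux with GCC or Clang; the helper names (now_ns, user_function, avg_ns_function_call, avg_ns_system_call) are hypothetical and not taken from the original posts. It times a trivial user-mode function against getpid issued through the C library's syscall() wrapper. The absolute numbers depend heavily on the CPU, the kernel version and any enabled mitigations, so treat them as a rough indication only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* A trivial user-mode function; noinline keeps a real call in the loop. */
__attribute__((noinline)) static long user_function(long x) {
    __asm__ volatile("" : "+r"(x)); /* compiler barrier, emits no instruction */
    return x + 1;
}

/* Average cost in nanoseconds of one plain function call. */
double avg_ns_function_call(long iters) {
    assert(iters > 0);
    volatile long sink = 0;
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++)
        sink = user_function(sink);
    return (double)(now_ns() - start) / (double)iters;
}

/* Average cost in nanoseconds of one getpid system call.
 * syscall(SYS_getpid) is used so that the kernel is really entered
 * on every iteration. */
double avg_ns_system_call(long iters) {
    assert(iters > 0);
    volatile long sink = 0;
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++)
        sink = syscall(SYS_getpid);
    return (double)(now_ns() - start) / (double)iters;
}
```

Calling both functions with the same iteration count (say, one million) and printing the two averages gives a per-call comparison; the loop overhead is included in both figures, so only the difference between them is meaningful.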
Slow system calls are those that wait for an indefinite stretch of time for something to finish (e.g. waitpid), for something to become available (e.g. read from a client socket that's not seen any data recently), or for some external event (e.g. network connection request from client via accept.)
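To make the "waits for something to become available" case concrete, here is a small sketch, assuming a POSIX system. The function name read_on_empty_pipe_would_block is hypothetical. It puts the read end of a pipe into non-blocking mode so that read() reports EAGAIN instead of sleeping, which is exactly the situation in which an ordinary blocking read() would wait indefinitely.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if read() on an empty pipe would have blocked,
 * 0 if it would not, and -1 if something unexpected happened. */
int read_on_empty_pipe_would_block(void) {
    int fds[2];
    char byte = 0;

    if (pipe(fds) != 0)
        return -1;
    assert(fds[0] >= 0 && fds[1] >= 0);

    /* With O_NONBLOCK, read() fails with EAGAIN instead of sleeping. */
    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Nothing has been written yet, so a blocking read would wait here. */
    ssize_t n = read(fds[0], &byte, 1);
    int would_block = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));

    /* Once data is available, the same call completes immediately. */
    if (write(fds[1], "x", 1) != 1 ||
        read(fds[0], &byte, 1) != 1 || byte != 'x')
        would_block = -1;

    close(fds[0]);
    close(fds[1]);
    return would_block;
}
```

The time such a call spends waiting is not mode-switch overhead; it is time the thread spends asleep until the kernel has something to deliver.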
The interrupt causes the kernel to take over and perform the requested action, then hands control back to the application. This mode switching is the reason that system calls are slower to execute than an equivalent application-level function.
You don't indicate what OS you are asking about. Let me attempt an answer anyway.
The CPU instructions syscall and sysenter should not be confused with the concept of a system call and its representation in the respective OSs.
The best explanation for the difference in the overhead incurred by each respective instruction is given by reading through the Operation sections of the Intel® 64 and IA-32 Architectures Developer's Manual volume 2A (for int, see page 3-392) and volume 2B (for sysenter, see page 4-463). Also don't forget to glance at iretd and sysexit while at it.
A casual counting of the pseudo-code for the operations yields:
int
sysenter
Note: Although the existing answer is right in that sysenter and syscall are not interrupts or in any way related to interrupts, older kernels in the Linux and the Windows world used interrupts to implement their system call mechanism. On Linux this used to be int 0x80 and on Windows int 0x2E. Consequently, on those kernel versions the IDT had to be primed to provide an interrupt handler for the respective interrupt. On newer systems, that's true, the sysenter and syscall instructions have completely replaced the old ways. With sysenter it's the MSR (model-specific register) 0x176 which gets primed with the address of the handler for sysenter (see the reading material linked below).
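To illustrate the difference between the instruction and the OS-level concept, here is a minimal sketch that issues getpid through the raw syscall instruction. It assumes Linux on x86-64 with GCC or Clang inline assembly; on any other platform it falls back to the C library's syscall() wrapper. The function name raw_getpid is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: ask the kernel for the process ID, using the
 * SYSCALL instruction directly where this sketch knows the convention. */
long raw_getpid(void) {
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* Linux x86-64 convention: system call number in RAX, result in RAX.
     * The SYSCALL instruction itself overwrites RCX (return address)
     * and R11 (saved RFLAGS), so both are listed as clobbered. */
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"((long)SYS_getpid)
                     : "rcx", "r11", "memory");
    assert(ret > 0); /* getpid cannot fail */
    return ret;
#else
    /* Assumption: on other platforms, let libc pick the entry mechanism. */
    return syscall(SYS_getpid);
#endif
}
```

The value returned should match what getpid() reports. The instruction only transfers control to the entry point the kernel registered at boot; everything that makes the request a "system call" (the number in RAX, the dispatch table, the return convention) is defined by the OS, not by the CPU.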
A system call on Windows, just like on Linux, results in the switch to kernel mode. The scheduler of NT doesn't provide any guarantees about the time a thread is granted. Also it yanks away time from threads and can even end up starving threads. In general one can say that user mode code can be preempted by kernel mode code (with very few very specific exceptions to which you'll certainly get in the "advanced driver writing class"). This makes perfect sense if we only look at one example. User mode code can be swapped out - or, for that matter, the data it's trying to access. Now the CPU doesn't have the slightest clue how to access pages in the swap/paging file, so an intermediate step is required. And that's also why kernel mode code must be able to preempt user mode code. It is also the reason for one of the most prolific bug-check codes seen on Windows and mostly caused by third-party drivers: IRQL_NOT_LESS_OR_EQUAL. It means that a driver accessed paged memory when it wasn't possible to preempt the code touching that memory.
KiFastSystemCall
SYSENTER/SYSCALL is not a software interrupt; the whole point of those instructions is to avoid the overhead caused by raising an interrupt and calling an interrupt handler.
Saving registers on the stack costs time; this is one place where the syscall cost comes from.
Another place is the kernel mode switch itself. It involves changing the segment registers CS, DS, ES, FS and GS (this is less costly on x86-64, where segmentation is mostly unused, but you still essentially need to make a far jump to kernel code) and it also changes the CPU ring of execution.
To conclude: a function call is (on modern systems, where segmentation is not used) a near call, while a syscall involves a far call and a ring switch.