Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference in ABI between x86_64 Linux functions and syscalls

The x86_64 SysV ABI's function calling convention defines integer argument #4 to be passed in the rcx register. The Linux kernel syscall ABI, on the other hand, uses r10 for that same purpose. All other arguments are passed in the same registers for both functions and syscalls.

This leads to some strange things. Check out, for example, the implementation of mmap in glibc for the x32 platform (for which the same discrepancy exists):

00432ce0 <__mmap>:
  432ce0:       49 89 ca                mov    %rcx,%r10
  432ce3:       b8 09 00 00 40          mov    $0x40000009,%eax
  432ce8:       0f 05                   syscall

So all register are already in place, except we move rcx to r10.

I am wondering why not define the syscall ABI to be the same as the function call ABI, considering that they are already so similar.

like image 623
Shachar Shemesh Avatar asked Jul 25 '16 21:07

Shachar Shemesh


2 Answers

The syscall instruction is intended to provide a quicker method of entering Ring-0 in order to carry out a system call. This is meant to be an improvement over the old method, which was to raise a software interrupt (int 0x80 on Linux).

Part of the reason the instruction is faster is because it does not change memory, or even change rsp to point at a kernel stack. Unlike a software interrupt, where the CPU is forced to allow the OS to resume operation without clobbering anything, for this command the CPU is allowed to assume the software is aware that something is happening here.

In particular, syscall stores two parts of the user-space state in registers. The RIP to return to after the call is stored in rcx, and the flags are stored in R11 (because RFLAGS is masked with a kernel-supplied value before entry to the kernel). This means that both those registers are clobbered by the instruction.

Since they are clobbered, the syscall ABI uses another register instead of rcx, hence the use of r10 for the 4th argument.

r10 is a natural choice, since in the x86-64 SystemV ABI it's not used for passing function args, and functions don't need to preserve their caller's value of r10. So a syscall wrapper function can mov %rcx, %r10 without any save/restore. This wouldn't be possible with any other register, for 6-arg syscalls and the SysV ABI's function calling convention.


BTW, the 32-bit system call ABI is also accessible with sysenter, which requires cooperation between user-space and kernel-space to allow returning to user-space after a sysenter. (i.e. storing some state in user-space before running sysenter). This is higher performance than int 0x80, but awkward. Still, glibc uses it (by jumping to user-space code in the vdso pages that the kernel maps into the address space of every process).

AMD's syscall is another approach to the same idea as Intel's sysenter: to make entry/exit from the kernel less expensive by not preserving absolutely everything.

like image 183
Shachar Shemesh Avatar answered Oct 05 '22 19:10

Shachar Shemesh


AMD's syscall clobbers the rcx register, thus r10 is used instead.

like image 44
a3f Avatar answered Oct 05 '22 19:10

a3f