
How does a syscall actually happen on linux?

Inspired by this question

How can I force GDB to disassemble?

and related to this one

What is INT 21h?

How does a system call actually happen under Linux? What happens from the moment the call is made until the actual kernel routine is invoked?

Stefano Borini asked Aug 07 '09 16:08


2 Answers

Assuming we're talking about x86:

  1. The ID of the system call is deposited into the EAX register
  2. Any arguments required by the system call are deposited into registers, following the Linux convention for 32-bit x86: EBX, ECX, EDX, ESI, EDI and EBP, in that order. An argument that does not fit in a register (a string or a struct, for example) is passed as a pointer to memory.
  3. An INT 0x80 interrupt is invoked.
  4. The Linux kernel services the system call identified by the ID in the EAX register and leaves the result in EAX. A small negative value there indicates an error code.
  5. The calling code makes use of any results.
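
The steps above can be tried from C without writing any assembly. This is only a minimal sketch, assuming Linux and a libc that provides the generic syscall() wrapper (glibc does). The wrapper takes the system call ID followed by its arguments, places them in the registers the architecture expects, and executes the trap instruction for you. The helper names raw_getpid and raw_write are made up for this illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_getpid, SYS_write: the system call IDs */
#include <sys/types.h>
#include <unistd.h>       /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our process ID by system call ID.
 * getpid takes no arguments, so only the ID is passed. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Hypothetical helper: write(fd, buf, len) as an ID plus three arguments.
 * On failure syscall() returns -1 and stores the error code in errno. */
static long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_getpid() should give the same value as the ordinary getpid(), and raw_write() on an invalid file descriptor should return -1, which is libc's translation of the negative value the kernel left in the result register.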

I may be a bit rusty at this, it's been a few years...

Adam Paynter answered Sep 27 '22 21:09


The given answers are correct, but I would like to add that there are other mechanisms for entering kernel mode. Every recent kernel maps a "vsyscall" page into every process's address space. It contains little more than the most efficient system call trap method available on that machine.

For example, on a regular 32-bit system it could contain:

 
0xffffe000: int $0x80
0xffffe002: ret

But on my 64-bit system the page uses the much more efficient syscall/sysenter instructions:


0xffffe000: push   %ecx
0xffffe001: push   %edx
0xffffe002: push   %ebp
0xffffe003: mov    %esp,%ebp
0xffffe005: sysenter
0xffffe007: nop
0xffffe008: nop
0xffffe009: nop
0xffffe00a: nop
0xffffe00b: nop
0xffffe00c: nop
0xffffe00d: nop
0xffffe00e: jmp    0xffffe003
0xffffe010: pop    %ebp
0xffffe011: pop    %edx
0xffffe012: pop    %ecx
0xffffe013: ret
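
If you want to look at this mapping in your own process, here is a rough sketch. It assumes a reasonably modern Linux and glibc, where the kernel-provided code is exposed as a small ELF image (the vDSO) and its address is published in the auxiliary vector under AT_SYSINFO_EHDR. The address is not necessarily the fixed 0xffffe000 shown above. The function names vdso_base and vdso_is_elf are invented for this example.

```c
#include <elf.h>        /* ELFMAG, SELFMAG */
#include <string.h>     /* memcmp */
#include <sys/auxv.h>   /* getauxval, AT_SYSINFO_EHDR */

/* Hypothetical helper: base address of the kernel-provided mapping,
 * or 0 if the kernel did not publish one for this process. */
static unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: returns 1 if the mapping begins with the
 * ELF magic bytes, 0 otherwise (including when base is 0). */
static int vdso_is_elf(unsigned long base)
{
    return base != 0 && memcmp((const void *)base, ELFMAG, SELFMAG) == 0;
}
```

From there a debugger can disassemble the region starting at vdso_base() to see which trap instruction your kernel picked.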

This vsyscall page also provides some system calls that can be completed without a context switch. I know for certain that gettimeofday, time and getcpu are mapped there, but I imagine getpid could fit in there just as well.
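
To see the two routes side by side, here is a small sketch that reads the clock once through libc's gettimeofday() and once by forcing a real trap with syscall(). It assumes a 64-bit Linux target, where struct timeval has the layout the kernel expects. Whether the libc call is actually served from the mapped page depends on the architecture, kernel and libc version, so treat that part as an assumption. The names seconds_via_libc and seconds_via_trap are made up here.

```c
#define _GNU_SOURCE
#include <stddef.h>       /* NULL */
#include <sys/syscall.h>  /* SYS_gettimeofday */
#include <sys/time.h>     /* gettimeofday, struct timeval */
#include <unistd.h>       /* syscall() */

/* Usual route: libc, which may answer from the kernel-mapped page
 * without entering the kernel (assumption, see the text above). */
static long seconds_via_libc(void)
{
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    return (long)tv.tv_sec;
}

/* Forced route: always traps into the kernel by system call ID.
 * Assumes a 64-bit target so that struct timeval matches the kernel's. */
static long seconds_via_trap(void)
{
    struct timeval tv;
    if (syscall(SYS_gettimeofday, &tv, NULL) != 0)
        return -1;
    return (long)tv.tv_sec;
}
```

Both functions should report the same time to within a second or so. Timing a few million calls of each in a loop is a simple way to check whether the fast path is in use on your machine.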

Kasper answered Sep 27 '22 20:09