How do system calls work?

Tags: operating-system, process, compiler-construction, interrupt, system-calls

I understand that a user can own a process and that each process has an address space (the valid memory locations that the process can reference). I know that a process can invoke a system call and pass parameters to it, just like any other library function. This seems to suggest that all system calls live in the process's address space, through shared memory or something similar. But perhaps this is only an illusion, created by the fact that in a high-level programming language a system call looks like any other function when a process calls it.

But now let me go a step deeper and look more closely at what happens under the hood. How does the compiler compile a system call? Perhaps it pushes the system call name and the parameters supplied by the process onto a stack and then emits an assembly instruction, say "TRAP" or something similar: basically the assembly instruction that raises a software interrupt.

This TRAP instruction is executed by the hardware, which first toggles the mode bit from user to kernel and then sets the code pointer to, say, the beginning of the interrupt service routines. From this point on, the ISR executes in kernel mode; it picks up the parameters from the stack (this is possible because the kernel has access to any memory location, even those owned by user processes) and executes the system call. At the end it relinquishes the CPU, which toggles the mode bit again, and the user process resumes from where it left off.

Is my understanding correct?

Attached is a rough diagram of my understanding (image not available).

asked Jun 05 '11 07:06

xyz

4 Answers

Your understanding is pretty close. The trick is that most compilers never emit system calls themselves, because the functions that programs call (e.g. getpid(2), chdir(2)) are actually provided by the standard C library. The standard C library contains the code for the system call, whether it is made via INT 0x80 or SYSENTER. It would be a strange program that makes system calls without a library doing the work. (Though Perl provides a syscall() function that can make system calls directly. Crazy, right?)

Next, the memory. The operating system kernel sometimes has easy address-space access to the user process memory. Of course, protection modes are different, and user-supplied data must be copied into the kernel's protected address space so that it cannot be modified while the system call is in flight:
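To make the wrapper point concrete, here is a minimal sketch, assuming Linux with glibc. It obtains the process ID twice: once through the getpid() wrapper and once through the generic syscall() helper with the number SYS_getpid. The two function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask for the process ID through the C library's named wrapper. */
static pid_t pid_via_wrapper(void)
{
    return getpid();
}

/* Ask for the same value by system call number, skipping the named wrapper.
   syscall() is itself a libc helper that loads the registers and traps. */
static pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Calling both from main() and comparing the results should give the same PID, since both paths end up in the same kernel service.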

    static int do_getname(const char __user *filename, char *page)
    {
        int retval;
        unsigned long len = PATH_MAX;

        if (!segment_eq(get_fs(), KERNEL_DS)) {
            if ((unsigned long) filename >= TASK_SIZE)
                return -EFAULT;
            if (TASK_SIZE - (unsigned long) filename < PATH_MAX)
                len = TASK_SIZE - (unsigned long) filename;
        }

        retval = strncpy_from_user(page, filename, len);
        if (retval > 0) {
            if (retval < len)
                return 0;
            return -ENAMETOOLONG;
        } else if (!retval)
            retval = -ENOENT;
        return retval;
    }

This, while it isn't a system call itself, is a helper function called by system call functions; it copies filenames into the kernel's address space. It checks that the entire filename resides within the user's data range, calls a function that copies the string in from user space, and performs some sanity checks before returning.

get_fs() and similar functions are remnants of Linux's x86 roots. The functions have working implementations on all architectures, but the names remain archaic.
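The same three steps (range check, clamp the length, bounded copy) can be tried in ordinary user-space C. This is only a sketch under stated assumptions: the "user" region is a plain array, the name is given as an offset into it, and sketch_getname and SKETCH_PATH_MAX are hypothetical names, not kernel APIs.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PATH_MAX 16  /* tiny stand-in for PATH_MAX */

/*
 * Copy a NUL-terminated name that starts at user_mem[offset] into `page`,
 * which must hold SKETCH_PATH_MAX bytes.
 * Returns 0 on success or a negative errno value.
 */
static int sketch_getname(const char *user_mem, size_t user_size,
                          size_t offset, char *page)
{
    size_t len = SKETCH_PATH_MAX;
    const char *src;
    const char *nul;

    /* Reject a start position outside the simulated user region. */
    if (offset >= user_size)
        return -EFAULT;
    /* Never scan past the end of the region. */
    if (user_size - offset < len)
        len = user_size - offset;

    src = user_mem + offset;
    nul = memchr(src, '\0', len);
    if (nul == NULL)
        return -ENAMETOOLONG;  /* no terminator within the limit */
    if (nul == src)
        return -ENOENT;        /* empty name */

    /* Private copy: later changes to user_mem no longer affect `page`. */
    memcpy(page, src, (size_t)(nul - src) + 1);
    return 0;
}
```

Once the copy succeeds, the caller works only on `page`, which is the point of copying: the original buffer can change without affecting the call in flight.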

All the extra work with segments is because the kernel and userspace might share some portion of the available address space. On a 32-bit platform (where the numbers are easy to comprehend), the kernel will typically have one gigabyte of virtual address space, and user processes will typically have three gigabytes of virtual address space.
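For a feel of the numbers, here is a small sketch of the arithmetic, assuming a 32-bit address space with the kernel's share at the top. The function names are hypothetical; with one gigabyte for the kernel the boundary comes out at 0xC0000000.

```c
#include <stdint.h>

/* First kernel address when `kernel_gib` GiB (1 to 3) are reserved
   at the top of a 32-bit virtual address space. */
static uint32_t split_boundary(unsigned kernel_gib)
{
    const uint64_t four_gib = UINT64_C(1) << 32;
    const uint64_t one_gib  = UINT64_C(1) << 30;
    return (uint32_t)(four_gib - kernel_gib * one_gib);
}

/* An address belongs to user space if it lies below the boundary. */
static int is_user_address(uint32_t addr, unsigned kernel_gib)
{
    return addr < split_boundary(kernel_gib);
}
```

A check like is_user_address() is roughly what the comparison against TASK_SIZE in the helper above amounts to.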

When a process calls into the kernel, the kernel will 'fix up' the page table permissions to allow it access to the whole range, and gets the benefit of pre-filled TLB entries for user-provided memory. Great success. But when the kernel must context switch back to userspace, it has to flush the TLB to remove the cached privileges on kernel address space pages.

But the trick is, one gigabyte of virtual address space is not sufficient for all kernel data structures on huge machines. Maintaining the metadata of cached filesystems and block device drivers, networking stacks, and the memory mappings for all the processes on the system, can take a huge amount of data.

So different 'splits' are available: two gigabytes for user and two for the kernel, one gigabyte for user and three for the kernel, and so on. As the space for the kernel goes up, the space for user processes goes down. There is also a 4:4 memory split that gives four gigabytes to the user process and four gigabytes to the kernel; with it, the kernel must fiddle with segment descriptors to be able to access user memory. The TLB is flushed entering and exiting system calls, which is a pretty significant speed penalty. But it lets the kernel maintain significantly larger data structures.

The much larger page tables and address ranges of 64-bit platforms probably make all the preceding look quaint. I sure hope so, anyway.

answered Oct 05 '22 06:10

sarnold

Yes, you've got it pretty much right. One detail, though: when the compiler compiles a system call, it uses the number of the system call rather than the name. For example, here is a list of Linux syscalls (for an old version, but the concept is still the same).
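Why a number rather than a name? A toy dispatch table shows the idea. This is not kernel code: the call numbers, handlers and toy_dispatch are all invented for illustration, and real numbers are fixed by each platform's ABI. The one detail borrowed from Linux is that an unknown number yields -ENOSYS.

```c
#include <errno.h>

/* Hypothetical call numbers for this toy table. */
enum { TOY_SYS_ADD = 0, TOY_SYS_NEGATE = 1, TOY_SYS_COUNT };

typedef long (*toy_handler)(long a, long b);

static long toy_add(long a, long b)    { return a + b; }
static long toy_negate(long a, long b) { (void)b; return -a; }

/* Indexed by number, so no name is ever looked up at run time. */
static const toy_handler toy_table[TOY_SYS_COUNT] = {
    [TOY_SYS_ADD]    = toy_add,
    [TOY_SYS_NEGATE] = toy_negate,
};

/* Stand-in for the kernel entry point: validate the number, then dispatch. */
static long toy_dispatch(long number, long a, long b)
{
    if (number < 0 || number >= TOY_SYS_COUNT)
        return -ENOSYS;  /* unknown call number */
    return toy_table[number](a, b);
}
```

An array index is cheap to validate and cheap to dispatch on, which is one reason a small integer makes a convenient interface between user code and the kernel.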

answered Oct 05 '22 07:10

Greg Hewgill

You actually call the C runtime library. It's not the compiler that inserts TRAP; it's the C library that wraps TRAP in a library call. The rest of your understanding is correct.

answered Oct 05 '22 07:10

ninjalj

If you wanted to perform a system call directly from your program, you could easily do so. It is platform dependent, but let's say you wanted to read from a file. Every system call has a number. In this case you place the number of the read_from_file system call in register EAX. The arguments for the system call are placed in other registers or on the stack (depending on the system call). After the registers are filled with the correct data and you are ready to perform the system call, you execute the instruction INT 0x80 (this depends on the architecture). That instruction is an interrupt that transfers control to the OS. The OS then reads the system call number from register EAX, acts accordingly, and gives control back to the process that made the system call.

The way system calls are made is prone to change and depends on the given platform. By using libraries that provide easy interfaces to these system calls, you make your programs more platform independent, and your code will be much more readable and faster to write. Consider implementing system calls directly in a high-level language: you would need something like inline assembly to ensure the data is put in the right registers.
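As an illustration of that inline assembly, here is a sketch assuming Linux and GCC or Clang. It uses the x86-64 convention (SYSCALL instruction, number in RAX) rather than the 32-bit INT 0x80 and EAX described above, and falls back to the libc syscall() helper elsewhere. raw_write is a made-up name.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Issue write(fd, buf, len) without going through the libc write() wrapper. */
static long raw_write(int fd, const void *buf, unsigned long len)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* x86-64 Linux: call number in RAX, arguments in RDI, RSI, RDX.
       The SYSCALL instruction overwrites RCX and R11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;  /* on failure this is a negative errno value */
#else
    /* Elsewhere, let the generic helper load the registers.
       Note: it returns -1 and sets errno on failure instead. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

For example, raw_write(1, "hi\n", 3) should print "hi" to standard output and return 3. The #if branches also show the portability cost: every architecture needs its own register assignments.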

answered Oct 05 '22 06:10

Kent Munthe Caspersen

