Internals of a Linux system call

What happens (in detail) when a thread makes a system call by raising interrupt 0x80? What work does Linux do to the thread's stack and other state? What changes are made to the processor to put it into kernel mode? After running the interrupt handler, how is control returned to the calling process?

What if the system call can't be completed quickly, e.g. a read from disk? How does the interrupt handler relinquish control so that the processor can do other work while the data is being loaded, and how does it then regain control?

abc asked Feb 19 '10 21:02



2 Answers

A crash course in kernel mode in one Stack Overflow answer

Good questions! (Interview questions?)


  • What happens (in detail) when a thread makes a system call by raising interrupt 0x80?

The int $0x80 instruction is vaguely like a function call. The CPU "takes a trap" and resumes execution at a known address in kernel mode, typically with a different MMU mode as well. The kernel saves many of the registers, though it need not save those that a program would not expect an ordinary function call to preserve.
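To see the function-call analogy from user space, here is a minimal sketch that bypasses the getpid() wrapper and issues the system call through the C library's generic syscall() function. The helper name raw_getpid is my own. Note that syscall() uses whatever entry instruction the C library was built for on your platform, which is not necessarily int $0x80 (on x86-64 it is normally the syscall instruction).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for our PID without the getpid()
 * wrapper. syscall() places the call number and arguments in registers and
 * executes the trap instruction the C library uses on this platform. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as getpid(), since both end up in the same kernel handler.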

  • What work does Linux do to the thread's stack and other state?

Typically an OS saves the registers that the ABI promises will be preserved across procedure calls. The user stack is left alone: the kernel runs on a separate per-thread kernel stack rather than on the per-thread user stack. Naturally some state will change, otherwise there would be no reason to make the system call.
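One piece of state that always changes is the return value. On Linux the kernel hands the result back in a register, with failures encoded as a small negative error number, and the C library turns that into a -1 return plus errno. A rough sketch of observing this from user space follows; the helper name errno_from_bad_close is made up for the example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: call close() on a descriptor that cannot be open and
 * return the errno value the C library derived from the kernel's result.
 * Returns 0 if the call unexpectedly succeeded. */
static int errno_from_bad_close(void)
{
    errno = 0;
    long ret = syscall(SYS_close, -1);   /* -1 is never a valid descriptor */
    return (ret == -1) ? errno : 0;
}
```

Here the caller's registers and stack look the same after the call as before, apart from the return value and errno, which is what you would expect from an ordinary function call too.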

  • What changes are made to the processor to put it into kernel mode?

This is usually entirely automatic. The CPU has, generically, a software-interrupt instruction that is a bit like a function-call operation. It causes the switch to kernel mode under controlled conditions. Typically, the CPU changes some sort of protection bit in the program status word (PSW), saves the old PSW and program counter (PC), starts executing at a well-known trap vector address, and may also switch to a different memory-management protection and mapping arrangement.

  • After running the interrupt handler, how is control returned to the calling process?

There is typically some sort of "return from interrupt" or "return from trap" instruction that acts a bit like a complicated function-return instruction. Some RISC processors did very little automatically and required explicit code to do the return. Some CISC processors, like x86, have (never-really-used) instructions that execute dozens of operations for capability adjustments, documented in pages of architecture-manual pseudo-code.

  • What if the system call can't be completed quickly, e.g. a read from disk? How does the interrupt handler relinquish control so that the processor can do other work while the data is being loaded, and how does it then regain control?

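The kernel itself is threaded, much like a threaded user program. When a system call has to wait, the kernel just switches stacks (threads) and works on someone else's process for a while; the waiting thread's kernel stack is kept as it was, so the call picks up where it left off once the thread is switched back in.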

DigitalRoss answered Sep 16 '22 16:09


To answer the last part of the question (what does the kernel do if the system call needs to sleep):

During a system call, the kernel is still logically running in the context of the same task that made the call; it's just in kernel mode rather than user mode. It is NOT a separate thread, and most system calls do not invoke logic from another task/thread. What happens is that the system call calls wait_event, wait_event_timeout, or some other wait function. That function adds the task to a list of tasks waiting for something, puts the task to sleep (which changes its state), and calls schedule() to relinquish the current CPU.

After this the task cannot run again until it is woken up, typically by another task (a kernel task, etc.) or an interrupt handler calling one of the wake* functions. That wakes the task(s) sleeping on that particular event, which means the scheduler will soon schedule them again.
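You can watch the sleep and wake-up from user space with a pipe. In this minimal sketch (the function name blocking_read_demo and the delay parameter are my own), the parent calls read() on an empty pipe, so the kernel has to put it to sleep; the child's later write() is the event that wakes it. The wait-queue machinery itself is internal to the kernel and not visible here, only its effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that writes one byte after delay_ms
 * milliseconds, and block in read() until it arrives.
 * Returns the byte read, or -1 on any failure. */
static int blocking_read_demo(long delay_ms)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        /* Child: wait a while, then make the pipe readable. */
        struct timespec delay = { delay_ms / 1000,
                                  (delay_ms % 1000) * 1000000L };
        char byte = 'x';
        close(fds[0]);
        nanosleep(&delay, NULL);
        ssize_t n = write(fds[1], &byte, 1);
        _exit(n == 1 ? 0 : 1);
    }

    /* Parent: the pipe is empty, so read() cannot complete yet. The kernel
     * puts this task to sleep and runs other tasks until the child's
     * write() wakes it up. */
    char byte = 0;
    close(fds[1]);
    ssize_t n = read(fds[0], &byte, 1);
    close(fds[0]);
    waitpid(child, NULL, 0);

    return (n == 1) ? (unsigned char)byte : -1;
}
```

While the parent is blocked it uses no CPU time; from its point of view read() is simply a function call that took delay_ms to return.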

It's worth noting that userspace tasks (i.e. threads) are only one kind of execution context; there are a few others internal to the kernel which can do work as well: kernel threads, work queues, and bottom-half handlers such as softirqs and tasklets. Work which doesn't belong to any particular userspace process (for example network handling, such as responding to pings) gets done in these. Kernel threads, and work queue items (which run in kernel threads), are allowed to go to sleep. Interrupt handlers, softirqs and tasklets are not, because they must not invoke the scheduler.

MarkR answered Sep 16 '22 16:09