I read some paragraphs in LKD1 and I just cannot understand the contents below:
Accessing the System Call from User-Space
Generally, the C library provides support for system calls. User applications can pull in function prototypes from the standard headers and link with the C library to use your system call (or the library routine that, in turn, uses your syscall call). If you just wrote the system call, however, it is doubtful that glibc already supports it!
Thankfully, Linux provides a set of macros for wrapping access to system calls. It sets up the register contents and issues the trap instructions. These macros are named _syscalln(), where n is between zero and six. The number corresponds to the number of parameters passed into the syscall because the macro needs to know how many parameters to expect and, consequently, push into registers. For example, consider the system call open(), defined as
long open(const char *filename, int flags, int mode)
The syscall macro to use this system call without explicit library support would be
#define __NR_open 5
_syscall3(long, open, const char *, filename, int, flags, int, mode)
Then, the application can simply call open().
For each macro, there are 2+2×n parameters. The first parameter corresponds to the return type of the syscall. The second is the name of the system call. Next follows the type and name for each parameter in order of the system call. The __NR_open define is in <asm/unistd.h>; it is the system call number. The _syscall3 macro expands into a C function with inline assembly; the assembly performs the steps discussed in the previous section to push the system call number and parameters into the correct registers and issue the software interrupt to trap into the kernel. Placing this macro in an application is all that is required to use the open() system call.
Let's write the macro to use our splendid new foo() system call and then write some test code to show off our efforts.
#define __NR_foo 283
__syscall0(long, foo)
int main ()
{
    long stack_size;
    stack_size = foo ();
    printf ("The kernel stack size is %ld\n", stack_size);
    return 0;
}
What does "the application can simply call open()" mean?
Besides, in the last piece of code, where is the declaration of foo()? And how can I make this code compile and run? Which header files do I need to include?
__________
1. Linux Kernel Development, by Robert Love.
PDF file at wordpress.com (go to page 81); Google Books result.
A system call is the way a program interacts with the operating system: a program makes a system call when it requests a service from the operating system's kernel. System calls expose those services to user programs through an application programming interface (API).
The system call interface (SCI) is the only way to cross from user space into kernel space. The switch is made by a software interrupt, which changes the processor mode and jumps CPU execution into an interrupt handler, which in turn runs the corresponding system call routine.
A user-mode program can execute a TRAP instruction to perform a system call. From the program's point of view, the operating system will carry out the request, but the program has no idea how long that will take. An interrupt can arrive, raising the CPU's interrupt level from 0 to some number N.
When a user program invokes a system call, a system call instruction is executed, which causes the processor to begin executing the system call handler in the kernel protection domain. This system call handler performs the following actions: Sets the ut_error field in the uthread structure to 0.
First, a definition. A system call is a synchronous, explicit request by a user-space application for a particular kernel service. Synchronous means that the call happens at a point determined by the instruction sequence being executed. Interrupts are an example of asynchronous requests for system services, because they arrive at the kernel independently of the code executing on the processor. Exceptions, in contrast to system calls, are synchronous but implicit requests for kernel services.
A system call consists of four stages:
In general, all of these actions can be implemented as part of one big library function that performs a number of auxiliary actions before and/or after the actual system call. In that case the system call is embedded in the function, but the function as a whole is not a system call. Alternatively, a tiny function may perform only these four steps and nothing more; in that case we can say that the function is a system call. You can also implement the system call yourself by coding all four stages by hand. Note that in this case you will have to use assembly, because all of these steps are entirely architecture-dependent.
For example, the Linux/i386 environment has the following system call convention:
include\uapi\asm-generic\unistd.h.
In modern versions of Linux there is no _syscall macro (as far as I know). Instead, glibc, the main interface library to the Linux kernel, provides a special macro, INTERNAL_SYSCALL, which expands into a small piece of code made up of inline assembler instructions. This code targets a particular hardware platform and implements all the stages of a system call, so the macro effectively is the system call itself. There is also another macro, INLINE_SYSCALL, which provides glibc-style error handling: when the system call fails, -1 is returned and the error number is stored in the errno variable. Both macros are defined in sysdep.h in the glibc package.
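You can invoke a system call in the following way (note that sysdep.h is internal to glibc and is not one of its installed headers, so this builds only inside the glibc source tree):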
#include <sysdep.h>
#define __NR_<name> <id>
int my_syscall(void)
{
    return INLINE_SYSCALL(<name>, <argc>, <argv>);
}
where <name> must be replaced by the syscall name, <id> by the number of the desired system service, <argc> by the actual number of parameters (from 0 to 6), and <argv> by the actual parameters separated by commas (and preceded by a comma if any parameters are present).
For example:
#include <sysdep.h>
#define __NR_exit 1
int _exit(int status)
{
    return INLINE_SYSCALL(exit, 1, status); // takes 1 parameter "status"
}
or another example:
#include <sysdep.h>
#define __NR_fork 2
int _fork(void)
{
    return INLINE_SYSCALL(fork, 0); // takes no parameters
}
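For ordinary application code, the syscall(2) man page describes a generic syscall() function that takes the system call number followed by its arguments. The sketch below assumes Linux with glibc (or a compatible libc) and uses getpid only because it is harmless to call; the wrapper name my_getpid is made up for illustration. A newly added call such as the book's foo() would be invoked the same way, with whatever number the kernel you built assigns to it.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_* system call numbers */
#include <unistd.h>       /* syscall() */

/* Hypothetical wrapper: invokes getpid by number instead of
 * going through the libc getpid() function. */
static long my_getpid(void)
{
    return syscall(SYS_getpid);
}
```

Calling my_getpid() from main() should return the same value as getpid(); the only headers needed are the two shown, plus <stdio.h> if you print the result.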
You should first understand the role of the Linux kernel, and that applications interact with the kernel only through system calls.
In effect, an application runs on the "virtual machine" provided by the kernel: it runs in user space and can only execute (at the lowest machine level) the set of machine instructions permitted in user CPU mode, augmented by the instruction (e.g. SYSENTER or INT 0x80 ...) used to make system calls. So, from the point of view of a user-level application, a syscall is an atomic pseudo machine instruction.
The Linux Assembly Howto explains how a syscall can be done at the assembly (i.e. machine instruction) level.
The GNU libc provides C functions corresponding to the syscalls. So, for example, the open function is a tiny piece of glue (i.e. a wrapper) around the syscall numbered __NR_open (it makes the syscall and then updates errno). Applications usually call such C functions in libc instead of making the syscall directly.
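The errno behaviour of such a wrapper can be sketched with the generic syscall() function from syscall(2), which follows the same convention: on failure it returns -1 and stores the error code in errno. This assumes Linux with glibc or a compatible libc; my_close is a made-up name, not a libc function.

```c
#define _GNU_SOURCE
#include <errno.h>        /* errno, EBADF */
#include <sys/syscall.h>  /* SYS_close */
#include <unistd.h>       /* syscall() */

/* Hypothetical wrapper around the close system call.
 * On failure, syscall() returns -1 and sets errno. */
static int my_close(int fd)
{
    return (int)syscall(SYS_close, fd);
}
```

Passing an invalid descriptor, for example my_close(-1), should return -1 and leave errno set to EBADF.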
You could use some other libc. For instance, the MUSL libc is somehow "simpler" and its code is perhaps easier to read. It also wraps the raw syscalls in corresponding C functions.
If you add your own syscall, you had better also implement a similar C function (in your own library). So you should also have a header file for your library.
See also intro(2) and syscall(2) and syscalls(2) man pages, and the role of VDSO in syscalls.
Notice that syscalls are not C functions. They don't use the call stack (they could even be invoked without any stack). A syscall is basically a number like __NR_open from <asm/unistd.h>, a SYSENTER machine instruction, and conventions about which registers hold the arguments before the syscall and which ones hold the result[s] after it (including the failure result, used to set errno in the C library function wrapping the syscall). The conventions for syscalls are not the calling conventions for C functions in the ABI spec (e.g. x86-64 psABI). So you need a C wrapper.
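To make the register convention concrete, here is a minimal sketch for x86-64 Linux only, where the `syscall` instruction takes the call number in rax and the first argument in rdi, returns the result in rax, and clobbers rcx and r11. A failure comes back as a negative error code rather than through errno. The function name raw_syscall1 is an assumption for illustration; on other platforms the sketch falls back to libc's syscall() and converts its result to the same shape.

```c
#define _GNU_SOURCE
#include <errno.h>        /* errno, used only by the fallback path */
#include <sys/syscall.h>  /* SYS_* system call numbers */
#include <unistd.h>       /* syscall() for the fallback path */

/* Hypothetical helper: one-argument system call.
 * Returns the kernel result, or a negative error code on failure. */
static long raw_syscall1(long number, long arg1)
{
#if defined(__linux__) && defined(__x86_64__)
    long result;
    __asm__ volatile ("syscall"
                      : "=a"(result)             /* rax: return value */
                      : "a"(number), "D"(arg1)   /* rax: number, rdi: arg 1 */
                      : "rcx", "r11", "memory"); /* clobbered by syscall */
    return result;
#else
    long result = syscall(number, arg1);
    return (result == -1) ? -(long)errno : result;
#endif
}
```

For example, raw_syscall1(SYS_close, -1) should yield -EBADF; a C library wrapper would turn that into a return value of -1 with errno set to EBADF.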