I am experimenting by statically compiling a minimal program and examining the system calls that are issued:
$ cat hello.c
#include <unistd.h>
int main (void) {
    write(1, "Hello world!", 12);
    return 0;
}
$ gcc hello.c -static
$ objdump -f a.out
a.out: file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00000000004003c0
$ strace ./a.out
execve("./a.out", ["./a.out"], [/* 39 vars */]) = 0
uname({sys="Linux", node="ubuntu", ...}) = 0
brk(0) = 0xa20000
brk(0xa211a0) = 0xa211a0
arch_prctl(ARCH_SET_FS, 0xa20880) = 0
brk(0xa421a0) = 0xa421a0
brk(0xa43000) = 0xa43000
write(1, "Hello world!", 12Hello world!) = 12
exit_group(0) = ?
I know that when linked non-statically, ld emits startup code to map libc.so and ld.so into the process's address space, and ld.so would continue loading any other shared libraries.
But in this case, why are so many system calls issued, apart from execve, write and exit_group?
Why the heck uname(2)? Why so many calls to brk(2) to get and set the program break, and a call to arch_prctl(2) to set the process state, when that seems like something that should have been done in kernel-space, at execve time?
uname is needed to check that the kernel version is not too ancient.
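To make the version check concrete, here is a minimal sketch of how a program can ask the kernel for its release string with uname(2) and compare it against a minimum. The helper names release_at_least and running_kernel_at_least are hypothetical, and this is not glibc's actual startup code, which compares against a minimum chosen when the library is built.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: parse a release string such as "2.6.32-21-generic"
 * and compare it with a wanted major.minor version.
 * Returns 1 if the release is new enough, 0 if it is older,
 * and -1 if the string cannot be parsed. */
int release_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0, minor = 0;

    if (sscanf(release, "%d.%d", &major, &minor) != 2)
        return -1;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}

/* Hypothetical helper: run the same comparison against the running
 * kernel, using the release field filled in by uname(2). */
int running_kernel_at_least(int want_major, int want_minor)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return release_at_least(u.release, want_major, want_minor);
}
```

Calling running_kernel_at_least(2, 6) from main issues one uname system call, which is the kind of entry that shows up in the strace output above.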
Two brks are needed to set up thread-local storage. Two others are needed to set up the dynamic loader path (the executable might still call dlopen, even if it's statically linked). I'm not sure why these come in pairs.
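The get-then-set pattern from the trace (brk(0) followed by brk with a new address) can be reproduced from user space. This is a minimal sketch, assuming Linux with glibc, where sbrk(0) reports the current program break and sbrk(n) moves it by n bytes. The name grow_break_by is hypothetical, and the code only illustrates the pattern, not what glibc's startup code does.

```c
#define _DEFAULT_SOURCE   /* asks glibc to declare sbrk() */
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: read the program break, move it up by `bytes`,
 * read it again, and return how far it actually moved.
 * Returns -1 if either sbrk() call fails. */
long grow_break_by(long bytes)
{
    char *before = sbrk(0);              /* "get": current program break */
    if (before == (char *)-1)
        return -1;

    if (sbrk((intptr_t)bytes) == (void *)-1)   /* "set": move the break */
        return -1;

    char *after = sbrk(0);               /* read the new break back */
    if (after == (char *)-1)
        return -1;

    return (long)(after - before);
}
```

A call such as grow_break_by(4096) should return 4096 as long as nothing else (malloc, for instance) moves the break in between.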
On some systems (32-bit x86, for example) arch_prctl isn't called; set_thread_area is called in its place. Either call sets up TLS for the current thread.
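The trace shows arch_prctl(ARCH_SET_FS, ...) storing an address in the FS base register. The value can be read back with ARCH_GET_FS, as in the sketch below. It assumes x86-64 Linux and goes through syscall(2). On any other platform the hypothetical helper read_fs_base simply returns 0.

```c
#define _GNU_SOURCE       /* asks glibc to declare syscall() */
#include <unistd.h>
#if defined(__linux__) && defined(__x86_64__)
#include <asm/prctl.h>    /* ARCH_GET_FS */
#include <sys/syscall.h>  /* SYS_arch_prctl */
#endif

/* Hypothetical helper: return the FS base of the calling thread,
 * i.e. the address the startup code installed with ARCH_SET_FS.
 * Returns 0 on failure or on platforms other than x86-64 Linux. */
unsigned long read_fs_base(void)
{
#if defined(__linux__) && defined(__x86_64__)
    unsigned long base = 0;

    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return 0;
    return base;
#else
    return 0;
#endif
}
```

On x86-64 the returned address is where the C library keeps the current thread's TLS bookkeeping, so it stays the same for the lifetime of the thread.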
These things could probably be done lazily (i.e. when the corresponding facilities are used for the first time). But perhaps that would make no sense performance-wise (just a guess).
By the way, gdb-7.x can stop on system calls with the catch syscall command.
Shameless plug: When built against musl libc, the strace output for that program, whether statically or dynamically linked, is:
execve("./a.out", ["./a.out"], [/* 42 vars */]) = 0
write(1, "Hello world!", 12) = 12
exit_group(0) = ?
It should be similarly minimal with dietlibc if you link statically, or with uClibc and static linking, as long as you built uClibc with locale and advanced stdio support disabled. (For some reason, uClibc with those features enabled runs lots of startup code to initialize them even in programs that don't use them.) As far as I know, however, musl is the only one whose dynamic linker can avoid heavy startup syscall overhead in dynamically linked programs.
As for why static linking with glibc makes all those brk calls, I really have no idea; you'd have to read the source. I suspect it's allocating space for internal data structures for malloc, stdio, locale, and possibly the thread structure for the main thread. As n.m. said, the arch_prctl is for setting the thread register to point to the main thread's thread structure. This could be deferred to the first access (which musl does), but it's a bit of a pain to do so and mildly hurts performance. If you care about the runtime of large programs more than the startup time of many many small programs, it may make sense to always initialize the thread register at program load time. Note that the kernel cannot set it for you because it does not know the address it should be set to.
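The cost of deferring the setup can be pictured with a small sketch. Everything here is a hypothetical stand-in: thread_ptr plays the role of the thread register, init_thread_ptr plays the role of the arch_prctl call, and none of it is taken from musl or glibc. The point is only that lazy initialization moves the work out of startup and into a check on every access.

```c
#include <stddef.h>

/* Hypothetical stand-in for a libc's per-thread structure. */
struct thread_struct {
    int errno_value;
};

static struct thread_struct main_thread;   /* main thread's structure */
static struct thread_struct *thread_ptr;   /* stands in for the thread register */
static int init_calls;                     /* counts the expensive setup step */

/* Stands in for the system call that sets the thread register. */
static void init_thread_ptr(void)
{
    thread_ptr = &main_thread;
    init_calls++;
}

/* Lazy variant: nothing happens at startup, but every access
 * pays for the NULL check before it can use the pointer. */
struct thread_struct *self_lazy(void)
{
    if (thread_ptr == NULL)
        init_thread_ptr();
    return thread_ptr;
}

/* Reports how many times the setup step has run so far. */
int thread_init_count(void)
{
    return init_calls;
}
```

With eager initialization the branch in self_lazy disappears, at the price of always running the setup step once at program load, which matches the trade-off described above.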
It's possible that an extension to the ELF format could be made to allow the main thread structure to be in the .data section with an ELF header telling the kernel where it is, but the acrobatics needed between the libc, the linker, and the kernel would probably be so ugly as to make this optimization undesirable... They would also impose further constraints on the userspace implementation of threads.