Why is sys_fork not used by glibc's implementation of fork?

Tags: c, linux, fork, glibc

In eglibc's nptl/sysdeps/unix/sysv/linux/i386/fork.c there's a definition:

#define ARCH_FORK() \
  INLINE_SYSCALL (clone, 5,                           \
          CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD, 0,     \
          NULL, NULL, &THREAD_SELF->tid)

which is used in the actual __libc_fork() as the heart of the implementation. But Linux's arch/x86/entry/syscalls/syscall_32.tbl, for example, has a sys_fork entry, and so does syscall_64.tbl. So apparently Linux does have a dedicated syscall for fork.

So I now wonder: why does glibc implement fork() in terms of clone, if the kernel already provides the fork syscall?


asked May 11 '16 14:05

Ruslan

2 Answers

I looked at the commit where Ulrich Drepper added that code to glibc, and there wasn't any explanation in the commit log (or elsewhere).

Have a look at Linux's implementation of fork, though:

return _do_fork(SIGCHLD, 0, 0, NULL, NULL, 0);

And here is clone:

return _do_fork(clone_flags, newsp, 0, parent_tidptr, child_tidptr, tls);

They are almost exactly the same. The only difference is that clone lets you set various flags, give the new process its own stack pointer (newsp), pass TID pointers, etc., whereas fork doesn't take any arguments.

Looking at Drepper's code, the clone flags are CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | SIGCHLD. If fork were used, the only flag would be SIGCHLD.

Here is what the clone manpage says about those extra flags:

CLONE_CHILD_CLEARTID (since Linux 2.5.49)
          Erase child thread ID at location ctid in child memory when  the  child
          exits,  and  do  a  wakeup  on  the futex at that address.  The address
          involved may be changed by the set_tid_address(2) system call.  This is
          used by threading libraries.

CLONE_CHILD_SETTID (since Linux 2.5.49)
          Store child thread ID at location ctid in child memory.

...And you can see that he does pass a pointer to where the kernel should first store the child's thread ID and then later do a futex wakeup. Is glibc doing a futex wait on that address somewhere? I don't know. If so, that would explain why Drepper chose to use clone.

(And if not, it would be just one more example of the extreme accumulation of cruft which is our beloved glibc! If you want to find some nice, clean, well-maintained code, just keep moving and go have a look at musl libc!)


answered Oct 15 '22 02:10

Alex D

In a nutshell: why not?

You have one syscall that is guaranteed to exist on all platforms (you do realize that Intel isn't the only platform out there, right?), and another that is deprecated because it is unnecessary. Called with just SIGCHLD, clone has exactly the same semantics as fork. Your code is much more compact when you only call the one guaranteed to exist.

I will elaborate on that a little.

Fork is defined by POSIX, while clone is Linux-specific. However, Linux on occasion takes POSIX-defined "system calls" and implements them in user space. Such is the case for fork and pthread_create (and, on architectures without a vfork syscall, vfork): they are implemented in user space by calling clone.

Consequently, fork is deemed unnecessary at the kernel level: if a thin user-space wrapper can implement it, the kernel is fine with that. On Linux, clone is therefore guaranteed to exist on every architecture, while fork may or may not exist depending on the platform (arm64 and RISC-V, for example, have no fork syscall).

answered Oct 15 '22 02:10

Shachar Shemesh
