I was looking at the _do_fork() function, trying to understand how fork() returns the child's PID to the parent process and 0 to the child process.
I think that nr contains the PID of the child process (which is what gets returned to the calling process), but I can't see how 0 is returned to the child process.
The answer to "How does fork() know when to return 0?" says that the return value is passed on the stack created for the new process, but (besides not really understanding that) I can't find it in the code.
So, where is the return value of 0 set for the child process?
The code of the _do_fork() function is copied below:
long _do_fork(unsigned long clone_flags,
	      unsigned long stack_start,
	      unsigned long stack_size,
	      int __user *parent_tidptr,
	      int __user *child_tidptr,
	      unsigned long tls)
{
	struct task_struct *p;
	int trace = 0;
	long nr;

	/*
	 * Determine whether and which event to report to ptracer. When
	 * called from kernel_thread or CLONE_UNTRACED is explicitly
	 * requested, no event is reported; otherwise, report if the event
	 * for the type of forking is enabled.
	 */
	if (!(clone_flags & CLONE_UNTRACED)) {
		if (clone_flags & CLONE_VFORK)
			trace = PTRACE_EVENT_VFORK;
		else if ((clone_flags & CSIGNAL) != SIGCHLD)
			trace = PTRACE_EVENT_CLONE;
		else
			trace = PTRACE_EVENT_FORK;

		if (likely(!ptrace_event_enabled(current, trace)))
			trace = 0;
	}

	p = copy_process(clone_flags, stack_start, stack_size,
			 child_tidptr, NULL, trace, tls, NUMA_NO_NODE);
	add_latent_entropy();
	/*
	 * Do this prior waking up the new thread - the thread pointer
	 * might get invalid after that point, if the thread exits quickly.
	 */
	if (!IS_ERR(p)) {
		struct completion vfork;
		struct pid *pid;

		trace_sched_process_fork(current, p);

		pid = get_task_pid(p, PIDTYPE_PID);
		nr = pid_vnr(pid);

		if (clone_flags & CLONE_PARENT_SETTID)
			put_user(nr, parent_tidptr);

		if (clone_flags & CLONE_VFORK) {
			p->vfork_done = &vfork;
			init_completion(&vfork);
			get_task_struct(p);
		}

		wake_up_new_task(p);

		/* forking complete and child started to run, tell ptracer */
		if (unlikely(trace))
			ptrace_event_pid(trace, pid);

		if (clone_flags & CLONE_VFORK) {
			if (!wait_for_vfork_done(p, &vfork))
				ptrace_event_pid(PTRACE_EVENT_VFORK_DONE, pid);
		}

		put_pid(pid);
	} else {
		nr = PTR_ERR(p);
	}
	return nr;
}
fork does not return two values. Right after a fork system call you simply have two independent processes executing the same code, and the PID returned by fork is the only way to tell which process you are in: the parent or the child.
The return value of fork() in the parent is how the parent finds out the PID of the child process. The child process doesn't need to learn its own PID from fork(), since it can call getpid(), and it can find out its parent's PID with getppid().
RETURN VALUE: Upon successful completion, fork() returns 0 to the child process and returns the process ID of the child process to the parent process. Otherwise, -1 is returned to the parent process, no child process is created, and errno is set to indicate the error.
A child process is created as a copy of its parent process and inherits most of its attributes. If a child process has no parent process, it was created directly by the kernel. When a child process exits, or is stopped or resumed, a SIGCHLD signal is sent to the parent process.
You have correctly identified how the new process ID is returned to the parent, with return nr. But you will never actually see a return 0 anywhere, because this code runs in the context of the parent. It is not executed by the newly created process.
Now let us examine the _do_fork function.
	...
	}

	p = copy_process(clone_flags, stack_start, stack_size,
			 child_tidptr, NULL, trace, tls, NUMA_NO_NODE);
	add_latent_entropy();
	...
This is where all the magic happens. When you call copy_process, it internally calls copy_thread, which is target-specific code. This function is responsible for copying the thread-related data structures.
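Now say the target is x86_64, where the calling convention returns values in the %rax register. copy_thread makes the child's saved user-space registers a copy of the parent's, then sets the child's saved %rax to 0 and arranges for the child to start executing at ret_from_fork. When the child is first scheduled, it goes from ret_from_fork back to user space, its saved registers are restored, and it resumes at the same user-space instruction as the parent, but with 0 in %rax. The parent, meanwhile, returns from _do_fork with nr, which ends up in its own %rax.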
On other platforms the ABI might require the return value to go on the stack; in that case 0 is placed on the stack instead. copy_thread is target-specific, but copy_process is not.
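The x86_64 implementation of copy_thread is in arch/x86/kernel/process_64.c. In the version of that file referred to here, you can see the sp registers being set around line 160, and at line 182 you can see childregs->ax (the saved value of %rax, which on x86_64 is the full 64-bit register) being set to 0.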
I hope this clears up some of the confusion.