
Why do I have to `wait()` for child processes?

Tags: linux, fork

Even though the Linux man page for wait explains very well that you need to wait() for child processes so that they do not turn into zombies, it does not say why at all.

I planned my program (which is my first multithreaded one, so excuse my naivety) around a for(;;)ever loop that starts child processes which get exec()ed away and are sure to terminate on their own.

I cannot use wait(NULL) because that makes parallel computation impossible, therefore I'll probably have to add a process table that stores the child pids and have to use waitpid - not immediately, but after some time has passed - which is a problem, because the running time of the children varies from a few microseconds to several minutes. If I use waitpid too early, my parent process will get blocked; if I use it too late, I get overwhelmed by zombies and cannot fork() anymore, which is not only bad for my process, but can cause unexpected problems on the whole system.

I'll probably have to program some logic that enforces a maximum number of children and blocks the parent when that number is reached - but that should not be necessary because most of the children terminate quickly. The other solution that I can think of (creating a two-tiered parent process that spawns concurrent children which in turn concurrently spawn and wait for grandchildren) is too complicated for me right now. Possibly I could also find a non-blocking function to check on the children and use waitpid only when they have terminated.

Nevertheless the question:

Why does Linux keep zombies at all? Why do I have to wait for my children? Is this to enforce discipline on parent processes? In decades of using Linux I have never got anything useful out of zombie processes, I don't quite get the usefulness of zombies as a "feature".

If the answer is that parent processes need to have a way to find out what happened to their children, then for god's sake there is no reason to count zombies as normal processes and forbid the creation of non-zombie processes just because there are too many zombies. On the system I'm currently developing for I can only spawn 400 to 500 processes before everything grinds to a halt (it's a badly maintained CentOS system running on the cheapest VServer I could find - but still, 400 zombies are less than a few kB of information).

Robby75 asked Dec 29 '11 08:12




3 Answers

I'll probably have to add a process table that stores the child pids and have to use waitpid - not immediately, but after some time has passed - which is a problem, because the running time of the children varies from a few microseconds to several minutes. If I use waitpid too early, my parent process will get blocked

Check out the documentation for waitpid. You can tell waitpid to NOT block (i.e., return immediately if there are no children to reap) using the WNOHANG option. Moreover, you don't need to give waitpid a PID. You can specify -1, and it will wait for any child. So calling waitpid as below fits your no-blocking constraint and no-saving-pids constraint:

waitpid( -1, &status, WNOHANG );
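As a rough sketch of how that call might be used in the asker's loop (assuming Linux/POSIX; the helper names reap_finished_children and spawn_worker are made up for illustration, and spawn_worker only stands in for the real fork()+exec()), you could reap everything that has already finished before starting the next child:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already terminated, without blocking.
 * Returns the number of children reaped, or -1 on an unexpected error.
 * (Hypothetical helper name, not part of any library.) */
int reap_finished_children(void)
{
    int reaped = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;
            if (WIFEXITED(status))
                printf("child %ld exited with code %d\n",
                       (long)pid, WEXITSTATUS(status));
            else if (WIFSIGNALED(status))
                printf("child %ld was killed by signal %d\n",
                       (long)pid, WTERMSIG(status));
            continue;               /* there may be more finished children */
        }
        if (pid == 0)
            return reaped;          /* children exist, but none has finished yet */
        if (errno == ECHILD)
            return reaped;          /* no children left at all */
        if (errno == EINTR)
            continue;               /* interrupted by a signal, try again */
        return -1;
    }
}

/* Stand-in for the asker's fork()+exec(): the child just exits with `code`.
 * Returns the child's PID in the parent, or -1 if fork() failed. */
pid_t spawn_worker(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    return pid;
}
```

Calling reap_finished_children() once per iteration of the for(;;) loop, right before the next fork(), keeps the number of zombies bounded by however many children finish between two iterations, and no PID table is needed.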

If you really don't want to properly handle process creation, then you can give the reaping responsibility to init by forking twice, reaping the child, and giving the exec to the grandchild:

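#include <errno.h>
#include <error.h>      /* error() is a GNU extension */
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

pid_t temp_pid, child_pid;
temp_pid = fork();
if( temp_pid == 0 ){
    /* intermediate child: fork the grandchild, then exit right away */
    child_pid = fork();
    if( child_pid == 0 ){
        // exec() goes here; it only returns if it fails
        error( EXIT_FAILURE, errno, "failed to exec :(" );
    } else if( child_pid < 0 ){
        error( EXIT_FAILURE, errno, "failed to fork :(" );
    }
    exit( EXIT_SUCCESS );
} else if( temp_pid < 0 ){
    error( EXIT_FAILURE, errno, "failed to fork :(" );
} else {
    /* reap the intermediate child; wait() takes a status pointer, not a
       PID, so waitpid() is the right call here */
    waitpid( temp_pid, NULL, 0 );
}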

In the above code snippet, the child process forks its own child, immediately exits, and then is immediately reaped by the parent. The grandchild is orphaned, adopted by init, and will be reaped automatically.

Why does Linux keep zombies at all? Why do I have to wait for my children? Is this to enforce discipline on parent processes? In decades of using Linux I have never got anything useful out of zombie processes, I don't quite get the usefulness of zombies as a "feature". If the answer is that parent processes need to have a way to find out what happened to their children, then for god's sake there is no reason to count zombies as normal processes and forbid the creation of non-zombie processes just because there are too many zombies.

How else do you propose one may efficiently retrieve the exit code of a process? The problem is that the mapping of PID <=> exit code (et al.) must be one to one. If the kernel released the PID of a process as soon as it exits, reaped or not, and then a new process inherits that same PID and exits, how would you handle storing two codes for one PID? How would an interested process retrieve the exit code for the first process? Don't assume that no one cares about exit codes simply because you don't. What you consider to be a nuisance/bug is widely considered useful and clean.

On the system I'm currently developing for I can only spawn 400 to 500 processes before everything grinds to halt (it's a badly maintained CentOS system running on the cheapest VServer I could find - but still 400 zombies are less than a few kB of information)

Something about making a widely accepted kernel behavior a scapegoat for what are clearly frustrations over a badly-maintained/cheap system doesn't seem right.

Typically, the system-wide maximum number of processes is limited mainly by your memory, although a per-user limit (ulimit -u) can be lower. You can see the system-wide limit with:

cat /proc/sys/kernel/threads-max
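If you would rather check this from inside the program, here is a small sketch (Linux-specific; the function names are invented for this example). It reads the same /proc file and also queries the per-user soft limit through getrlimit(RLIMIT_NPROC), which is the value ulimit -u reports:

```c
#include <stdio.h>
#include <sys/resource.h>

/* System-wide ceiling from /proc/sys/kernel/threads-max.
 * Returns -1 if the file cannot be read or parsed. */
long read_threads_max(void)
{
    long value = -1;
    FILE *f = fopen("/proc/sys/kernel/threads-max", "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Soft per-user process limit.
 * Returns -1 on error and -2 if the limit is RLIM_INFINITY
 * (a convention chosen only for this sketch). */
long read_nproc_soft_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -2;
    return (long)rl.rlim_cur;
}
```

Whichever of the two values is smaller is the one a fork() loop is likely to hit first, and unreaped zombies count against both.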
Christopher Neylan answered Oct 04 '22 14:10


Your reasoning is backwards: The kernel keeps zombies because they store the state that you can retrieve with wait() and related system calls.

The proper way to handle asynchronous child termination is to have a SIGCHLD handler which does the wait() to clean up the child processes.

Ben Jackson answered Oct 04 '22 14:10


When a program exits, it returns a return code to the kernel. A zombie process is simply a place to hold the return code until the parent can obtain it. The wait() call lets the kernel know that the return code for that pid is no longer needed, and the zombie is removed.
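To see this for yourself, here is a small Linux-only sketch (the helper names proc_state and zombie_demo are invented for this example). The child exits at once, the parent watches /proc/<pid>/stat until the state letter becomes Z, and only then calls waitpid(), which hands back the stored exit code and removes the zombie:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the state letter from /proc/<pid>/stat ('R', 'S', 'Z', ...),
 * or 0 if the entry cannot be read. Linux-specific. */
char proc_state(pid_t pid)
{
    char path[64];
    char state = 0;
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    /* The line starts with: pid (comm) state ... */
    if (fscanf(f, "%*d (%*[^)]) %c", &state) != 1)
        state = 0;
    fclose(f);
    return state;
}

/* Fork a child that exits with `code`, let it sit as a zombie, then reap it.
 * The state observed just before waiting is stored in *state_before.
 * Returns the child's exit code, or -1 on failure. */
int zombie_demo(int code, char *state_before)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    /* Poll until the child has exited. It stays in state 'Z' until we
     * wait for it. A 0 means /proc is unavailable, so stop polling. */
    do {
        *state_before = proc_state(pid);
    } while (*state_before != 'Z' && *state_before != 0);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, zombie_demo(7, &st) should return 7 with st set to 'Z' on a system where /proc is mounted: the exit code survives for exactly as long as the zombie does.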

Greg Hewgill answered Oct 04 '22 12:10