Why do I have to `wait()` for child processes?

Tags:

linux

fork

Even though the Linux man page for wait() explains very well that you need to wait() for child processes so that they do not turn into zombies, it does not say why at all.

I planned my program (which is my first multi-process one, so excuse my naivety) around a for(;;)ever loop that starts child processes, which get exec()ed away and are sure to terminate on their own.

I cannot use wait(NULL) because that makes parallel computation impossible. Therefore I'll probably have to add a process table that stores the child PIDs and use waitpid(), not immediately but after some time has passed. That is a problem, because the running time of the children varies from a few microseconds to several minutes. If I call waitpid() too early, my parent process gets blocked; if I call it too late, I get overwhelmed by zombies and cannot fork() anymore, which is not only bad for my process but can cause unexpected problems on the whole system.

I'll probably have to program some logic that enforces a maximum number of children and blocks the parent when that number is reached, but that should not be necessary because most of the children terminate quickly. The other solution I can think of (a two-tiered parent process that spawns concurrent children, which in turn concurrently spawn and wait for grandchildren) is too complicated for me right now. Possibly I could also find a non-blocking function to check on the children and call waitpid() only once they have terminated.
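The cap-and-block idea described above could look roughly like the following minimal sketch. It assumes a single-threaded parent that creates all of its children through this wrapper; `fork_limited()`, `MAX_CHILDREN` and the `running` counter are hypothetical names, and the cap of 8 is an arbitrary placeholder.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical cap on simultaneous children; tune it for the target system. */
#define MAX_CHILDREN 8

/* Children forked but not yet reaped (parent-side bookkeeping only). */
static int running = 0;

/* Like fork(), but first reaps finished children without blocking and,
   if MAX_CHILDREN are still alive, blocks until one of them exits.
   Returns 0 in the child, the child's PID in the parent, -1 on error. */
static pid_t fork_limited(void)
{
    /* Cheap, non-blocking cleanup of children that already exited. */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        running--;

    /* At the cap: block until any one child exits. */
    while (running >= MAX_CHILDREN) {
        if (wait(NULL) > 0)
            running--;
        else if (errno == ECHILD)
            running = 0;        /* no children left at all */
        /* on EINTR (a signal interrupted wait), just retry */
    }

    pid_t pid = fork();
    if (pid > 0)
        running++;
    return pid;
}
```

In the for(;;) loop, the child branch (where `fork_limited()` returns 0) would call one of the exec*() functions and `_exit()` if that fails. Because finished children are reaped before every fork, the parent only ever blocks when all `MAX_CHILDREN` slots are held by long-running children.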

Nevertheless the question:

Why does Linux keep zombies at all? Why do I have to wait for my children? Is this to enforce discipline on parent processes? In decades of using Linux I have never got anything useful out of zombie processes; I don't quite get the usefulness of zombies as a "feature".

If the answer is that parent processes need to have a way to find out what happened to their children, then for god's sake there is no reason to count zombies as normal processes and forbid the creation of non-zombie processes just because there are too many zombies. On the system I'm currently developing for, I can only spawn 400 to 500 processes before everything grinds to a halt (it's a badly maintained CentOS system running on the cheapest VServer I could find, but still, 400 zombies amount to less than a few kB of information).


asked Dec 29 '11 08:12

Robby75

3 Answers

I'll probably have to add a process table that stores the child pids and have to use waitpid - not immediately, but after some time has passed - which is a problem, because the running time of the children varies from a few microseconds to several minutes. If I use waitpid too early, my parent process will get blocked

Check out the documentation for waitpid(). You can tell waitpid() NOT to block (i.e., to return 0 immediately if no child has exited yet) using the WNOHANG option. Moreover, you don't need to give waitpid() a PID: you can specify -1, and it will wait for any child. So calling waitpid() as below fits both your no-blocking constraint and your no-saving-PIDs constraint:


waitpid( -1, &status, WNOHANG );
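Because a single call reaps at most one child, a polling parent would normally repeat it until nothing is left. A minimal sketch, assuming the parent calls the hypothetical helper `reap_zombies()` once per iteration of its main loop (for example right before each fork()); the status decoding with WIFEXITED/WEXITSTATUS and WIFSIGNALED/WTERMSIG is only there to print something useful.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap every child that has already terminated, without ever blocking.
   Returns the number of children collected on this call. */
static int reap_zombies(void)
{
    int status;
    int reaped = 0;
    pid_t pid;

    /* -1 means "any child"; WNOHANG makes waitpid() return 0 instead of
       blocking when children exist but none has exited yet. */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        if (WIFEXITED(status))
            printf("child %d exited with status %d\n",
                   (int)pid, WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            printf("child %d was killed by signal %d\n",
                   (int)pid, WTERMSIG(status));
        reaped++;
    }
    /* The loop ends on 0 (children still running) or on -1 with errno
       ECHILD (no children at all); neither is an error for a poller. */
    return reaped;
}
```

Unlike wait(NULL), this never stalls the loop: when children exist but none has finished, waitpid() returns 0 and the helper returns immediately, so long-running children cost nothing until they exit.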

If you really don't want to properly handle process creation, then you can give the reaping responsibility to init by forking twice, reaping the child, and giving the exec to the grandchild:


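#include <errno.h>
#include <error.h>      // error() is a GNU extension
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// spawn_detached() is a hypothetical wrapper name for the snippet
void spawn_detached( void ){
    pid_t temp_pid, child_pid;
    temp_pid = fork();
    if( temp_pid == 0 ){
        child_pid = fork();
        if( child_pid == 0 ){
            // exec() here; the next line runs only if exec fails
            error( EXIT_FAILURE, errno, "failed to exec :(" );
        } else if( child_pid < 0 ){
            error( EXIT_FAILURE, errno, "failed to fork :(" );
        }
        _exit( EXIT_SUCCESS ); // orphan the grandchild so init adopts it
    } else if( temp_pid < 0 ){
        error( EXIT_FAILURE, errno, "failed to fork :(" );
    } else {
        waitpid( temp_pid, NULL, 0 ); // reap the short-lived middle child
    }
}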

In the above code snippet, the child process forks its own child, immediately exits, and is then immediately reaped by the parent. The grandchild is orphaned, adopted by init, and will be reaped automatically.

Why does Linux keep zombies at all? Why do I have to wait for my children? Is this to enforce discipline on parent processes? In decades of using Linux I have never got anything useful out of zombie processes; I don't quite get the usefulness of zombies as a "feature". If the answer is that parent processes need to have a way to find out what happened to their children, then for god's sake there is no reason to count zombies as normal processes and forbid the creation of non-zombie processes just because there are too many zombies.

How else do you propose one could efficiently retrieve the exit code of a process? The problem is that the mapping of PID <=> exit code (and the rest of the termination status) must be one-to-one. If the kernel released the PID of a process as soon as it exited, reaped or not, and a new process then inherited that same PID and exited, how would you handle storing two codes for one PID? How would an interested process retrieve the exit code of the first process? Don't assume that no one cares about exit codes simply because you don't. What you consider a nuisance or bug is widely considered useful and clean.

On the system I'm currently developing for, I can only spawn 400 to 500 processes before everything grinds to a halt (it's a badly maintained CentOS system running on the cheapest VServer I could find - but still 400 zombies are less than a few kB of information)

Something about making a widely accepted kernel behavior the scapegoat for what are clearly frustrations with a badly maintained, cheap system doesn't seem right.

Typically, the system-wide maximum number of processes (the kernel actually counts threads here) is derived from the amount of memory. You can see that limit with:


cat /proc/sys/kernel/threads-max
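threads-max is not the only ceiling: the kernel's pid_max setting and the per-user RLIMIT_NPROC resource limit (what `ulimit -u` shows) can also stop fork() long before memory runs out, and unreaped zombies keep occupying PIDs until they are waited for. A minimal sketch that prints all three from inside a program; `read_proc_long()` and `print_process_limits()` are hypothetical helper names, and the /proc paths are Linux-specific.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Read a single integer from a /proc/sys file; returns -1 on failure. */
static long read_proc_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f) {
        if (fscanf(f, "%ld", &value) != 1)
            value = -1;
        fclose(f);
    }
    return value;
}

/* Print the limits that commonly cap how many processes can be forked. */
static void print_process_limits(void)
{
    struct rlimit rl;

    printf("kernel.threads-max:  %ld\n",
           read_proc_long("/proc/sys/kernel/threads-max"));
    printf("kernel.pid_max:      %ld\n",
           read_proc_long("/proc/sys/kernel/pid_max"));

    /* Per-user process limit; the soft value is what fork() enforces. */
    if (getrlimit(RLIMIT_NPROC, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("RLIMIT_NPROC (soft): unlimited\n");
        else
            printf("RLIMIT_NPROC (soft): %llu\n",
                   (unsigned long long)rl.rlim_cur);
    }
}
```

On a cheap virtual server the host may impose a lower process limit of its own that none of these values reveal, so a low number here is a hint, not a complete answer.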

answered Oct 04 '22 14:10

Christopher Neylan

Your reasoning is backwards: The kernel keeps zombies because they store the state that you can retrieve with wait() and related system calls.

The proper way to handle asynchronous child termination is to have a SIGCHLD handler which does the wait() to clean up the child processes.
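A minimal sketch of such a handler, installed with sigaction(). Standard signals are not queued, so one SIGCHLD can stand for several exited children; the handler therefore loops on waitpid(-1, ..., WNOHANG) rather than calling wait() exactly once. The names `on_sigchld`, `install_sigchld_handler` and `children_reaped` are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* for sigaction() and SA_RESTART */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Incremented by the handler for every child it reaps. */
static volatile sig_atomic_t children_reaped = 0;

/* SIGCHLD handler. Several children may have exited by the time it runs,
   so keep reaping until waitpid() reports nothing left to collect. */
static void on_sigchld(int sig)
{
    (void)sig;
    int saved_errno = errno;    /* don't clobber errno of the interrupted code */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        children_reaped++;
    errno = saved_errno;
}

/* Install on_sigchld for SIGCHLD. SA_RESTART restarts most interrupted
   system calls; SA_NOCLDSTOP suppresses SIGCHLD for merely stopped children. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

If the exit statuses are never needed at all, POSIX.1-2001 also allows setting SIGCHLD's disposition to SIG_IGN (or passing the SA_NOCLDWAIT flag), in which case terminated children do not become zombies in the first place.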

answered Oct 04 '22 14:10

Ben Jackson

When a program exits, it returns a return code to the kernel. A zombie process is simply a place to hold the return code until the parent can obtain it. The wait() call lets the kernel know that the return code for that pid is no longer needed, and the zombie is removed.
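The zombie can be observed directly on Linux with a small sketch like the one below: waitid() with the WNOWAIT flag waits for the child to terminate but leaves it unreaped, so /proc/<pid>/stat still exists and reports state Z; the following waitpid() retrieves the exit status and lets the kernel free the entry. The helpers `proc_state()` and `demo_zombie()` are hypothetical, and the /proc parsing is Linux-specific.

```c
#define _XOPEN_SOURCE 700   /* for waitid() and WNOWAIT */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the one-letter state ('R', 'S', 'Z', ...) from /proc/<pid>/stat,
   or '?' if it cannot be read. */
static char proc_state(pid_t pid)
{
    char path[64], line[512];
    char state = '?';

    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f) {
        if (fgets(line, sizeof line, f)) {
            /* Format is "pid (comm) state ..."; comm may contain ')',
               so search from the right. */
            char *p = strrchr(line, ')');
            if (p && p[1] == ' ')
                state = p[2];
        }
        fclose(f);
    }
    return state;
}

/* Fork a child that exits with `code`, record its state while it is
   still unreaped, then reap it. Returns the exit code that waitpid()
   reported, or -1 on error. */
static int demo_zombie(int code, char *state_before_reap)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    /* Block until the child has terminated; WNOWAIT leaves it waitable,
       so the kernel must keep its entry around: this is the zombie. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return -1;
    *state_before_reap = proc_state(pid);

    /* The real wait: the kernel hands over the status and frees the entry. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `demo_zombie(42, &state)` should return 42 with `state` set to 'Z': the exit code survives exactly as long as the zombie does.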

answered Oct 04 '22 12:10

Greg Hewgill
