I have to migrate a C program from OpenVMS to Linux, and I am having difficulties with a program that spawns subprocesses. A subprocess is created (fork works fine), but execve fails (which is expected, as the wrong program name is given).
To reset the number of active subprocesses, I afterwards call wait(), which does not return. When I look at the process with ps, I see that there are no more subprocesses, but wait() does not fail with ECHILD as I had expected.
while (jobs_to_be_done)
{
    if (running_process_cnt < max_process_cnt)
    {
        if ((pid = vfork()) == 0)
        {
            params[0] = param1 ;
            params[1] = NULL ;
            if ((cstatus = execv(command, params)) == -1)
            {
                perror("Child - Exec failed") ; // this happens
                exit(EXIT_FAILURE) ;
            }
        }
        else if (pid < 0)
        {
            printf("\nMain - Child process failed") ;
        }
        else
        {
            running_process_cnt++ ;
        }
    }
    else // no more free process slot, wait
    {
        if ((pid = wait(&cstatus)) == -1) // does not return from this statement
        {
            if (errno != ECHILD)
            {
                perror("Main: Wait failed") ;
            }
            anz_sub = 0 ;
        }
        else
        {
            ...
        }
    }
}
Is there anything that has to be done to tell wait() that there are no more subprocesses? On OpenVMS the program works fine.
Thanks a lot in advance for your help
I don't recommend using vfork these days on Linux, since fork(2) is efficient enough, thanks to lazy copy-on-write techniques in the Linux kernel.
You should check the result of fork. Unless it fails, a process has been created, and wait (or waitpid(2), perhaps with WNOHANG if you don't want to block but only want to find out about child processes that have already ended) should not fail: even if the exec function in the child has failed, the fork did succeed.
You might also carefully use the SIGCHLD signal, see signal(7). A defensive way of using signals is to set some volatile sig_atomic_t flag in the signal handler, and to test and clear that flag inside your loop. Recall that only async-signal-safe functions (and there are quite few of them) can be called, even indirectly, inside a signal handler. Read also about POSIX signals.
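As a minimal sketch of that flag-based approach (the names child_exited_flag, on_sigchld, install_sigchld_handler and handle_pending_children are made up for this illustration and do not come from the question's program), the handler only stores to the flag, and all real work happens in the main loop:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical flag: set by the handler, tested and cleared by the main loop. */
volatile sig_atomic_t child_exited_flag = 0;

/* The handler does nothing but store to the flag, which is async-signal-safe. */
void on_sigchld(int signo)
{
    (void)signo;
    child_exited_flag = 1;
}

/* Install the SIGCHLD handler; returns 0 on success, -1 on error. */
int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    /* Restart interrupted system calls; ignore stop/continue notifications. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Call from the main loop. If the flag is set, clear it and reap every
 * child that has already ended. Returns the number of children reaped. */
int handle_pending_children(void)
{
    int reaped = 0;

    if (child_exited_flag) {
        /* Clear first, so a signal arriving during reaping is not lost. */
        child_exited_flag = 0;
        while (waitpid(-1, NULL, WNOHANG) > 0)
            reaped++;
    }
    return reaped;
}
```

Several children ending close together may produce only one SIGCHLD delivery, which is why the sketch reaps in a loop rather than once per signal.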
Take time to read Advanced Linux Programming to get a wider picture in your mind. Don't try to mimic OpenVMS on POSIX, but think in a POSIX or Linux way!
You probably want to always call waitpid in your loop, perhaps (sometimes or always) with WNOHANG. So waitpid should not be called only in the else part of your if (running_process_cnt < max_process_cnt), but probably in every iteration of your loop.
You might want to compile with all warnings and debug info (gcc -Wall -Wextra -g), then use the gdb debugger. You could also strace(1) your program (probably with -f).
You might want to learn about memory overcommitment. I dislike this feature and usually disable it (e.g. by running echo 2 > /proc/sys/vm/overcommit_memory as root; the value 0 is the default heuristic mode, while 2 turns overcommit off). See also proc(5), which is very useful to know about.
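A non-blocking reap step along those lines might look like the sketch below. The helper name reap_finished_children is hypothetical; only waitpid and WNOHANG come from the advice above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: collect every child that has already ended,
 * without blocking. Returns the number of children reaped. */
int reap_finished_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;          /* one ended child collected */
        } else if (pid == 0) {
            break;             /* children exist, but none has ended yet */
        } else if (errno == EINTR) {
            continue;          /* interrupted by a signal, try again */
        } else {
            break;             /* ECHILD (no children at all) or another error */
        }
    }
    return reaped;
}
```

In the question's loop, the result could be subtracted from running_process_cnt at the top of every iteration, so the counter follows the children that have really ended instead of being reset only when wait reports ECHILD.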
From man vfork:
The child must not return from the current function or call exit(3), but may call _exit(2)
You must not call exit() when the call to execv (after vfork) fails - you must use _exit() instead. It is quite possible that this alone is causing the problem you see with wait not returning.
I suggest you use fork instead of vfork. It's much easier and safer to use.
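Putting both suggestions together, a child that fails to exec could be handled roughly as follows. This is only a sketch: run_and_wait is a made-up helper, and the exit code 127 is borrowed from the shell convention for "command not found", not from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `path` with one argument via fork + execv and
 * wait for it. Returns the child's exit status, or -1 on error. */
int run_and_wait(const char *path, char *arg)
{
    char *params[] = { arg, NULL };
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                      /* fork failed, no child exists */

    if (pid == 0) {
        execv(path, params);
        /* Reached only if execv failed. */
        perror("Child - exec failed");
        _exit(127);                     /* _exit, not exit, in the child */
    }

    /* Parent: the fork succeeded, so there is a child to wait for. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With a wrong program name, the child prints its error and ends with status 127, and the parent's waitpid returns normally; a later wait() then fails with ECHILD because no children remain.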
If that alone doesn't solve the problem, you need to do some debugging or reduce the code down until you find the cause. For example the following should run without hanging:
#include <sys/wait.h>

int main(int argc, char ** argv)
{
    pid_t pid;
    int cstatus;
    pid = wait(&cstatus);
    return 0;
}
If you can verify that this program doesn't hang, then it must be some aspect of your program that is causing a hang. I suggest putting in print statements just before and after the call to wait.