I'm currently in the process of writing a shell. I execute processes and utilize a SIGCHLD signal handler to clean up (wait on them) when they are complete.
Everything has been working -- except when I execute processes which escalate privileges with sudo. In these cases, I never get a SIGCHLD signal -- so I never know that the process has completed executing.
When I receive a command such as sudo ls, I execute the program sudo and then provide ls as a parameter. I perform this execution with execvp.
If I take a look at ps -aux after my shell has executed sudo ls, I see the following:
root 4795 0.0 0.0 4496 1160 pts/29 S+ 16:51 0:00 sudo ls
root 4796 0.0 0.0 0 0 pts/29 Z+ 16:51 0:00 [ls] <defunct>
So, sudo ran and was assigned pid = 4795, with the child (ls) being assigned 4796. The child has completed its task and is now sitting in a zombie state. sudo doesn't seem to want to reap the zombie process and just sits there.
I would like to know what is causing this behavior. I've tried different techniques to clean up these zombie processes, such as running my shell under sudo and waiting directly on sudo and on the PID that sudo executes (4796 in the above example). None of these techniques have worked.
As always, any advice is appreciated.
My first thought is incorrect signal processing, but there is not enough information in your post to write test code that replicates your failure. I can give you some places to look, though. Pardon me if I cover a few signal basics you already know; they are here for future readers.
First of all, I do not know whether you are using the legacy signal() or the newer POSIX sigaction() routines to catch signals. sigset() is a useful in-between that glibc also provides.
Legacy Signals -- signal()
It is nearly impossible to guarantee an airtight signal handler with the original signal() interface in all environments, because its behavior differs between systems.
In a legacy handler you must run a while( ( pid = waitpid( -1, &status, WNOHANG ) ) > 0 ) loop until no more exited children are found, because a pending signal is recorded as a boolean condition indicating that at least one signal is outstanding. The actual count is unknown.
My advice: hold your nose and flee from legacy signals.
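One way this fails: the legacy handler has no while() loop and several SIGCHLDs arrive close together, one from your sudo and one or more from other children. If the handler reaps only one child per signal and another child's signal comes in first, the exit of the program you expected will not be caught. Keep in mind that SIGCHLD goes only to the direct parent, so any processes that sudo starts itself report to sudo, not to your shell.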
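The reaping loop described above can be sketched roughly as follows. This is a minimal illustration, not the asker's code: reap_children() and spawn_and_reap() are hypothetical names, the children just exit immediately, and the polling delay is an arbitrary choice for the demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking.
 * Returns how many children this call reaped. */
int reap_children(void)
{
    int reaped = 0;
    int status;   /* named "status" so it does not shadow signal() */

    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;   /* a real shell would update its job table here */

    /* waitpid() returned 0 (children still running) or -1 (none left). */
    return reaped;
}

/* Hypothetical demo: start n short-lived children, then reap them all.
 * Returns the number reaped, or -1 if fork() fails. */
int spawn_and_reap(int n)
{
    const struct timespec delay = { 0, 1000000 };   /* 1 ms */
    int total = 0;

    assert(n >= 0);
    /* An inherited SIG_IGN would make the kernel discard exit statuses. */
    signal(SIGCHLD, SIG_DFL);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);   /* child: exit immediately */
    }

    /* Poll for up to about 5 seconds until every child is collected. */
    for (int tries = 0; total < n && tries < 5000; tries++) {
        total += reap_children();
        if (total < n)
            nanosleep(&delay, NULL);
    }
    return total;
}
```

Calling spawn_and_reap(3) from a test program should return 3 on a typical Linux system. The same while loop belongs inside a SIGCHLD handler, since one delivered signal may stand for several exited children.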
POSIX Signals -- sigaction()
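POSIX signals can clean up the failures of legacy signals: with sigaction() the handler stays installed after each delivery, and you control which signals are blocked while it runs.
Lack of a mask can cause weird behavior, such as losing track of a signal if you get a SIGCHLD while already in a SIGCHLD handler. By default sigaction() blocks the signal being handled for the duration of the handler unless SA_NODEFER is set.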
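As a rough sketch of a sigaction() setup with an explicit mask, something like the following may help. The names on_sigchld(), install_sigchld_handler(), and run_children() are made up for illustration, and the flag choice (SA_RESTART | SA_NOCLDSTOP) is an assumption about what a shell would usually want, not something taken from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t reaped_count = 0;

/* SIGCHLD handler: reap everything that has exited so far. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;   /* waitpid() may overwrite errno */
    int status;

    (void)signo;
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped_count++;
    errno = saved_errno;
}

/* Install the handler with SIGCHLD blocked while it runs. */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    sigset_t unblock;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGCHLD);   /* explicit; implied unless SA_NODEFER */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    /* The signal mask is inherited, so make sure SIGCHLD is deliverable. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGCHLD);
    return sigprocmask(SIG_UNBLOCK, &unblock, NULL);
}

/* Hypothetical demo: fork n children and wait for the handler to reap them. */
int run_children(int n)
{
    const struct timespec delay = { 0, 1000000 };   /* 1 ms */

    assert(n >= 0);
    reaped_count = 0;
    if (install_sigchld_handler() != 0)
        return -1;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);   /* child: exit immediately */
    }

    /* Give the handler up to about 5 seconds to collect every child. */
    for (int tries = 0; reaped_count < n && tries < 5000; tries++)
        nanosleep(&delay, NULL);
    return reaped_count;
}
```

The signal mask and ignored dispositions are inherited across fork() and execvp(), so it is worth checking what your shell hands to the programs it starts as well as what the shell itself installs.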
GNU -- sigset()
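sigset() is a useful in-between, available in glibc, that has the same calling signature as signal() but removes most of the problems. Some additional control functions are also available. Using sigset() is an easy fix for many signal problems, although it originated in System V and POSIX now marks it obsolescent in favor of sigaction().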
Reminders
Think of signal handlers as threads in your program, even if you are not otherwise using threads in the code.
In days of old you needed to do absolutely minimal processing in signal handlers: no calls to library code that has side effects, such as printf. I still follow this rule when I have to use legacy signal handlers, and I always apply multithreading precautions in newer handlers.
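One common way to keep a handler minimal is to have it set a flag and do the real work in the main loop. The sketch below assumes that pattern; note_sigchld(), drain_children(), and run_with_flag() are hypothetical names, and a real shell would call drain_children() from its prompt loop instead of polling on a timer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t child_exited = 0;

/* Handler does the bare minimum: record that a SIGCHLD arrived. */
static void note_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;
}

/* Called from normal code, where printf() and friends are safe to use.
 * Returns how many children were reaped. */
int drain_children(void)
{
    int reaped = 0;
    int status;

    if (!child_exited)
        return 0;
    child_exited = 0;   /* clear first so a signal arriving now is not lost */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;
    return reaped;
}

/* Hypothetical demo: fork n children and collect them from the main loop. */
int run_with_flag(int n)
{
    const struct timespec delay = { 0, 1000000 };   /* 1 ms */
    struct sigaction sa;
    sigset_t unblock;
    int total = 0;

    assert(n >= 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = note_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    /* Make sure an inherited mask does not keep SIGCHLD blocked. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGCHLD);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);   /* child: exit immediately */
    }

    /* Stand-in for the shell's main loop, bounded to about 5 seconds. */
    for (int tries = 0; total < n && tries < 5000; tries++) {
        total += drain_children();
        if (total < n)
            nanosleep(&delay, NULL);
    }
    return total;
}
```

printf is not on the POSIX list of async-signal-safe functions, which is why any reporting belongs in drain_children() or its caller and not in the handler.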