We have a mission-critical server program running on Linux and we don't want others to terminate it accidentally. If somebody terminates it or it crashes, we want it to restart.
So we plan to write another program, say program B. We want program B and the server program to protect each other. If our server program exits, program B will restart it. If program B terminates, the server program will start it again. But we don't have a good mechanism that would let program B and the server program each be notified when the other exits.
You can use init to babysit the process, and since init only terminates on reboot, you don't need a "program B".
Add to the end of /etc/inittab:
x:3:respawn:/path/to/my/program
Information on the syntax and other options can be found in man inittab.
You can restart your server from inside itself using fork. Oh the beauty of Unix.
Something like:
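    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    void DoServer(void);    /* the actual server code, defined elsewhere */

    int main(void)
    {
        pid_t result = fork();
        if (result < 0) {
            perror("fork");
            exit(1);
        }
        if (result == 0) {
            DoServer();
            exit(0);        /* child must not fall into the supervisor loop */
        }

        for (;;) {
            int status = 0;
            if (waitpid(-1, &status, 0) < 0) {
                perror("waitpid");
                exit(1);
            }
            if (WIFEXITED(status))
                exit(0);    /* normal exit: do not restart */

            /* abnormal termination (e.g. killed by a signal): restart */
            result = fork();
            if (result < 0) {
                puts("uh... crashed and cannot restart");
                exit(1);
            }
            if (result == 0) {
                DoServer();
                exit(0);
            }
        }
    }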
EDIT:
It's probably wise to use the WIFEXITED macro as the test condition, which is more concise and portable (changed code accordingly). Plus, it fittingly models the semantics that we probably want. waitpid, given zero flags, won't return anything but either normal or abnormal termination. WIFEXITED results in true if the process exited normally, such as by returning from main or calling exit. If the process exited normally (e.g. because you requested that), one very probably does not want to keep restarting it until the end of days!
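The same "restart only after an abnormal exit" policy can also be sketched as a small shell wrapper. This is only an illustration, not part of the answer above: the function name supervise and its max_restarts argument are made up for this sketch (the cap exists so the example cannot loop forever), and it relies on the usual shell convention that an exit status above 128 means the command was killed by a signal.

```shell
#!/bin/sh
# supervise MAX_RESTARTS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND and restart it only when it was killed by
# a signal. A normal exit (any status up to 128) ends the supervision.
# Note: plain sh has no local variables, so rc and restarts are global.
supervise() {
    max_restarts=$1
    shift
    restarts=0
    while :; do
        rc=0
        "$@" || rc=$?
        # Convention: status > 128 means "terminated by signal (status - 128)".
        if [ "$rc" -le 128 ]; then
            return "$rc"            # normal exit: do not restart
        fi
        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "giving up after $restarts restarts" >&2
            return "$rc"
        fi
        restarts=$((restarts + 1))
        sleep 1                     # avoid a tight crash loop
    done
}
```

It would be called as, for example, supervise 5 /path/to/my/program. Unlike WIFEXITED in C, the shell cannot tell a real signal death apart from a program that deliberately exits with a status above 128, so treat the threshold as an approximation.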
Would a system like http://supervisord.org/ not be viable for you? We have supervisor monitoring several processes and I can attest to its features. It is very nice if it will work for your application.
They would have to poll each other, typically. Have them send signal zero to each other (which just checks for aliveness and does not interrupt the other program).
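    #!/bin/sh
    # usage: $0 my_pidfile peer_pidfile
    echo $$ > "$1"              # publish my own PID
    read otherpid < "$2"        # learn the peer's PID
    while :; do
        # signal 0 only checks that the peer still exists
        while kill -0 "$otherpid" 2>/dev/null
        do
            sleep 1
        done
        # restart other program
        # (really restarting myself in my peer configuration)
        "$0" "$2" "$1" &
        # wait until the new peer has written a different PID
        newpid=$otherpid
        while [ "$newpid" -eq "$otherpid" ]
        do
            sleep 2
            read newpid < "$2"
        done
        otherpid=$newpid
    done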
You could go more fancy and try to do watchdog stuff to make sure that the program is not only existing, but actually running.
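One way to approach the watchdog idea is a heartbeat file: the monitored program touches a file at regular intervals, and the watcher treats the program as hung when the file gets too old. The sketch below assumes exactly that arrangement, which is not something the original script does. The function name heartbeat_fresh and the file path in the usage line are hypothetical, and stat -c %Y is the GNU coreutils form (reasonable on Linux, not portable to every Unix).

```shell
#!/bin/sh
# heartbeat_fresh FILE MAX_AGE
# Succeeds if FILE was modified within the last MAX_AGE seconds.
# A missing or unreadable file counts as stale.
heartbeat_fresh() {
    mtime=$(stat -c %Y "$1" 2>/dev/null) || return 1   # GNU stat: mtime in epoch seconds
    now=$(date +%s)
    [ $((now - mtime)) -le "$2" ]
}
```

In the polling loop, the liveness test could then become something like kill -0 "$otherpid" 2>/dev/null && heartbeat_fresh /tmp/peer.heartbeat 10, so that a peer which exists but has stopped updating its heartbeat is also treated as dead.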