How to use a shell script to supervise a program?

I've searched around but haven't quite found what I'm looking for. In a nutshell, I have created a bash script that runs an infinite while loop, sleeping and checking whether a process is running. The only problem is that even if the process is running, the script says it is not and opens another instance.

I know I should check by process name and not process ID, since another process could jump in and take the ID. However, all perl programs are named Perl5.10.0 on my system, and I intend to have multiple instances of the same perl program open.

The following "if" always returns false. What am I doing wrong here?

while true; do

 if [ ps -p $pid ]; then
  echo "Program running fine"
  sleep 10

 else
  echo "Program being restarted\n"
  perl program_name.pl &
  sleep 5
  read -r pid < "${filename}_pid.txt"
 fi

done

asked Jul 21 '10 23:07

user387049

4 Answers

Get rid of the square brackets. It should be:

if ps -p $pid; then

The square brackets are syntactic sugar for the test command, so your version is equivalent to the following, which is an entirely different beast and does not invoke ps at all:

if test ps -p $pid; then

In fact, the bracketed version yields "-bash: [: -p: binary operator expected" when I run it.
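
To make the difference concrete, here is a minimal sketch, assuming bash and a ps that supports -p. The helper name is_running and the sleep stand-in for the perl program are illustrative and not from the question. The point is only that "if" branches on the exit status of whatever command follows it:

```shell
#!/usr/bin/env bash
# is_running PID: succeed if a process with that PID currently exists.
# (is_running is an illustrative helper name, not part of the question.)
is_running() {
    # "if" only looks at the exit status; ps -p exits 0 when the PID exists.
    # The output is discarded because only the status matters here.
    ps -p "$1" > /dev/null 2>&1
}

sleep 30 > /dev/null 2>&1 &   # stand-in for the supervised program
pid=$!

if is_running "$pid"; then echo "running"; else echo "not running"; fi

kill "$pid"
wait "$pid" 2> /dev/null || true   # reap the child so its PID really goes away

if is_running "$pid"; then echo "running"; else echo "not running"; fi
```

Quoting "$pid" also avoids a confusing error from ps if the variable happens to be empty.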

answered Oct 03 '22 10:10

John Kugelman

Aside from the syntax error already pointed out, this is a lousy way to ensure that a process stays alive.

First, you should find out why your program is dying in the first place; this script doesn't fix a bug, it tries to hide one.

Second, if it is so important that a program remain running, why do you expect that your (at least once already) buggy shell script will do the job? Use a system facility that is specifically designed to restart server processes. If you say what platform you are using and the nature of your server process, I can offer more concrete advice.

added in response to comment:

Sure, there are engineering exigencies, but as the OP noted in the question, there is still a bug in this attempt at a solution:

I know I should check by process name and not process id, since another process could jump in and take the id.

So now you are left with a PID-tracking script, not a process "nanny". Although the chances are small, the script as it now stands has a ten-second window in which:

the "monitored" process fails,
I start up my week-long emacs process, which grabs the same PID, and
the nanny script continues on, blissfully unaware that its dependent has failed.

The script isn't merely buggy; it is invalid, because it presumes that PIDs are stable identifiers of a process. There are ways that this could be handled better even at the shell script level. The simplest is to never detach the execution of perl from the script, since the script does nothing other than watch the subprocess. For example:

while true ; do
    if perl program_name.pl ; then
         echo "program_name terminated normally, restarting"
    else
         echo "oops program_name died again, restarting"
    fi
done

This is not only shorter and simpler, it also blocks on the condition that you are really interested in: the run state of the perl program. The original script repeatedly checks a bad proxy for the run state (the PID) and so can get it wrong. And since the whole purpose of this nanny script is to handle faults, it would be bad if it were faulty by design itself.
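
If you want a little more than the bare loop, the same idea can be sketched as a function. This is only an illustration under stated assumptions: the name supervise, the run limit, and the restart delay are all hypothetical additions, not something from the question. The delay keeps a program that crashes immediately from being respawned in a tight loop:

```shell
#!/usr/bin/env bash
# supervise MAX_RUNS DELAY CMD [ARGS...]
# Run CMD in the foreground and start it again each time it exits.
# MAX_RUNS=0 means "restart forever"; DELAY is the pause in seconds between runs.
# (supervise and both parameters are hypothetical names for this sketch.)
supervise() {
    local max_runs=$1 delay=$2 runs=0 status
    shift 2
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        # The command is not backgrounded, so this line blocks until it exits.
        if "$@"; then status=0; else status=$?; fi
        runs=$((runs + 1))
        echo "$1 exited with status $status (run $runs)" >&2
        sleep "$delay"
    done
}
```

For the program in the question this would be called as "supervise 0 5 perl program_name.pl". No PID is ever stored, so there is nothing that can go stale.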

answered Oct 03 '22 10:10

msw

I totally agree that fiddling with the PID is nearly always a bad idea. The while true ; do ... done script is quite good. For production systems, however, there are a couple of process supervisors which do exactly this and much more, e.g. they

enable you to send signals to the supervised process (without knowing its PID)
let you check how long a service has been up or down
capture its output and write it to a log file

Examples of such process supervisors are daemontools or runit. For a more elaborate discussion and examples see Init scripts considered harmful. Don't be disturbed by the title: traditional init scripts suffer from exactly the same problem as your script (they start a daemon, keep its PID in a file and then leave the daemon alone).
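
As a rough idea of what this looks like with runit: a service is a directory containing an executable file named run, and the supervisor starts that file again whenever it exits. The sketch below only writes such a file. The helper name write_run_script and the directory you pass to it are hypothetical, and it assumes that program_name.pl can be found from the service directory:

```shell
#!/bin/sh
# write_run_script DIR: create DIR/run, the script a runit-style supervisor executes.
# (write_run_script is a hypothetical helper name; DIR is whatever service
# directory your installation uses.)
write_run_script() {
    mkdir -p "$1" || return 1
    cat > "$1/run" <<'EOF'
#!/bin/sh
# Send stderr to stdout so a logging service can capture both streams.
exec 2>&1
# exec replaces this shell, so the supervisor's direct child is the program itself.
exec perl program_name.pl
EOF
    chmod +x "$1/run"
}
```

Once the supervisor has picked up the directory, commands such as "sv status program_name" or "sv term program_name" address the service by name, so no PID file is involved.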

answered Oct 03 '22 09:10

Jonas

I agree that you should find out why your program is dying in the first place. However, an ever-running shell script is probably not a good idea. What if this supervising shell script dies? (And yes, get rid of the square brackets around ps -p $pid. You want the exit status of the ps -p $pid command. The square brackets are a replacement for the test command.)

There are two possible solutions:

Use cron to run your "supervising" shell script to see if the process you're supervising is still running, and if it isn't, restart it. The supervised process can write its PID into a file. Your supervising program can then cat this file and get the PID to check.
If the program you're supervising provides a service on a particular port, make it an inetd service. This way, it isn't running at all until there is a request on that port. If you set it up correctly, it will terminate when not needed and restart when needed. This takes fewer resources, and the OS will handle everything for you.
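
The cron variant could look roughly like the sketch below. The function name ensure_running and the PID file path are hypothetical, and as a simplification the script records the PID itself instead of having the supervised program write it. Note that this still trusts a stored PID, so the PID reuse caveat raised in the other answers applies:

```shell
#!/usr/bin/env bash
# ensure_running PIDFILE CMD [ARGS...]
# Start CMD in the background unless the PID stored in PIDFILE is still alive.
# (ensure_running and PIDFILE are hypothetical names for this sketch.)
ensure_running() {
    local pidfile=$1 pid=""
    shift
    if [ -r "$pidfile" ]; then
        read -r pid < "$pidfile" || true
    fi
    # kill -0 sends no signal; it only reports whether the PID can be signalled.
    if [ -n "$pid" ] && kill -0 "$pid" 2> /dev/null; then
        return 0
    fi
    "$@" > /dev/null 2>&1 &
    echo "$!" > "$pidfile"
}

# Example call, to be placed in a script that cron runs periodically:
#   ensure_running /tmp/program_name.pid perl program_name.pl
```

Because cron starts a fresh run every time, the check does not depend on one long-lived supervising shell staying alive.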

answered Oct 03 '22 09:10

David W.

Tags: bash, shell, perl