
How to use a shell script to supervise a program?

Tags: bash, shell, perl

I've searched around but haven't quite found what I'm looking for. In a nutshell, I have created a bash script that runs an infinite while loop, sleeping and checking whether a process is running. The only problem is that even when the process is running, the script says it is not and opens another instance.

I know I should check by process name and not process id, since another process could jump in and take the id. However, all perl programs are named Perl5.10.0 on my system, and I intend to have multiple instances of the same perl program open.

The following "if" always returns false, what am I doing wrong here???

while true; do

 if [ ps -p $pid ]; then
  echo "Program running fine"
  sleep 10

 else
  echo "Program being restarted\n"
  perl program_name.pl &
  sleep 5
  read -r pid < "${filename}_pid.txt"
 fi

done
user387049 asked Jul 21 '10 23:07



4 Answers

Get rid of the square brackets. It should be:

if ps -p $pid; then

The square brackets are another way of writing the test command, which evaluates its arguments as a conditional expression and never invokes ps at all. Your original condition is equivalent to:

if test ps -p $pid; then

In fact that yields "-bash: [: -p: binary operator expected" when I run it.
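
To make the point concrete, here is a minimal sketch of the corrected check, assuming a ps that supports -p. The helper name is_running is hypothetical, and sleep 30 is only a stand-in for the perl program so the example can run on its own. The if statement branches on the exit status of the command it is given, so no brackets are needed:

```shell
#!/usr/bin/env bash
# is_running is a hypothetical helper name, not part of the original script.
is_running() {
    # ps -p exits with status 0 only if the PID exists; its listing is discarded.
    ps -p "$1" > /dev/null 2>&1
}

# Stand-in for "perl program_name.pl &" so the demo is self-contained.
sleep 30 &
pid=$!

if is_running "$pid"; then
    echo "Program running fine"
else
    echo "Program being restarted"
fi

# Stop the stand-in and reap it, then check again.
kill "$pid"
wait "$pid" 2>/dev/null

if is_running "$pid"; then
    echo "Program running fine"
else
    echo "Program being restarted"
fi
```

Quoting "$pid" also matters: if the variable is empty, an unquoted $pid vanishes and ps sees -p with no argument.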

John Kugelman answered Oct 03 '22 10:10


Aside from the syntax error already pointed out, this is a lousy way to ensure that a process stays alive.

First, you should find out why your program is dying in the first place; this script doesn't fix a bug, it tries to hide one.

Secondly, if it is so important that a program remain running, why do you expect that your (at least once already) buggy shell script will do the job? Use a system facility that is specifically designed to restart server processes. If you say what platform you are using and what your server process does, I can offer more concrete advice.

added in response to comment:

Sure, there are engineering exigencies, but as the OP noted in the question, there is still a bug in this attempt at a solution:

I know I should check by process name and not process id, since another process could jump in and take the id.

So now you are left with a PID tracking script, not a process "nanny". Although the chances are small, the script as it now stands has a ten-second window in which:

  1. the "monitored" process fails
  2. I start up my week-long emacs process, which grabs the same PID
  3. the nanny script continues on, blissfully unaware that its dependent has failed

The script isn't merely buggy, it is invalid because it presumes that PIDs are stable identifiers of a process. There are ways that this could be better handled even at the shell script level. The simplest is to never detach the execution of perl from the script since the script is doing nothing other than watching the subprocess. For example:

while true ; do
    if perl program_name.pl ; then
         echo "program_name terminated normally, restarting"
    else
         echo "oops program_name died again, restarting"
    fi
done

This is not only shorter and simpler, it also blocks on the condition that you are really interested in: the run state of the perl program. The original script repeatedly checks a poor proxy for that run state (the PID) and so can get it wrong. And since the whole purpose of this nanny script is to handle faults, it would be bad if it were faulty by design.
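
A slightly extended sketch of the same foreground loop, assuming you also want the exit status in the message and a short pause so that a program which fails instantly does not spin the CPU. The supervise function and its max_runs and delay parameters are hypothetical; max_runs exists only so the demo terminates. In real use you would loop forever as above and pass perl program_name.pl as the command.

```shell
#!/usr/bin/env bash
# supervise MAX_RUNS DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the foreground, reports how it
# ended, pauses DELAY seconds, and starts it again, MAX_RUNS times in total.
supervise() {
    local max_runs=$1 delay=$2
    shift 2
    local runs=0 status
    while [ "$runs" -lt "$max_runs" ]; do
        "$@"              # blocks until the supervised command exits
        status=$?
        runs=$((runs + 1))
        if [ "$status" -eq 0 ]; then
            echo "run $runs exited normally, restarting"
        else
            echo "run $runs failed with status $status, restarting"
        fi
        sleep "$delay"    # avoid a tight loop if the command dies at once
    done
}

# Demo with a stand-in command that always fails with status 3.
supervise 2 0 sh -c 'exit 3'
```

Because the command is never backgrounded, no PID is stored anywhere, so the PID reuse problem cannot arise.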

msw answered Oct 03 '22 10:10


I totally agree that fiddling with the PID is nearly always a bad idea. The while true ; do ... done script is quite good. For production systems, however, there are a couple of process supervisors that do exactly this and much more, e.g. they

  • enable you to send signals to the supervised process (without knowing its PID)
  • let you check how long a service has been up or down
  • capture its output and write it to a log file

Examples of such process supervisors are daemontools and runit. For a more elaborate discussion and examples, see Init scripts considered harmful. Don't be disturbed by the title: traditional init scripts suffer from exactly the same problem as you do (they start a daemon, keep its PID in a file and then leave the daemon alone).

Jonas answered Oct 03 '22 09:10


I agree that you should find out why your program is dying in the first place. However, an ever-running shell script is probably not a good idea. What if this supervising shell script dies? (And yes, get rid of the square brackets around ps -p $pid. You want the exit status of the ps -p $pid command. The square brackets are a replacement for the test command.)

There are two possible solutions:

  1. Use cron to run your "supervising" shell script to see if the process you're supervising is still running, and if it isn't, restart it. The supervised process can write its PID to a file. Your supervising program can then cat this file and get the PID to check.

  2. If the program you're supervising provides a service on a particular port, make it an inetd service. This way, it isn't running at all until there is a request on that port. If you set it up correctly, it will terminate when not needed and restart when needed. This takes fewer resources, and the OS will handle everything for you.
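
The cron approach in option 1 could look roughly like the sketch below. It makes several assumptions: ensure_running is a hypothetical helper, the helper itself writes the PID file (rather than the supervised program, as described above), and the check uses the kill -0 builtin, which sends no signal and only reports whether a process with that PID exists and can be signalled by you. It is still subject to the PID reuse caveat raised in the other answers.

```shell
#!/usr/bin/env bash
# ensure_running PIDFILE COMMAND [ARGS...]
# Hypothetical helper meant to be run periodically, e.g. from a cron job.
ensure_running() {
    local pidfile=$1
    shift
    local pid=""
    # Read the last recorded PID, if a PID file exists.
    [ -r "$pidfile" ] && read -r pid < "$pidfile"
    # kill -0 delivers no signal; it only tests that the PID is alive.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "already running (pid $pid)"
        return 0
    fi
    # Not running: start it detached from our output and record the new PID.
    # Output is discarded here; in practice redirect it to a log file.
    "$@" > /dev/null 2>&1 &
    echo "$!" > "$pidfile"
    echo "started (pid $!)"
}

# Demo with sleep as a stand-in for "perl program_name.pl".
pidfile=$(mktemp)
ensure_running "$pidfile" sleep 30   # first call starts the process
ensure_running "$pidfile" sleep 30   # second call finds it alive
kill "$(cat "$pidfile")"
rm -f "$pidfile"
```

A crontab entry would then invoke a script containing this call once a minute or so; the exact schedule and paths depend on your setup.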

David W. answered Oct 03 '22 09:10