Designing a monitor process for monitoring and restarting processes

Tags: c, unix, process

I am designing a monitor process. Its job is to monitor a configured set of processes and, when it detects that one of them has gone down, to restart that process.

I am developing the code for my Linux system. Here is how I developed a small prototype:

- I fed in the details (path, arguments) of the various processes that need to be monitored.
- The monitor process did the following:
  1. Installed a signal handler for SIGCHLD.
  2. Used fork() and execv() to start each process to be monitored, and stored the PIDs of the child processes.
  3. When a child goes down, the parent receives a SIGCHLD.
  4. The signal handler is then called. It runs a for loop over the list of stored PIDs and, for each PID, checks the /proc filesystem for a directory corresponding to that PID. If the directory doesn't exist, the process is restarted.
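Step 2 of the prototype (fork() plus execv(), keeping the child's PID) might look roughly like the sketch below. The struct monitored type and the start_process() function are hypothetical names, not taken from the question; restarting a process is then just another call to start_process() on the same entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid() and the W* macros, used when reaping */
#include <unistd.h>

/* Hypothetical description of one monitored program. */
struct monitored {
    const char *path;    /* path passed to execv() */
    char *const *argv;   /* NULL-terminated argument vector, argv[0] included */
    pid_t pid;           /* PID of the running instance, or -1 */
};

/* Start (or restart) one monitored program with fork() + execv().
 * Returns the child's PID in the parent, or -1 if fork() failed. */
static pid_t start_process(struct monitored *m)
{
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        m->pid = -1;
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the monitored program. */
        execv(m->path, m->argv);
        /* Only reached if execv() failed. _exit() avoids flushing the
         * parent's stdio buffers a second time from the child. */
        _exit(127);
    }
    /* Parent: remember the PID so a later exit can be matched to it. */
    m->pid = pid;
    return pid;
}
```

Exit status 127 for a failed execv() is only a convention borrowed from shells; the monitor can use it to tell "could not start" apart from "started and then died".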

Now, my questions are:
- Is the above method (checking the /proc filesystem) a standard or recommended way of checking whether a process is running, or should I instead read the output of the ps command through a pipe and loop over it?
- Is there a better way of meeting my requirement?

Regards.

user500949 asked Jan 22 '23 03:01


2 Answers

You should not check /proc to determine which process has exited: another, unrelated process may have started in the meantime and been assigned the same PID.
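A second Linux detail points the same way (this is general kernel behaviour, not something stated in the answer): a child that has exited but has not yet been reaped with wait() or waitpid() stays around as a zombie, and its /proc/<pid> directory still exists. An existence check in the SIGCHLD handler would therefore keep reporting a dead, unreaped child as present. The helpers proc_entry_exists() and unreaped_child_visible_in_proc() below are hypothetical names for a small Linux-only demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if the directory /proc/<pid> exists, 0 otherwise (Linux-specific). */
static int proc_entry_exists(pid_t pid)
{
    char path[32];
    snprintf(path, sizeof path, "/proc/%ld", (long)pid);
    return access(path, F_OK) == 0;
}

/* Forks a child that exits immediately, then checks /proc/<pid> before the
 * child has been reaped. Returns 1 if the entry was visible, 0 if not, or -1
 * if fork() failed. The child is reaped before returning. */
static int unreaped_child_visible_in_proc(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(0);                              /* child: exit at once */

    struct timespec delay = {0, 100000000L};   /* 100 ms for the child to exit */
    nanosleep(&delay, NULL);

    /* By now the child has almost certainly exited and is a zombie. */
    int visible = proc_entry_exists(pid);
    waitpid(pid, NULL, 0);                     /* reap it */
    return visible;
}
```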

Instead, within your SIGCHLD handler you should use the waitpid() system call, in a loop such as:

int status;
pid_t child;

while ((child = waitpid(-1, &status, WNOHANG)) > 0)
{
    /* Process with PID 'child' has exited, handle it */
}

(The loop is needed because several child processes may exit within a short period of time, but only one SIGCHLD may be delivered for them.)
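A complete handler built around this loop could look like the sketch below. It assumes a design that is not in the answer: the handler only reaps and queues (PID, status) pairs in static arrays, and the main loop copies them out with SIGCHLD blocked before deciding what to restart. MAX_EXITS, take_exits() and the other names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_EXITS 64  /* hypothetical capacity of the exit queue */

/* Written by the handler; read by the main loop only while SIGCHLD is blocked. */
static volatile pid_t exited_pid[MAX_EXITS];
static volatile int exited_status[MAX_EXITS];
static volatile sig_atomic_t exited_count = 0;

static void sigchld_handler(int sig)
{
    (void)sig;
    int saved_errno = errno;  /* waitpid() may change errno */
    int status;
    pid_t child;

    /* Reap every exited child: several exits can produce a single SIGCHLD. */
    while ((child = waitpid(-1, &status, WNOHANG)) > 0) {
        if (exited_count < MAX_EXITS) {
            exited_pid[exited_count] = child;
            exited_status[exited_count] = status;
            exited_count = exited_count + 1;
        }   /* else the queue is full and this exit is lost */
    }
    errno = saved_errno;
}

static int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    /* SA_NOCLDSTOP: no SIGCHLD for merely stopped children.
     * SA_RESTART: restart interrupted system calls where supported. */
    sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Main-loop side: copy queued exits into pids/statuses (each MAX_EXITS long)
 * with SIGCHLD blocked, so the handler cannot touch the queue meanwhile.
 * Returns the number of exits copied. */
static int take_exits(pid_t pids[MAX_EXITS], int statuses[MAX_EXITS])
{
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old);

    int n = exited_count;
    for (int i = 0; i < n; i++) {
        pids[i] = exited_pid[i];
        statuses[i] = exited_status[i];
    }
    exited_count = 0;

    sigprocmask(SIG_SETMASK, &old, NULL);
    return n;
}
```

The main loop would call take_exits() periodically, match each returned PID against its list of children, and restart the matching programs outside the handler.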

caf answered Jan 31 '23 09:01


Let's see if I've understood you. You have a list of children, and in your SIGCHLD handler you loop over /proc to see which children are still alive, is that right?

That's not very usual, and it's a bit ugly.

What you usually do is run a while ((pid = waitpid(-1, &status, WNOHANG)) > 0) loop in your SIGCHLD handler, and use the returned pid and the W* macros (WIFEXITED(), WEXITSTATUS(), WIFSIGNALED(), WTERMSIG()) to keep your list of children up to date.
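Using the W* macros to update the list might look like the sketch below. The struct child type, the record_exit() function and the "always restart" policy are hypothetical. Because it calls fprintf(), which is not async-signal-safe, this function is meant to run in the main loop on (pid, status) pairs the handler has collected, not inside the handler itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical entry in the monitor's list of children. */
struct child {
    const char *name;
    pid_t pid;     /* -1 once the child has been reaped */
    int restart;   /* set to 1 when the monitor should start it again */
};

/* Update the list for one (pid, status) pair returned by waitpid().
 * Returns the index of the matching entry, or -1 for an unknown PID. */
static int record_exit(struct child *kids, size_t n, pid_t pid, int status)
{
    for (size_t i = 0; i < n; i++) {
        if (kids[i].pid != pid)
            continue;
        if (WIFEXITED(status))
            fprintf(stderr, "%s (pid %ld) exited with status %d\n",
                    kids[i].name, (long)pid, WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            fprintf(stderr, "%s (pid %ld) was killed by signal %d\n",
                    kids[i].name, (long)pid, WTERMSIG(status));
        kids[i].pid = -1;
        kids[i].restart = 1;  /* hypothetical policy: always restart */
        return (int)i;
    }
    return -1;
}
```

A real monitor might instead decide per status, for example not restarting a child that exited with status 0.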

Notice that wait() and waitpid() are async-signal-safe. The functions you are calling to examine /proc are probably not.
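One common way to keep the handler limited to async-signal-safe calls is the self-pipe trick, a well-known technique swapped in here rather than something this answer names: the handler only write()s a byte to a non-blocking pipe, and the main loop wakes up, reaps with waitpid(), and does anything else (inspecting /proc, logging, restarting) outside signal context. The names self_pipe, setup_self_pipe() and wait_and_reap() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int self_pipe[2] = {-1, -1};  /* [0] = read end, [1] = write end */

/* The handler does nothing but write(), which is async-signal-safe. */
static void on_sigchld(int sig)
{
    (void)sig;
    int saved_errno = errno;
    char byte = 0;
    /* Non-blocking: if the pipe is full, a wake-up is already pending. */
    ssize_t ignored = write(self_pipe[1], &byte, 1);
    (void)ignored;
    errno = saved_errno;
}

static int setup_self_pipe(void)
{
    if (pipe(self_pipe) == -1)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(self_pipe[i], F_GETFL);
        if (flags == -1 || fcntl(self_pipe[i], F_SETFL, flags | O_NONBLOCK) == -1)
            return -1;
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Main-loop side: wait up to timeout_ms for a wake-up, drain the pipe, then
 * reap children outside signal context. 'report' (may be NULL) is called for
 * each reaped child. Returns the number reaped (0 on timeout), or -1 if
 * poll() failed, e.g. with EINTR. */
static int wait_and_reap(int timeout_ms, void (*report)(pid_t, int))
{
    struct pollfd pfd = { .fd = self_pipe[0], .events = POLLIN };
    int r = poll(&pfd, 1, timeout_ms);
    if (r <= 0)
        return r;

    char buf[64];
    while (read(self_pipe[0], buf, sizeof buf) > 0)
        ;  /* drain: several SIGCHLDs may have queued several bytes */

    int reaped = 0, status;
    pid_t child;
    while ((child = waitpid(-1, &status, WNOHANG)) > 0) {
        if (report != NULL)
            report(child, status);
        reaped++;
    }
    return reaped;
}
```

Note that poll() is not restarted by SA_RESTART on Linux, so the caller should simply loop when wait_and_reap() returns -1 with errno set to EINTR.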

ninjalj answered Jan 31 '23 10:01