How to prevent upstart from killing the child processes of a daemon?

Situation

I have a daemon I wrote in PHP (not the best language for this, but work with me), and it is made to receive jobs from a queue and process them whenever a job needs to be done. For each new job, I use pcntl_fork() to fork the job off into a child process. Within this child process, I then use proc_open() to execute long-running system commands for audio transcoding, which returns directly to the child when finished. When the job is completely done, the child exits and is cleaned up by the parent process.

To keep this daemon always running, I use upstart. Here is my upstart configuration file:

description "Audio Transcoding Daemon"

start on startup
stop on shutdown
# kill signal SIGCHLD
kill timeout 1200 # Don't force kill the process until it runs over 20 minutes
respawn

exec audio-daemon.php

Goal

Because I want to use this daemon in a distributed environment, I want to be able to shut down the server at any time without disrupting any running jobs. To do this, I have already implemented signal handlers using pcntl_signal() for SIGTERM, SIGHUP, and SIGINT on the parent process, which waits for all children to exit normally before exiting itself. The children also have signal handlers, but they are made to ignore all kill signals.
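
For illustration, the parent-side behaviour described above might look roughly like the sketch below. It is not the daemon's actual code: it assumes PHP CLI 7.1 or later with the pcntl and posix extensions, and installStopHandlers(), drainChildren(), and $shuttingDown are hypothetical names.

```php
<?php
// Sketch only: assumes PHP CLI with the pcntl extension (PHP 7.1+).
// installStopHandlers(), drainChildren() and $shuttingDown are hypothetical names.
pcntl_async_signals(true);

$shuttingDown = false;

function installStopHandlers(bool &$flag): void
{
    foreach ([SIGTERM, SIGHUP, SIGINT] as $signo) {
        pcntl_signal($signo, function () use (&$flag) {
            $flag = true; // stop accepting new jobs, but keep running
        });
    }
}

function drainChildren(): int
{
    $reaped = 0;
    while (true) {
        $pid = pcntl_wait($status);
        if ($pid > 0) {
            $reaped++;                 // one child finished normally
        } elseif (pcntl_get_last_error() === PCNTL_EINTR) {
            continue;                  // interrupted by a signal; keep waiting
        } else {
            break;                     // no children left to wait for
        }
    }
    return $reaped;
}

installStopHandlers($shuttingDown);
```

The main loop would stop pulling jobs from the queue once $shuttingDown is true, call drainChildren(), and only then exit.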

Problem

The problem is, according to the docs...

The signal specified by the kill signal stanza is sent to the process group of the main process. (such that all processes belonging to the jobs main process are killed). By default this signal is SIGTERM.

This is concerning because, in my child process, I run system commands through proc_open(), which spawns new child processes as well. So, whenever I run sudo stop audio-daemon, this sub-process (which happens to be sox) is killed immediately, and the job returns back with an error. Apparently, sox obeys SIGTERM and does what it's told...
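
To make this concrete, here is a small sketch showing that a command started with proc_open() lands in the same process group as the PHP process that started it, which is why a group-wide signal reaches it. It assumes the posix extension is available, and "sleep 30" is only a stand-in for a long-running command such as sox.

```php
<?php
// Sketch only: "sleep 30" stands in for a long-running command such as sox.
// "exec" makes the shell replace itself, so the reported PID is the command's.
$proc = proc_open('exec sleep 30', [1 => ['pipe', 'w']], $pipes);
if (!is_resource($proc)) {
    fwrite(STDERR, "proc_open failed\n");
    exit(1);
}

$cmdPid  = proc_get_status($proc)['pid'];
$myPgid  = posix_getpgid(posix_getpid());
$cmdPgid = posix_getpgid($cmdPid);

// Same process group, so a signal sent to the group reaches the command too.
$sameGroup = ($cmdPgid === $myPgid);
echo $sameGroup ? "same process group\n" : "different process group\n";

proc_terminate($proc); // clean up the stand-in command
proc_close($proc);
```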

Originally, I thought, "Fine. I'll just change kill signal to send something that is inherently ignored, and I'll just pick it up in the main process only." But according to the manual, there are only two signals that are ignored by default: SIGCHLD and SIGURG (and possibly SIGWINCH). But I'm afraid of getting false flags, since these can also be triggered other ways.

There are ways to create a custom signal using what the manual calls "Real-time Signals" but it also states...

The default action for an unhandled real-time signal is to terminate the receiving process.

So that doesn't help...

Can you think of any way that I can get upstart to keep all of my sub-processes open until they complete? I really don't want to go digging through sox's source code to modify its signal handlers, and while I could set SIGCHLD, SIGURG, or SIGWINCH as my upstart kill signal and pray nothing else sends them my way, I can't help but think there's a better way to do this... Any ideas?

Thanks for all your help! :)

Michael Brook asked Jan 17 '14 05:01


2 Answers

Since I haven't received any other answers for how to do this a better way, this is what I ended up doing, and I hope it helps someone out there...

To stall shutdown/reboot of the system until the daemon is finished, I changed my start on and stop on in my upstart configuration. And to keep upstart from killing my children, I resorted to using SIGURG as my kill signal, which I then catch as a kill signal in my main daemon process only.

Here is my final upstart configuration:

description "Audio Transcoding Daemon"

start on runlevel [2345]
stop on starting rc RUNLEVEL=[016] # Block shutdown/reboot until the daemon ends

kill signal SIGURG # Kill the process group with SIGURG instead of SIGTERM so only the main process will pick it up (since SIGURG will be ignored by all children by default)

kill timeout 1200 # Don't force kill the process until it runs over 20 minutes

respawn

exec audio-daemon.php

Note that using stop on starting rc RUNLEVEL=[016] is necessary to stall shutdown/reboot. stop on runlevel [016] will not work.

Also note that if you use SIGURG in your application for any other reason, using it as a kill signal may cause problems. In my case, I wasn't, so this works fine as far as I can tell.
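
On the PHP side, catching SIGURG "in the main process only" could look something like the sketch below. It is an illustration rather than the daemon's real code, assuming PHP CLI 7.1 or later with pcntl; $stopRequested and forkJob() are hypothetical names. One detail worth keeping in mind: a child created with pcntl_fork() inherits the parent's handlers, so the sketch restores the default disposition there. Commands launched via proc_open() are exec'd, which resets caught signals to their defaults, so they ignore SIGURG unless they install their own handler for it.

```php
<?php
// Sketch only: assumes PHP CLI with pcntl (PHP 7.1+).
// $stopRequested and forkJob() are hypothetical names.
pcntl_async_signals(true);

$stopRequested = false;

// Main process: treat upstart's "kill signal SIGURG" as the stop request.
pcntl_signal(SIGURG, function () use (&$stopRequested) {
    $stopRequested = true;
});

function forkJob(callable $job): int
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('fork failed');
    }
    if ($pid === 0) {
        // A forked child inherits the handler above, so restore the default
        // disposition, which for SIGURG is to ignore the signal.
        pcntl_signal(SIGURG, SIG_DFL);
        $job();
        exit(0);
    }
    return $pid; // parent: the caller reaps this PID later
}
```

A job would then be started with something like forkJob(function () { /* proc_open() on the transcoder */ }), and the main loop would check $stopRequested before accepting more work.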

Ideally, it would be nice if the POSIX standard provided a user-defined signal like SIGUSR1 and SIGUSR2 that was ignored by default. But right now, it looks like it doesn't exist.

Feel free to chime in if you have a better answer, but for now, I hope this helps anyone else having this problem.

Michael Brook answered Oct 24 '22 17:10


Disclaimer: I don't know any PHP

I solved a similar problem with my Ruby process by setting a new group ID for a launched subprocess. It looks like PHP has a similar facility.

You can start a new process group (detaching the child from the group of audio-daemon.php) by setting the child's group ID to its own process ID.

Something like:

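$chldPid = pcntl_fork();
// ... << error checks etc. (pcntl_fork() returns -1 on failure)
if ($chldPid > 0) {
    // Parent: $chldPid is the new child's PID
    ...
    // Move the child into its own process group
    posix_setpgid($chldPid, $chldPid);
}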

nhed answered Oct 24 '22 18:10