I have a number of Python workers managed by supervisord that should continuously print to stdout (after each completed task) if they are working properly. However, they tend to hang, and we've had difficulty finding the bug. Ideally supervisord would notice that they haven't printed in X minutes and restart them; the tasks are idempotent, so non-graceful restarts are fine. Is there any supervisord feature or addon that can do this? Or another supervisor-like program that has this out of the box?
We are already using memmon (http://superlance.readthedocs.io/en/latest/memmon.html) to kill workers whose memory usage skyrockets. That mitigates some of the hangs, but a hang that doesn't leak memory can still bring the workers to a standstill.
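supervisord's own event system can, in principle, support this kind of check. memmon is itself an event listener. If a program has `stdout_events_enabled=true`, supervisord emits `PROCESS_LOG_STDOUT` events for its output, and a listener subscribed to `TICK_60` is woken every 60 seconds. The sketch below covers only the bookkeeping such a listener might do. `SilenceTracker` and `handle_event` are hypothetical names. The stdin/stdout listener protocol and the actual restart (for example over supervisord's XML-RPC interface) are left out. A program that has never printed anything is never tracked.

```python
import time


def parse_tokens(line):
    """Split a supervisor 'key:value key:value ...' line into a dict."""
    return dict(token.split(":", 1) for token in line.split())


class SilenceTracker:
    """Hypothetical helper: remembers when each program last wrote to stdout."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_output = {}

    def saw_output(self, process_name, now=None):
        self.last_output[process_name] = time.time() if now is None else now

    def stale(self, now=None):
        """Names of programs that have been silent for at least `timeout` seconds."""
        now = time.time() if now is None else now
        return sorted(name for name, seen in self.last_output.items()
                      if now - seen >= self.timeout)


def handle_event(tracker, header_line, payload):
    """Feed one event to the tracker; return program names that look hung."""
    event = parse_tokens(header_line)["eventname"]
    if event == "PROCESS_LOG_STDOUT":
        # First payload line: processname:... groupname:... pid:... channel:stdout
        details = parse_tokens(payload.split("\n", 1)[0])
        tracker.saw_output(details["processname"])
    elif event.startswith("TICK_"):
        return tracker.stale()
    return []
```

On each `TICK_60`, a real listener would restart whatever `handle_event` returns. The sketch does not include that restart step.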
To start a non-running service or stop a running one, use `supervisorctl start my-daemon` and `supervisorctl stop my-daemon`. To restart a service, you can also use `supervisorctl restart my-daemon`.
It doesn't kill the supervisord process itself; it just stops all processes, reloads the configuration file, and restarts the processes.
Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems. It shares some of the same goals as programs like launchd, daemontools, and runit.
Finally, you can exit supervisorctl with Ctrl+C or by entering `quit` at the prompt: `supervisor> quit`.
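The supervisorctl commands above talk to supervisord over its XML-RPC interface. You can also call that interface directly, for instance from a watchdog script. Here is a minimal sketch, assuming `[inet_http_server]` is enabled on `127.0.0.1:9001`. `connect` and `restart_program` are hypothetical helper names. `getProcessInfo`, `stopProcess` and `startProcess` are methods in supervisord's `supervisor.` XML-RPC namespace.

```python
import xmlrpc.client

# States in which supervisord accepts stopProcess (otherwise it raises NOT_RUNNING)
STOPPABLE_STATES = {"RUNNING", "STARTING", "BACKOFF"}


def connect(url="http://127.0.0.1:9001/RPC2"):
    """Create a proxy for supervisord's XML-RPC server (no request is sent yet)."""
    return xmlrpc.client.ServerProxy(url)


def restart_program(server, name):
    """Roughly what `supervisorctl restart NAME` does: stop if running, then start."""
    state = server.supervisor.getProcessInfo(name)["statename"]
    if state in STOPPABLE_STATES:
        server.supervisor.stopProcess(name, True)   # True: wait until it has stopped
    server.supervisor.startProcess(name, True)      # True: wait until it is running
```

Usage would be along the lines of `restart_program(connect(), "my-daemon")`.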
One possible solution is to wrap your Python script in a Bash script that monitors it and exits if nothing has been printed to stdout for a given period of time.
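For example, `kill-if-hung.sh`:

```bash
#!/usr/bin/env bash
# kill-if-hung.sh: run a command, pass its stdout through, and exit
# if the command hasn't printed anything for TIMEOUT seconds.
set -e

TIMEOUT=60
LAST_CHANGED="$(date +%s)"

# Runs on every SIGUSR1 from the ticker below: give up if output has stalled.
check_output() {
    CURRENT="$(date +%s)"
    if [[ $((CURRENT - LAST_CHANGED)) -ge $TIMEOUT ]]; then
        echo "Process STDOUT hasn't printed in $TIMEOUT seconds" >&2
        echo "Considering process hung and exiting" >&2
        exit 1
    fi
}

# On exit: stop the ticker and the wrapped command, then remove the FIFO.
cleanup() {
    kill $(jobs -p) 2>/dev/null || true
    rm -f "$STDOUT_PIPE"
}

STDOUT_PIPE="$(mktemp -u)"
mkfifo "$STDOUT_PIPE"
trap cleanup EXIT
# Install the handler before the ticker starts, so no SIGUSR1 hits the default action.
trap check_output USR1

# Ticker: signal the main shell once a second ($$ is still the wrapper's PID
# in this subshell); kill fails once the wrapper is gone, and set -e ends the loop.
{
    while true; do
        sleep 1
        kill -USR1 $$
    done
} &

# Run the wrapped command (all arguments, properly quoted) into the FIFO.
"$@" >"$STDOUT_PIPE" &
CHILD=$!

# Echo each line and record when it arrived. read returns 1 at EOF and may
# return a status above 128 when the USR1 trap interrupts it.
while true; do
    if IFS= read -r line; then
        printf '%s\n' "$line"
        LAST_CHANGED="$(date +%s)"
    elif (( $? <= 128 )); then
        break
    fi
done <"$STDOUT_PIPE"

# The command closed its stdout (normally by exiting): exit 2 if it failed.
wait "$CHILD" || exit 2
```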
Then you would run the Python script under supervisord like this: `kill-if-hung.sh python -u some-script.py` (`-u` disables output buffering; alternatively, set `PYTHONUNBUFFERED`).
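Besides `-u` and `PYTHONUNBUFFERED`, a worker can flush explicitly after reporting each task with `print(..., flush=True)`. The line then reaches the wrapper's pipe immediately instead of sitting in a buffer. `run_worker` is a hypothetical worker loop for illustration only.

```python
import sys


def run_worker(tasks, out=None):
    """Run each task and report it on stdout right away (hypothetical worker loop)."""
    out = sys.stdout if out is None else out
    for number, task in enumerate(tasks, start=1):
        result = task()
        # flush=True pushes the line through the pipe now, so a watchdog sees it
        print(f"completed task {number}: {result!r}", file=out, flush=True)
```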
I'm sure you could write a Python script that does something similar.
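Here is a minimal Python sketch of such a wrapper, assuming Python 3.7+. `run_with_watchdog` is a hypothetical name. Unlike the Bash version, it reads the child's stdout in a thread instead of using a FIFO and a signal ticker. It returns the child's own exit code, or 1 if it killed the child for being silent.

```python
import subprocess
import sys
import threading
import time


def run_with_watchdog(cmd, timeout):
    """Run cmd, echo its stdout, and kill it if it prints nothing for `timeout` seconds.

    Returns the child's exit code, or 1 if it was killed for being silent.
    """
    child = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    last_output = time.monotonic()

    def pump():
        # Runs in a thread: copy each line through and note when it arrived.
        nonlocal last_output
        for line in child.stdout:
            sys.stdout.write(line)
            sys.stdout.flush()
            last_output = time.monotonic()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    while True:
        try:
            code = child.wait(timeout=1)
        except subprocess.TimeoutExpired:
            # Still running: check how long it has been silent.
            if time.monotonic() - last_output >= timeout:
                print(f"Process STDOUT hasn't printed in {timeout} seconds; "
                      "considering process hung and exiting",
                      file=sys.stderr, flush=True)
                child.kill()
                child.wait()
                return 1
        else:
            reader.join(timeout=5)  # let the reader echo any remaining output
            return code
```

As a command-line entry point, something like `sys.exit(run_with_watchdog(sys.argv[2:], float(sys.argv[1])))` would let supervisord run, for example, `watchdog.py 60 python -u some-script.py`. The child is killed with SIGKILL, which is acceptable here because the tasks are idempotent.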