I'm trying to deploy a Django app using Uwsgi and supervisor on a machine running Debian 8.1.
When I restart via sudo systemctl restart supervisor
it fails to restart half of the time.
$ root@host:/# systemctl start supervisor Job for supervisor.service failed. See 'systemctl status supervisor.service' and 'journalctl -xn' for details. $ root@host:/# systemctl status supervisor.service ● supervisor.service - LSB: Start/stop supervisor Loaded: loaded (/etc/init.d/supervisor) Active: failed (Result: exit-code) since Wed 2015-09-23 11:12:01 UTC; 16s ago Process: 21505 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS) Process: 21511 ExecStart=/etc/init.d/supervisor start (code=exited, status=1/FAILURE) Sep 23 11:12:01 host supervisor[21511]: Starting supervisor: Sep 23 11:12:01 host systemd[1]: supervisor.service: control process exited, code=exited status=1 Sep 23 11:12:01 host systemd[1]: Failed to start LSB: Start/stop supervisor. Sep 23 11:12:01 host systemd[1]: Unit supervisor.service entered failed state.
However there is nothing in the supervisor or uwsgi logs. Supervisor 3.0 is running with this configuration for uwsgi :
[program:uwsgi] stopsignal=QUIT command = uwsgi --ini uwsgi.ini directory = /dir/ environment=ENVIRONMENT=STAGING logfile-maxbytes = 300MB
stopsignal=QUIT has been added because UWSGI ignores the default signal (SIGTERM) on stop and gets killed brutally with SIGKILL leaving orphan workers.
Is there a way I could investigate what's happening ?
EDIT:
Tried as mnencia advised : /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start
but it still fails half of the time.
root@host:~# /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start [ ok ] Stopping supervisor (via systemctl): supervisor.service. ● supervisor.service - LSB: Start/stop supervisor Loaded: loaded (/etc/init.d/supervisor) Active: inactive (dead) since Tue 2015-11-24 13:04:32 UTC; 89ms ago Process: 23490 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS) Process: 23349 ExecStart=/etc/init.d/supervisor start (code=exited, status=0/SUCCESS) Nov 24 13:04:30 xxx supervisor[23349]: Starting supervisor: supervisord. Nov 24 13:04:30 xxx systemd[1]: Started LSB: Start/stop supervisor. Nov 24 13:04:32 xxx systemd[1]: Stopping LSB: Start/stop supervisor... Nov 24 13:04:32 xxx supervisor[23490]: Stopping supervisor: supervisord. Nov 24 13:04:32 xxx systemd[1]: Stopped LSB: Start/stop supervisor. [....] Starting supervisor (via systemctl): supervisor.serviceJob for supervisor.service failed. See 'systemctl status supervisor.service' and 'journalctl -xn' for details. failed! root@host:~# /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start [ ok ] Stopping supervisor (via systemctl): supervisor.service. ● supervisor.service - LSB: Start/stop supervisor Loaded: loaded (/etc/init.d/supervisor) Active: failed (Result: exit-code) since Tue 2015-11-24 13:04:32 UTC; 1s ago Process: 23490 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS) Process: 23526 ExecStart=/etc/init.d/supervisor start (code=exited, status=1/FAILURE) Nov 24 13:04:32 xxx systemd[1]: supervisor.service: control process exited, code=exited status=1 Nov 24 13:04:32 xxx systemd[1]: Failed to start LSB: Start/stop supervisor. Nov 24 13:04:32 xxx systemd[1]: Unit supervisor.service entered failed state. Nov 24 13:04:32 xxx supervisor[23526]: Starting supervisor: Nov 24 13:04:33 xxx systemd[1]: Stopped LSB: Start/stop supervisor. [ ok ] Starting supervisor (via systemctl): supervisor.service.
To start a non-running service or stop a running one, use supervisorctl start my-daemon and supervisorctl stop my-daemon . To restart a service, you can also use supervisorctl restart my-daemon .
The supervisor service runs automatically after installation. You can check its status: sudo systemctl status supervisor.
supervisorctl - supervisorctl Documentation Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems. It shares some of the same goals of programs like launchd, daemontools, and runit.
This is not necessarily an error from supervisor. I see from your systemctl status
output that supervisor
is started through the sysv-init compatibility layer, so the failure could be in the /etc/init.d/supervisor
script. It would explain the absence of errors in the supervisord logs.
To debug the init script, the easiest way is to add a set -x
as first non-comment instruction in that file, and look in the journalctl
output the trace of the script execution.
EDIT:
I've reproduced and debugged it on a test system with Debian Sid.
The issue is that the stop target of the supervisor init-script does not check if the daemon has been really terminated but only send a signal if the process exists. If the daemon process takes a while to shutdown, the subsequent start action will fail due to the dying daemon process, which is counted as already running.
I've opened a bug on Debian Bug Tracker: http://bugs.debian.org/805920
WORKAROUND:
You can workaround the issue with:
/etc/init.d/supervisor force-stop && \ /etc/init.d/supervisor stop && \ /etc/init.d/supervisor start
force-stop
will ensure the supervisord has been terminated (outside systemd).stop
make sure systemd know it's terminatedstart
starts it againThe stop
after the force-stop
is required otherwise systemd will ignore any subsequent start
request. stop
and start
can be combined using restart
, but here I've put both of them to show how it works.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With