Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Supervisor fails to restart half of the time

Tags:

I'm trying to deploy a Django app using Uwsgi and supervisor on a machine running Debian 8.1.

When I restart via sudo systemctl restart supervisor it fails to restart half of the time.

$ root@host:/# systemctl start supervisor     Job for supervisor.service failed. See 'systemctl status supervisor.service' and 'journalctl -xn' for details. $ root@host:/# systemctl status supervisor.service     ● supervisor.service - LSB: Start/stop supervisor        Loaded: loaded (/etc/init.d/supervisor)        Active: failed (Result: exit-code) since Wed 2015-09-23 11:12:01 UTC; 16s ago       Process: 21505 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)       Process: 21511 ExecStart=/etc/init.d/supervisor start (code=exited, status=1/FAILURE)     Sep 23 11:12:01 host supervisor[21511]: Starting supervisor:     Sep 23 11:12:01 host systemd[1]: supervisor.service: control process exited, code=exited status=1     Sep 23 11:12:01 host systemd[1]: Failed to start LSB: Start/stop supervisor.     Sep 23 11:12:01 host systemd[1]: Unit supervisor.service entered failed state. 

However there is nothing in the supervisor or uwsgi logs. Supervisor 3.0 is running with this configuration for uwsgi :

[program:uwsgi] stopsignal=QUIT command = uwsgi --ini uwsgi.ini directory = /dir/ environment=ENVIRONMENT=STAGING logfile-maxbytes = 300MB 

stopsignal=QUIT has been added because UWSGI ignores the default signal (SIGTERM) on stop and gets killed brutally with SIGKILL leaving orphan workers.

Is there a way I could investigate what's happening ?

EDIT:

Tried as mnencia advised : /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start but it still fails half of the time.

 root@host:~# /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start     [ ok ] Stopping supervisor (via systemctl): supervisor.service.     ● supervisor.service - LSB: Start/stop supervisor        Loaded: loaded (/etc/init.d/supervisor)        Active: inactive (dead) since Tue 2015-11-24 13:04:32 UTC; 89ms ago       Process: 23490 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)       Process: 23349 ExecStart=/etc/init.d/supervisor start (code=exited, status=0/SUCCESS)      Nov 24 13:04:30 xxx supervisor[23349]: Starting supervisor: supervisord.     Nov 24 13:04:30 xxx systemd[1]: Started LSB: Start/stop supervisor.     Nov 24 13:04:32 xxx systemd[1]: Stopping LSB: Start/stop supervisor...     Nov 24 13:04:32 xxx supervisor[23490]: Stopping supervisor: supervisord.     Nov 24 13:04:32 xxx systemd[1]: Stopped LSB: Start/stop supervisor.     [....] Starting supervisor (via systemctl): supervisor.serviceJob for supervisor.service failed. See 'systemctl status supervisor.service' and 'journalctl -xn' for details.      failed!     root@host:~# /etc/init.d/supervisor stop && while /etc/init.d/supervisor status ; do sleep 1; done && /etc/init.d/supervisor start     [ ok ] Stopping supervisor (via systemctl): supervisor.service.     ● supervisor.service - LSB: Start/stop supervisor        Loaded: loaded (/etc/init.d/supervisor)        Active: failed (Result: exit-code) since Tue 2015-11-24 13:04:32 UTC; 1s ago       Process: 23490 ExecStop=/etc/init.d/supervisor stop (code=exited, status=0/SUCCESS)       Process: 23526 ExecStart=/etc/init.d/supervisor start (code=exited, status=1/FAILURE)  Nov 24 13:04:32 xxx systemd[1]: supervisor.service: control process exited, code=exited status=1 Nov 24 13:04:32 xxx systemd[1]: Failed to start LSB: Start/stop supervisor. Nov 24 13:04:32 xxx systemd[1]: Unit supervisor.service entered failed state. Nov 24 13:04:32 xxx supervisor[23526]: Starting supervisor: Nov 24 13:04:33 xxx systemd[1]: Stopped LSB: Start/stop supervisor. [ ok ] Starting supervisor (via systemctl): supervisor.service. 
like image 845
Paul K. Avatar asked Sep 23 '15 11:09

Paul K.


People also ask

How does a supervisor restart?

To start a non-running service or stop a running one, use supervisorctl start my-daemon and supervisorctl stop my-daemon . To restart a service, you can also use supervisorctl restart my-daemon .

How can I check my supervisor status?

The supervisor service runs automatically after installation. You can check its status: sudo systemctl status supervisor.

What is supervisor CTL?

supervisorctl - supervisorctl Documentation Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems. It shares some of the same goals of programs like launchd, daemontools, and runit.


1 Answers

This is not necessarily an error from supervisor. I see from your systemctl status output that supervisor is started through the sysv-init compatibility layer, so the failure could be in the /etc/init.d/supervisor script. It would explain the absence of errors in the supervisord logs.

To debug the init script, the easiest way is to add a set -x as first non-comment instruction in that file, and look in the journalctl output the trace of the script execution.

EDIT:

I've reproduced and debugged it on a test system with Debian Sid.

The issue is that the stop target of the supervisor init-script does not check if the daemon has been really terminated but only send a signal if the process exists. If the daemon process takes a while to shutdown, the subsequent start action will fail due to the dying daemon process, which is counted as already running.

I've opened a bug on Debian Bug Tracker: http://bugs.debian.org/805920

WORKAROUND:

You can workaround the issue with:

/etc/init.d/supervisor force-stop && \ /etc/init.d/supervisor stop && \ /etc/init.d/supervisor start 
  • force-stop will ensure the supervisord has been terminated (outside systemd).
  • stop make sure systemd know it's terminated
  • start starts it again

The stop after the force-stop is required otherwise systemd will ignore any subsequent start request. stop and start can be combined using restart, but here I've put both of them to show how it works.

like image 99
mnencia Avatar answered Nov 05 '22 20:11

mnencia