
Supervisors with backoff

I have a supervisor with two worker processes: a TCP client which handles connection to a remote server and an FSM which handles the connection protocol.

Handling TCP errors in the child process complicates code significantly. So I'd prefer to "let it crash", but this has a different problem: when the server is unreachable, the maximum number of restarts will be quickly reached and the supervisor will crash along with my entire application, which is quite undesirable for this case.

What I'd like is to have a restart strategy with back-off; failing that, it would be good enough if the supervisor was aware when it is restarted due to a crash (i.e. had it passed as a parameter to the init function). I've found this mailing list thread, but is there a more official/better tested solution?

Alexey Romanov asked Sep 24 '10 09:09




2 Answers

You might find our supervisor cushion to be a good starting point. I use it to slow down restarts of things that must be running but are failing quickly on startup (such as ports that are encountering a resource problem).

Dustin answered Sep 29 '22 14:09


I've had this problem many times working with Erlang and have tried many solutions. I think the best I've found is to have an extra process that is started by the supervisor and in turn starts the child that might crash.

It starts the child on start-up, awaits child exits and restarts the child (with a delay) or exits as appropriate. I think this is simpler than the back-off server (which you link to) as you only need to keep state regarding a single child.
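A minimal sketch of such a wrapper process, assuming the child is started through a `{Module, Function, Args}` tuple whose function links to the caller and returns `{ok, Pid}`. The module name `backoff_restarter`, the delay values, and the "reset the delay after a stable run" rule are illustrative assumptions, not part of the original answer:

```erlang
%% Hypothetical module: a sketch of the "extra process" idea, not a
%% tested library. The supervisor starts this wrapper, and the wrapper
%% starts (and restarts) the child that might crash.
-module(backoff_restarter).
-behaviour(gen_server).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(MIN_DELAY, 1000).    % first retry after 1 s (assumed value)
-define(MAX_DELAY, 60000).   % delay cap, 60 s (assumed value)

-record(state, {mfa, pid, started_at, delay = ?MIN_DELAY}).

%% MFA = {Module, Function, Args}. apply(Module, Function, Args) must
%% link the child to the caller and return {ok, Pid}.
start_link(MFA) ->
    gen_server:start_link(?MODULE, MFA, []).

init(MFA) ->
    %% Trap exits so a child crash arrives as a message instead of
    %% taking the wrapper (and then the supervisor) down with it.
    process_flag(trap_exit, true),
    self() ! start_child,
    {ok, #state{mfa = MFA}}.

handle_call(_Request, _From, State) ->
    {reply, {error, unsupported}, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(start_child, #state{mfa = {M, F, A}} = State) ->
    try apply(M, F, A) of
        {ok, Pid} when is_pid(Pid) ->
            Now = erlang:monotonic_time(millisecond),
            {noreply, State#state{pid = Pid, started_at = Now}};
        _Other ->
            %% e.g. {error, Reason}: retry later with a longer delay
            {noreply, schedule_restart(State)}
    catch
        _:_ ->
            {noreply, schedule_restart(State)}
    end;
handle_info({'EXIT', Pid, _Reason}, #state{pid = Pid} = State) ->
    %% The current child died. If it had been up for a while, start
    %% again from the minimum delay; otherwise keep backing off.
    Uptime = erlang:monotonic_time(millisecond) - State#state.started_at,
    Delay = case Uptime > ?MAX_DELAY of
                true -> ?MIN_DELAY;
                false -> State#state.delay
            end,
    {noreply, schedule_restart(State#state{pid = undefined, delay = Delay})};
handle_info(_Info, State) ->
    %% Includes stray 'EXIT' messages from children that failed to start.
    {noreply, State}.

%% Schedule the next start attempt and double the delay, up to the cap.
schedule_restart(#state{delay = Delay} = State) ->
    erlang:send_after(Delay, self(), start_child),
    State#state{delay = min(Delay * 2, ?MAX_DELAY)}.
```

The supervisor would then list the wrapper instead of the child, for example with a child spec like `#{id => tcp_client, start => {backoff_restarter, start_link, [{my_tcp_client, start_link, []}]}}`, where `my_tcp_client` is a placeholder name. This sketch restarts the child on every exit, including normal ones; the "exits as appropriate" part would need an extra clause that stops the wrapper for exit reasons that should propagate to the supervisor.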

Another solution that I've used is to start the child processes as transient and have a separate process that polls for any that have crashed and issues restarts for them.

cthulahoops answered Sep 29 '22 14:09