
How do supervisor processes monitor processes? Can the same be done on the JVM?

Erlang fault tolerance (as I understand it) includes the use of supervisor processes to keep an eye on worker processes, so if a worker dies the supervisor can start up a new one.

How does Erlang do this monitoring, especially in a distributed scenario? How can it be sure the process has really died? Does it use heartbeats? Is something built into the runtime environment? What if a network cable is unplugged: does it assume the other processes have died if it cannot communicate with them?

I was thinking about how to achieve the fault tolerance that Erlang claims on the JVM (in, say, Java or Scala), but I was not sure whether doing it as well as Erlang would require support built into the JVM. I have not yet come across a description of how Erlang does it to use as a point of comparison.

Alan Kent asked Jul 19 '09 04:07



1 Answer

Erlang/OTP supervision is typically not done between processes on different nodes. It would work, but the best practice is to structure things differently.

The common approach is to write the entire application so that it runs on each machine, while being aware that it is not alone. Some part of the application runs a node monitor so that it notices node-downs (this is done with a simple network ping). These node-down events can be used to change load-balancing rules, fail over to another master, and so on.

Because detection relies on this ping, there is latency in noticing node-downs: it can take quite a few seconds to detect a dead peer node (or a dead link to it).
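Since the question asks about the JVM, here is a minimal Java sketch of the node-monitor idea, not how the Erlang runtime implements it. The class `NodeMonitor`, its methods, and the timeout value are all hypothetical names chosen for illustration; the caller is assumed to send the pings and pass in the current time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not an Erlang or JDK API): remembers when each peer
// last answered a ping and reports peers that have been silent for too long.
class NodeMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    NodeMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record that the given peer answered a ping at time nowMillis.
    void pingAnswered(String node, long nowMillis) {
        lastSeen.put(node, nowMillis);
    }

    // A peer counts as down once it has been silent for longer than the timeout,
    // or if it has never answered at all. A crashed node and an unplugged cable
    // look exactly the same from here.
    boolean isDown(String node, long nowMillis) {
        Long seen = lastSeen.get(node);
        return seen == null || nowMillis - seen > timeoutMillis;
    }
}
```

With a 5000 ms timeout, a peer that last answered at time 0 is still considered up at 4000 ms and down at 6000 ms. That gap is the detection latency described above: a shorter timeout detects failures sooner but is more likely to misreport a slow link as a dead node.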

If the supervisor and the worker process run locally, the crash and the signal to the supervisor are pretty much instantaneous. This relies on links: an abnormal exit propagates to linked processes, which crash as well unless they trap exits.
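A rough JVM analogue of the local case might look like the sketch below. JVM threads have no built-in links, so this is an assumption-laden stand-in: the thread's uncaught-exception handler plays the role of the exit signal, and the hypothetical `Supervisor` class (not an OTP or JDK API) starts a fresh worker after each abnormal exit, up to a restart limit.

```java
import java.util.function.Supplier;

// Hypothetical supervisor: runs one worker at a time on its own thread and
// starts a replacement whenever the worker dies with an uncaught exception.
class Supervisor {
    private final int maxRestarts;
    private int restarts = 0;

    Supervisor(int maxRestarts) {
        this.maxRestarts = maxRestarts;
    }

    // Returns true if some worker finished normally, false if the restart
    // limit was reached or the supervising thread was interrupted.
    boolean supervise(Supplier<Runnable> workerFactory) {
        while (true) {
            final Throwable[] crash = new Throwable[1];
            Thread worker = new Thread(workerFactory.get());
            // Stand-in for an exit signal: capture the reason the worker died.
            worker.setUncaughtExceptionHandler((t, e) -> crash[0] = e);
            worker.start();
            try {
                worker.join(); // join() makes the handler's write visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (crash[0] == null) {
                return true;  // normal exit, nothing to restart
            }
            if (restarts >= maxRestarts) {
                return false; // give up, as a supervisor with a restart limit would
            }
            restarts++;       // abnormal exit: loop around and start a fresh worker
        }
    }

    int restarts() {
        return restarts;
    }
}
```

Unlike an Erlang link, this only notices exceptions that escape the worker's `run` method, and the supervisor itself blocks in `join()` rather than receiving a message, so it illustrates the restart idea rather than reproducing the runtime's semantics.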

Christian answered Oct 31 '22 16:10
