How to monitor a remote Erlang node that was down and is restarting

Tags:

erlang

My application runs in an Erlang cluster, usually with two or more nodes. There is active monitoring between the nodes (using erlang:monitor_node), which works fine: I can detect and react to the fact that a node that was up is now down.

But how do I then find out that the node has restarted and is back in business? I can of course periodically ping the node until it is back up, but is there a better way that I've simply missed? Are process groups a better way of achieving this?
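For reference, the monitor-then-poll approach described above might look roughly like this. It is a minimal sketch in a hypothetical module node_poll: erlang:monitor_node/2 and net_adm:ping/1 are real calls, while the one-second interval and the 60-attempt limit are arbitrary assumptions. The ping is passed to poll/3 as a fun so the loop can be exercised without a second node.

```erlang
-module(node_poll).
-export([watch/1, wait_until_up/3, poll/3]).

%% Wait for Node to go down (erlang:monitor_node/2 delivers
%% {nodedown, Node}), then poll it until it answers again.
%% Interval and attempt limit are arbitrary example values.
watch(Node) ->
    true = erlang:monitor_node(Node, true),
    receive
        {nodedown, Node} ->
            wait_until_up(Node, 1000, 60)
    end.

%% Ping Node every IntervalMs milliseconds, at most MaxAttempts times.
wait_until_up(Node, IntervalMs, MaxAttempts) ->
    poll(fun() -> net_adm:ping(Node) end, IntervalMs, MaxAttempts).

%% Generic polling loop; PingFun must return pong or pang.
poll(_PingFun, _IntervalMs, 0) ->
    {error, timeout};
poll(PingFun, IntervalMs, AttemptsLeft) when AttemptsLeft > 0 ->
    case PingFun() of
        pong ->
            ok;
        pang ->
            timer:sleep(IntervalMs),
            poll(PingFun, IntervalMs, AttemptsLeft - 1)
    end.
```

Once wait_until_up/3 returns ok, calling watch/1 again re-arms the node monitor, since a monitor set with erlang:monitor_node/2 is consumed by the nodedown message.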

(Edited to add)

I think using a technique such as electing a supervisor is the thought process I was missing. I'll look into that and mark this question as answered.

asked Jun 11 '09 22:06 by Alan Moore


2 Answers


Just an idea, but how about having the restarting node itself explicitly inform the supervisor/monitoring node that it has finished restarting and that it is available again?

You could use a recurring "heartbeat message" for this purpose, or come up with a custom message specifically meant to be sent once after successful initialization. Something along the lines of:

start(SupervisorPid) ->
    %% Announce ourselves once initialisation has finished.
    SupervisorPid ! {hello, self()},
    mainloop().
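A slightly fuller sketch of both sides of this "hello" idea, in a hypothetical module hello_watcher. The message shape {hello, Pid} comes from the snippet above; everything else is an assumption. Instead of calling erlang:monitor_node/2 again, the watcher monitors the announcing process with erlang:monitor/2, which delivers a 'DOWN' message both when that process exits and, with reason noconnection, when the connection to its node is lost. In a real cluster the announcing node would send to a remote pid or a {RegisteredName, Node} tuple rather than a local pid.

```erlang
-module(hello_watcher).
-export([start/0, announce/1, up_nodes/1]).

%% Monitoring side: spawn a watcher that remembers which nodes have
%% announced themselves and are still up.
start() ->
    spawn(fun() -> loop(#{}) end).

%% Restarting side: call from the long-lived process once initialisation
%% has finished (like start/1 above). Watcher may be a pid or a
%% {RegisteredName, Node} tuple.
announce(Watcher) ->
    Watcher ! {hello, self()},
    ok.

%% Ask the watcher which nodes it currently considers up.
up_nodes(Watcher) ->
    Ref = make_ref(),
    Watcher ! {up_nodes, self(), Ref},
    receive
        {Ref, Nodes} -> Nodes
    after 5000 -> {error, timeout}
    end.

loop(Up) ->
    receive
        {hello, Pid} ->
            %% The node is back in business: start monitoring its
            %% announcing process (fires on exit or lost connection).
            MRef = erlang:monitor(process, Pid),
            loop(Up#{MRef => node(Pid)});
        {'DOWN', MRef, process, _Pid, _Reason} ->
            %% Reason is noconnection if the whole node went away.
            loop(maps:remove(MRef, Up));
        {up_nodes, From, Ref} ->
            From ! {Ref, lists:usort(maps:values(Up))},
            loop(Up)
    end.
```

Because the restarting node initiates the contact, the watcher never has to poll; it simply waits for the next hello after a 'DOWN'.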
answered Nov 13 '22 14:11 by none


You could create a global group (defined through the kernel application's global_groups parameter) and then call global_group:monitor_nodes(true) to monitor the other nodes in the same global group. The process that makes this call will receive {nodeup, Node} and {nodedown, Node} messages.
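A minimal sketch of that approach, in a hypothetical module group_watch that hands each event to a caller-supplied callback. It assumes a global group is already configured, for example {kernel, [{global_groups, [{my_group, ['a@host', 'b@host']}]}]} in sys.config (group and node names are hypothetical). Note that {nodeup, Node} only arrives once a connection to the restarted node is established again, so something (for instance the restarted node contacting the cluster at boot) still has to make that connection.

```erlang
-module(group_watch).
-export([start/1, handle/2]).

%% Spawn a process that subscribes to node status changes for the
%% nodes in this node's global group and calls Callback(up | down, Node).
start(Callback) ->
    spawn(fun() ->
                  ok = global_group:monitor_nodes(true),
                  loop(Callback)
          end).

loop(Callback) ->
    receive
        Msg ->
            _ = handle(Msg, Callback),
            loop(Callback)
    end.

%% Translate a node status message into a callback invocation.
handle({nodeup, Node}, Callback) -> Callback(up, Node);
handle({nodedown, Node}, Callback) -> Callback(down, Node);
handle(_Other, _Callback) -> ignored.
```

Keeping handle/2 separate from the receive loop makes the reaction logic easy to test by feeding it messages directly.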

answered Nov 13 '22 14:11 by Sankar Shanmugam