I want to start a unique, globally registered gen_server process in an Erlang cluster. If the process is stopped or the node running it goes down, the process should be started on one of the other nodes.
The process is part of a supervision tree. The problem is that starting the supervisor on a second node fails, because the gen_server is already running and registered globally from the first node.
Is there a way to return the {ok, Pid} of the already running process instead of launching a new gen_server instance? For example, by using global:trans() inside the gen_server's start_link function:
start_link() ->
    %% Serialize startup across the cluster so only one instance can register.
    global:trans({?MODULE, ?MODULE}, fun() ->
        case gen_server:start_link({global, ?MODULE}, ?MODULE, [], []) of
            {ok, Pid} ->
                {ok, Pid};
            {error, {already_started, Pid}} ->
                %% Another node already started it; link to that instance instead.
                link(Pid),
                {ok, Pid};
            Else ->
                Else
        end
    end).
If you return {ok, Pid} for a process you don't link to, it will confuse a supervisor that relies on the return value. If you're not going to have a supervisor use this as a start_link function, you can get away with it.
Your approach seems like it should work, as each node will try to start a new instance if the global one dies. You may find that you need to increase the MaxR value in your supervisor setup, because you'll get process exit messages every time the membership of the cluster changes.
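As a rough illustration of raising the restart intensity, here is a minimal supervisor sketch. The module names my_singleton_sup and my_singleton, and the specific MaxR/MaxT values, are just placeholders for the example:

%% my_singleton_sup.erl -- hypothetical supervisor with a raised restart
%% budget so cluster membership churn doesn't exhaust it.
-module(my_singleton_sup).
-behaviour(supervisor).

-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% {one_for_one, MaxR, MaxT}: allow up to 20 restarts within 10 seconds.
    {ok, {{one_for_one, 20, 10},
          [{my_singleton,
            {my_singleton, start_link, []},
            permanent, 5000, worker, [my_singleton]}]}}.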
One way I've created global singletons in the past is to run the process on all the nodes, but have one of them (the one that wins the global registration race) be the master. The other processes monitor the master, and when the master exits, they try to become the master. (And again, if they don't win the registration race, they monitor the pid of the one that did.) If you do this, you have to handle the global name registration yourself (i.e. don't use the gen_server:start({global, ...}) functionality), because you want the process to start whether or not it wins the registration; it simply behaves differently in each case.
The process itself must be more complicated (it has to run in both master and non-master modes), but it stabilizes quickly and doesn't produce a lot of log spam with supervisor start attempts.
My method usually requires a few rounds of revision to shake out the corner cases, but is to my mind less hassle than writing an OTP Distributed Application. This method has another advantage over distributed applications in that you don't have to statically configure the list of nodes involved in your cluster - any node can be a candidate for running the master copy of the process. Your approach has this same property.
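Here is a rough sketch of that master/standby pattern, assuming a hypothetical module name my_singleton. It only shows the registration/monitor loop, not a finished implementation:

%% my_singleton.erl -- hypothetical sketch: every node runs this process;
%% whoever wins global registration acts as master, the rest monitor it.
-module(my_singleton).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    %% Start unregistered; registration is handled inside init/1.
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, try_register(#{})}.

try_register(State) ->
    case global:register_name(?MODULE, self()) of
        yes ->
            State#{role => master};
        no ->
            case global:whereis_name(?MODULE) of
                undefined ->
                    %% The winner already died; race again.
                    try_register(State);
                Pid ->
                    %% Someone else won: monitor the master and wait.
                    Ref = erlang:monitor(process, Pid),
                    State#{role => standby, monitor => Ref}
            end
    end.

handle_info({'DOWN', _Ref, process, _Pid, _Reason}, State) ->
    %% Master died; try to take over the registration.
    {noreply, try_register(State)};
handle_info(_Msg, State) ->
    {noreply, State}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.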
How about turning the gen_server into an application and using distributed applications?
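If you go that route, the failover/takeover behaviour is configured statically through the kernel application. A minimal sketch of such a config, where the application name my_app and the node names are placeholders:

%% sys.config (or a file passed via -config); hypothetical names throughout.
[{kernel,
  [ %% Run my_app on a@host1; fail over to b@host2 or c@host3 after 5000 ms.
    {distributed, [{my_app, 5000, ['a@host1', {'b@host2', 'c@host3'}]}]},
    {sync_nodes_optional, ['b@host2', 'c@host3']},
    {sync_nodes_timeout, 30000}
  ]}].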