I have a unique issue that I have not needed to address in Elixir before.
I need to use a DynamicSupervisor to start n children dynamically in a clustered environment. I am using libcluster to manage the clustering and the global process registry to look up the DynamicSupervisor pid. Here is what is happening:
global: Name conflict terminating {:packer_supervisor, #PID<31555.1430.0>}
Here is the code for the supervisor:
defmodule EcompackingCore.PackerSupervisor do
  use DynamicSupervisor
  require Logger

  def start_link() do
    DynamicSupervisor.start_link(__MODULE__, :ok, name: {:global, :packer_supervisor})
  end

  def init(:ok) do
    Logger.info("Starting Packer Supervisor")
    DynamicSupervisor.init(strategy: :one_for_one)
  end

  def add_packer(badge_id, packer_name) do
    child_spec = {EcompackingCore.Packer, {badge_id, packer_name}}
    DynamicSupervisor.start_child(:global.whereis_name(:packer_supervisor), child_spec)
  end

  def remove_packer(packer_pid) do
    DynamicSupervisor.terminate_child(:global.whereis_name(:packer_supervisor), packer_pid)
  end

  def children do
    DynamicSupervisor.which_children(:global.whereis_name(:packer_supervisor))
  end

  def count_children do
    DynamicSupervisor.count_children(:global.whereis_name(:packer_supervisor))
  end
end
The issue seems to be that the supervisor is started on both nodes. What would be the best way to handle this? I really need the supervisor to be dynamic so I can manage the worker modules effectively. Possibly a different registry?
Thanks for your help.
If you want a rather simple solution that works with the global process registry, you can change your DynamicSupervisor's start_link:
defmodule EcompackingCore.PackerSupervisor do
  use DynamicSupervisor
  require Logger

  def start_link() do
    case DynamicSupervisor.start_link(__MODULE__, :ok, name: {:global, :packer_supervisor}) do
      {:ok, pid} ->
        {:ok, pid}

      {:error, {:already_started, pid}} ->
        # Return this pid so that the supervisor sitting above this
        # DynamicSupervisor on each node monitors the same globally
        # registered pid, i.e. every node tracks the existence of
        # your process.
        {:ok, pid}

      any ->
        any
    end
  end

  def init(:ok) do
    Logger.info("Starting Packer Supervisor")
    DynamicSupervisor.init(strategy: :one_for_one)
  end

  def add_packer(badge_id, packer_name) do
    child_spec = {EcompackingCore.Packer, {badge_id, packer_name}}
    DynamicSupervisor.start_child(:global.whereis_name(:packer_supervisor), child_spec)
  end

  def remove_packer(packer_pid) do
    DynamicSupervisor.terminate_child(:global.whereis_name(:packer_supervisor), packer_pid)
  end

  def children do
    DynamicSupervisor.which_children(:global.whereis_name(:packer_supervisor))
  end

  def count_children do
    DynamicSupervisor.count_children(:global.whereis_name(:packer_supervisor))
  end
end
Many people will say that you should not do this, because in the case of a network split you end up with two or more global processes in the cluster. But you can deal with that too if you implement some node monitoring/tracking so you know how many nodes you can "see" in the cluster.
For instance, if the cluster size is 5, you can create a check rule that verifies whether you can see 3 or more nodes. If not, schedule the next start attempt in, say, 1 second and keep trying to register your dynamic supervisor globally until the check rule returns true (meaning you are in the majority group and can offer consistency across that group). On the other hand, if your node is in the minority group and already holds the global dynamic supervisor, shut it down and schedule a start attempt in 1 second.
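A rough sketch of such a majority check could look like the following. The module name, expected cluster size, and retry interval are illustrative assumptions, not part of the original answer:

defmodule EcompackingCore.QuorumStarter do
  # Hypothetical helper: only (re)register the global supervisor while this
  # node can see a majority of the expected cluster.
  use GenServer

  @expected_cluster_size 5
  @retry_ms 1_000

  def start_link(_), do: GenServer.start_link(__MODULE__, :ok)

  def init(:ok) do
    send(self(), :check_quorum)
    {:ok, %{}}
  end

  def handle_info(:check_quorum, state) do
    visible = length(Node.list()) + 1

    cond do
      visible * 2 > @expected_cluster_size and
          :global.whereis_name(:packer_supervisor) == :undefined ->
        # Majority group and nobody holds the name: start it here. In a real
        # app you would ask your supervision tree to start it rather than
        # linking it to this process.
        EcompackingCore.PackerSupervisor.start_link()

      visible * 2 <= @expected_cluster_size ->
        # Minority group: make sure this node is not holding the global name.
        case :global.whereis_name(:packer_supervisor) do
          pid when is_pid(pid) and node(pid) == node() -> DynamicSupervisor.stop(pid)
          _ -> :ok
        end

      true ->
        :ok
    end

    Process.send_after(self(), :check_quorum, @retry_ms)
    {:noreply, state}
  end
end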
That is the simplest way to achieve consistency across the cluster, but there is one more thing to consider: this dynamic supervisor will start all workers on a single node, which I'm sure you don't want. So instead, use the global registry together with some load-balancing algorithm to spread the processes across local supervisors.
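As an illustration of that last point, here is a hedged sketch. It assumes each node runs its own locally named EcompackingCore.PackerSupervisor (registered with name: __MODULE__ instead of {:global, ...}) and hashes the badge_id to pick the node whose local supervisor should own the worker:

defmodule EcompackingCore.PackerBalancer do
  # Hypothetical balancer: deterministically map a badge_id onto one of the
  # currently visible nodes and start the worker under that node's local
  # DynamicSupervisor.
  def add_packer(badge_id, packer_name) do
    nodes = Enum.sort([node() | Node.list()])
    target = Enum.at(nodes, :erlang.phash2(badge_id, length(nodes)))

    DynamicSupervisor.start_child(
      {EcompackingCore.PackerSupervisor, target},
      {EcompackingCore.Packer, {badge_id, packer_name}}
    )
  end
end

Because the mapping is a plain hash over the visible nodes, it rebalances only when the node list changes; a consistent-hashing library would reduce the amount of movement, but this is enough to show the idea.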
After a bit of research I found a solution:
I am now using https://github.com/bitwalker/swarm to handle the pid registration. It allows setting up processes across a cluster and offers hand-off support if one of the nodes goes down.
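For reference, a condensed sketch of how the worker and its registration might look with Swarm, based on its README; the exact callbacks, name format, and handoff strategy are assumptions and may differ between versions:

defmodule EcompackingCore.Packer do
  use GenServer

  # Started via Swarm so it can be placed anywhere in the cluster.
  def start_link({badge_id, packer_name}) do
    GenServer.start_link(__MODULE__, {badge_id, packer_name})
  end

  def init(state), do: {:ok, state}

  # Swarm asks for the state before moving the process to another node.
  def handle_call({:swarm, :begin_handoff}, _from, state) do
    {:reply, {:resume, state}, state}
  end

  # The replacement process on the new node receives the handed-off state.
  def handle_cast({:swarm, :end_handoff, handed_off_state}, _state) do
    {:noreply, handed_off_state}
  end

  # Swarm tells the process to shut down when its node is being drained.
  def handle_info({:swarm, :die}, state), do: {:stop, :shutdown, state}
end

# Starting and registering a packer somewhere in the cluster:
# {:ok, pid} =
#   Swarm.register_name({:packer, badge_id}, EcompackingCore.Packer, :start_link,
#     [{badge_id, packer_name}])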
You can use a simple central node to monitor the other nodes, with, of course, a single supervisor.
This central node only starts and monitors the others, using a database to save each node's status and pid.
When a node joins or goes down, the central node receives the corresponding message and handles it (updating the database).
The only drawback of this method is that you can only have one central node, but since that node does only simple things it is very stable; it has been running in our production system for a year.
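A minimal sketch of such a node tracker, assuming it runs only on the designated central node and leaving the database writes out, could look like this:

defmodule EcompackingCore.NodeTracker do
  # Hypothetical central-node monitor: subscribes to node up/down events and
  # keeps track of which nodes are currently alive.
  use GenServer

  def start_link(_), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def init(:ok) do
    :ok = :net_kernel.monitor_nodes(true)
    {:ok, %{nodes: MapSet.new(Node.list())}}
  end

  def handle_info({:nodeup, node}, state) do
    # e.g. mark the node as up in the database and start its workers
    {:noreply, %{state | nodes: MapSet.put(state.nodes, node)}}
  end

  def handle_info({:nodedown, node}, state) do
    # e.g. mark the node as down in the database and reassign its workers
    {:noreply, %{state | nodes: MapSet.delete(state.nodes, node)}}
  end
end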