
Global Dynamic Supervisor in a cluster

I have a unique issue that I have not had to address in Elixir before.

I need to use a DynamicSupervisor to start (n) children dynamically in a clustered environment. I am using libcluster to manage the clustering and the :global process registry to look up the dynamic supervisor's pid. Here is what is happening:

global: Name conflict terminating {:packer_supervisor, #PID<31555.1430.0>}

Here is the code for the supervisor:

defmodule EcompackingCore.PackerSupervisor do
  use DynamicSupervisor
  require Logger

  def start_link() do
    DynamicSupervisor.start_link(__MODULE__, :ok, name: {:global, :packer_supervisor})
  end

  def init(:ok) do
    Logger.info("Starting Packer Supervisor")
    DynamicSupervisor.init(strategy: :one_for_one)
  end

  def add_packer(badge_id, packer_name) do
    child_spec = {EcompackingCore.Packer, {badge_id, packer_name}}
    DynamicSupervisor.start_child(:global.whereis_name(:packer_supervisor), child_spec)
  end

  def remove_packer(packer_pid) do
    DynamicSupervisor.terminate_child(:global.whereis_name(:packer_supervisor), packer_pid)
  end

  def children do
    DynamicSupervisor.which_children(:global.whereis_name(:packer_supervisor))
  end

  def count_children do
    DynamicSupervisor.count_children(:global.whereis_name(:packer_supervisor))
  end

end

The issue seems to be that the supervisor is started on both nodes. What would be the best way to handle this? I really need the supervisor to be dynamic so I can manage the worker modules effectively. Possibly a different registry?

Thanks for your help.

asked Oct 01 '18 by Botonomous


3 Answers

If you want a rather simple solution that works with the global process registry, you can change your dynamic supervisor's start_link:

defmodule EcompackingCore.PackerSupervisor do
  use DynamicSupervisor
  require Logger

  def start_link() do
    case DynamicSupervisor.start_link(__MODULE__, :ok, name: {:global, :packer_supervisor}) do
      {:ok, pid} ->
        {:ok, pid}
      {:error, {:already_started, pid}} ->
        # Return the existing pid, so the supervisor above this one on each
        # node monitors the same globally registered process and every node
        # tracks its existence.
        {:ok, pid}
      any -> any
    end
  end

  def init(:ok) do
    Logger.info("Starting Packer Supervisor")
    DynamicSupervisor.init(strategy: :one_for_one)
  end

  def add_packer(badge_id, packer_name) do
    child_spec = {EcompackingCore.Packer, {badge_id, packer_name}}
    DynamicSupervisor.start_child(:global.whereis_name(:packer_supervisor), child_spec)
  end

  def remove_packer(packer_pid) do
    DynamicSupervisor.terminate_child(:global.whereis_name(:packer_supervisor), packer_pid)
  end

  def children do
    DynamicSupervisor.which_children(:global.whereis_name(:packer_supervisor))
  end

  def count_children do
    DynamicSupervisor.count_children(:global.whereis_name(:packer_supervisor))
  end
end

Many people will say you should not do this, because in case of a network split you end up with two or more "global" processes in the cluster. But you can deal even with that if you implement some node monitoring/tracking, so you know how many nodes you "see" in the cluster.

For instance, if the cluster size is 5, you can add a check rule that verifies whether you see 3+ nodes. If not, schedule the next start attempt in, say, 1 second and keep trying to globally register your dynamic supervisor until the check passes (meaning you are in the majority group and can offer consistency across that group). On the other hand, if your node is in the minority group and already holds the global dynamic supervisor, shut it down and schedule a start attempt in 1 second.
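
As a rough illustration of that check rule, here is a minimal sketch (the module name, the fixed cluster size of 5, and the 1-second retry interval are all assumptions for this example):

defmodule EcompackingCore.QuorumStarter do
  @moduledoc "Sketch: only keep the global supervisor on a node that sees a majority."
  use GenServer

  @cluster_size 5   # assumed fixed cluster size
  @retry_ms 1_000

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def init(:ok) do
    send(self(), :check_quorum)
    {:ok, %{}}
  end

  def handle_info(:check_quorum, state) do
    # This node plus every node it is currently connected to.
    visible = length(Node.list()) + 1

    if visible > div(@cluster_size, 2) do
      # Majority partition: try to (re)register the global supervisor.
      EcompackingCore.PackerSupervisor.start_link()
    else
      # Minority partition: shut the supervisor down if this node is holding it.
      case :global.whereis_name(:packer_supervisor) do
        pid when is_pid(pid) and node(pid) == node() -> DynamicSupervisor.stop(pid)
        _ -> :ok
      end
    end

    Process.send_after(self(), :check_quorum, @retry_ms)
    {:noreply, state}
  end
end

In a real system you would start the supervisor under your application's supervision tree rather than linking it to this process; the sketch only illustrates the majority check and the 1-second retry loop.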

That is the simplest way to achieve consistency across the cluster, but there is one thing you should consider: this DynamicSupervisor will start all of its workers on a single node, which I'm sure you don't want. So rather use a global registry plus some load-balancing algorithm to spread the processes across local supervisors; see the libraries below and the placement sketch after the list.

  • Swarm has built-in ring and static-quorum-ring algorithms, and uses hashing to distribute load across the cluster. It is a good solution if your workers have an ID you can hash.
  • Syn is another alternative.
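
As a rough illustration of such balancing (not Swarm's actual ring), here is a minimal sketch that hashes the worker ID onto the currently connected nodes. It assumes each node also runs a locally registered EcompackingCore.PackerSupervisor whose add_packer/2 starts the child on that node, which differs from the globally registered version above:

defmodule EcompackingCore.PackerPlacement do
  @doc "Start a packer on the node chosen by hashing its badge_id."
  def start_packer(badge_id, packer_name) do
    nodes = Enum.sort([node() | Node.list()])
    target = Enum.at(nodes, :erlang.phash2(badge_id, length(nodes)))

    # Assumes add_packer/2 starts the child on the supervisor local to `target`.
    :rpc.call(target, EcompackingCore.PackerSupervisor, :add_packer, [badge_id, packer_name])
  end
end

Note that plain phash2 reshuffles most assignments when cluster membership changes; a consistent hash ring (what Swarm uses) moves far fewer processes.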
answered Oct 07 '22 by Milan Jaric


After a bit of research I found a solution:

I am now using https://github.com/bitwalker/swarm to handle the pid registration. It allows setting up processes across a cluster and offers hand-off support if one of the nodes goes down.
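
For reference, here is a rough sketch of how the registration and hand-off could be wired up with Swarm; the {:packer, badge_id} name, the :packers group, and the module layout are assumptions for this example, not the code actually used:

defmodule EcompackingCore.PackerRegistry do
  @moduledoc "Sketch: register packers through Swarm instead of :global."

  # Swarm picks a node and starts the packer there via this MFA, which must
  # return {:ok, pid}.
  def start_packer(badge_id, packer_name) do
    with {:ok, pid} <-
           Swarm.register_name(
             {:packer, badge_id},
             EcompackingCore.PackerSupervisor,
             :add_packer,
             [badge_id, packer_name]
           ) do
      Swarm.join(:packers, pid)
      {:ok, pid}
    end
  end
end

# The worker itself takes part in Swarm's handoff protocol, roughly like this:
defmodule EcompackingCore.Packer do
  use GenServer

  # ... existing init/handle_* callbacks ...

  def handle_call({:swarm, :begin_handoff}, _from, state) do
    # Hand the current state to the node taking over.
    {:reply, {:resume, state}, state}
  end

  def handle_cast({:swarm, :end_handoff, handed_off_state}, _state) do
    {:noreply, handed_off_state}
  end

  def handle_info({:swarm, :die}, state) do
    {:stop, :shutdown, state}
  end
end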

answered Oct 07 '22 by Botonomous


You can use a simple central node to monitor the other nodes and run the one supervisor.

This central node only starts and monitors processes, and uses a database to save the other nodes' statuses and pids.

When a node joins or goes down, you receive the corresponding message and handle it (updating the database).

The only drawback of this method is that you can have just one central node, but since it does only simple things it is quite stable; it has been running on our production system for a year.
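
A minimal sketch of such a central monitor, using :net_kernel.monitor_nodes/1 (the module name is made up, and the database writes are left as placeholders):

defmodule CentralMonitor do
  @moduledoc "Sketch: a single central process that tracks node up/down events."
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def init(:ok) do
    # Receive {:nodeup, node} / {:nodedown, node} messages for the whole cluster.
    :ok = :net_kernel.monitor_nodes(true)
    {:ok, %{}}
  end

  def handle_info({:nodeup, node}, state) do
    # Placeholder: persist the node's status and pids to your database here.
    {:noreply, Map.put(state, node, :up)}
  end

  def handle_info({:nodedown, node}, state) do
    # Placeholder: mark the node down and reschedule its workers elsewhere.
    {:noreply, Map.put(state, node, :down)}
  end
end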

answered Oct 07 '22 by chris