
How to trigger Elixir supervision tree termination from a supervised worker process

I am trying to terminate a whole supervision tree from a supervised worker process. Here is my supervision tree:

                   +--------------------------+
                   |                          |
          +--------+ Sup1: Dynamic Supervisor +---------+
          |        |                          |         |
          |        +-------------+------------+         |
          |                      |                      |
          |                      |                      |
          v                      v                      v

+------------------+   +------------------+  +------------------+
|                  |   |                  |  |                  |
| Job1: Supervisor |   | Job2: Supervisor |  | Job3: Supervisor |
|                  |   |                  |  |                  |
+------------------+   +-+---------+---+--+  +------------------+
                         |             |
                         |             |
                         |             |
                         |             |
                         v             v

             +-------------------+  +--------------+
             |                   |  |              |
             | Progress Monitor: |  | Work: Worker |
             |       Worker      |  |              |
             |                   |  +--------------+
             +-------------------+

Process life cycle:

  1. A job is started via DynamicSupervisor.start_child(__MODULE__, spec)
  2. Each job is itself a supervision tree: 1 supervisor (restart strategy one_for_one) -> 2 workers
  3. The Progress Monitor worker knows when the given job is done
  4. When the job is done, the Progress Monitor worker tries to terminate the whole job supervision tree by calling DynamicSupervisor.terminate_child(__MODULE__, pid)
  5. The Progress Monitor is expected to run cleanup steps in its terminate callback, so it traps exit signals
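
The life cycle above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the module names under Lifecycle and the notify option (which only makes the cleanup observable) are assumptions made for illustration.

```elixir
defmodule Lifecycle.Work do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts), do: {:ok, opts}
end

defmodule Lifecycle.ProgressMonitor do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # Trap exits so that terminate/2 runs when the parent sends :shutdown.
    Process.flag(:trap_exit, true)
    {:ok, Map.new(opts)}
  end

  @impl true
  def terminate(reason, state) do
    # Stand-in for the real cleanup steps (hypothetical `notify` option).
    send(state.notify, {:cleaned_up, reason})
    :ok
  end
end

defmodule Lifecycle.Job do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # One supervisor, two workers, one_for_one strategy.
    children = [{Lifecycle.ProgressMonitor, opts}, {Lifecycle.Work, opts}]
    Supervisor.init(children, strategy: :one_for_one)
  end
end

defmodule Lifecycle.Sup1 do
  use DynamicSupervisor

  def start_link(opts), do: DynamicSupervisor.start_link(__MODULE__, opts, name: __MODULE__)

  # Step 1: start one job supervision tree.
  def start_job(opts), do: DynamicSupervisor.start_child(__MODULE__, {Lifecycle.Job, opts})

  # Step 4, but called from OUTSIDE the job tree, where blocking is harmless.
  def stop_job(job_pid), do: DynamicSupervisor.terminate_child(__MODULE__, job_pid)

  @impl true
  def init(_opts), do: DynamicSupervisor.init(strategy: :one_for_one)
end
```

When stop_job/1 is called from a process outside the job tree, the Progress Monitor receives the :shutdown signal from its parent and its terminate callback runs as expected.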

Problems and observations:

  1. DynamicSupervisor.terminate_child is a blocking call: it waits for all child processes to terminate, including the calling process, the Progress Monitor
  2. The Progress Monitor is therefore deadlocked and cannot terminate. Its parent supervisor then sends a :kill signal, which does not trigger the terminate callback
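
The two observations can be reproduced with a small sketch. The Blocked module names, the :job_done message, and the sup1, job_sup and notify options are hypothetical; shutdown: 200 is set only to shorten the default 5_000 ms wait for a worker so that the kill happens quickly.

```elixir
defmodule Blocked.ProgressMonitor do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    Process.flag(:trap_exit, true)
    {:ok, Map.new(opts)}
  end

  @impl true
  def handle_info(:job_done, state) do
    # Blocks here until the job tree is down. The :shutdown signal from the
    # job supervisor only lands in the mailbox and is never processed,
    # because this process is still waiting for the reply.
    DynamicSupervisor.terminate_child(state.sup1, state.job_sup)
    {:noreply, state}
  end

  @impl true
  def terminate(reason, state) do
    # Not reached in this scenario: :kill cannot be trapped.
    send(state.notify, {:cleaned_up, reason})
    :ok
  end
end

defmodule Blocked.Job do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # self() is the job supervisor, because init/1 runs inside it.
    monitor_opts = [job_sup: self()] ++ opts
    # After 200 ms without an exit, the job supervisor sends :kill.
    spec = Supervisor.child_spec({Blocked.ProgressMonitor, monitor_opts}, shutdown: 200)
    Supervisor.init([spec], strategy: :one_for_one)
  end
end
```

In this sketch the job tree does go down, but only because the Progress Monitor is killed after its shutdown timeout, so the cleanup in terminate/2 is skipped.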

Quick workarounds:

  1. Call DynamicSupervisor.terminate_child from the Progress Monitor worker asynchronously:

    spawn(fn -> DynamicSupervisor.terminate_child(__MODULE__, pid) end)

  2. Define a shutdown strategy for Sup1 (the Dynamic Supervisor):

    shutdown: 5_000

    It will wait at most 5 s for the job supervision tree to terminate and then send a shutdown exit signal. This ensures that the terminate callback is called for the Progress Monitor process.
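
The first workaround could look roughly like this inside the worker. It is a sketch under assumptions: the AsyncStop module names, the :job_done message, and the sup1, job_sup and notify options are hypothetical and not taken from the original code.

```elixir
defmodule AsyncStop.ProgressMonitor do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    Process.flag(:trap_exit, true)
    {:ok, Map.new(opts)}
  end

  @impl true
  def handle_info(:job_done, state) do
    %{sup1: sup1, job_sup: job_sup} = state
    # An unlinked helper process makes the blocking call, so this GenServer
    # returns to its loop and can react to the :shutdown signal.
    spawn(fn -> DynamicSupervisor.terminate_child(sup1, job_sup) end)
    {:noreply, state}
  end

  def handle_info(_other, state), do: {:noreply, state}

  @impl true
  def terminate(reason, state) do
    # Stand-in for the real cleanup steps.
    send(state.notify, {:cleaned_up, reason})
    :ok
  end
end

defmodule AsyncStop.Job do
  use Supervisor

  def start_link(opts), do: Supervisor.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # self() is the job supervisor, because init/1 runs inside it.
    children = [{AsyncStop.ProgressMonitor, [job_sup: self()] ++ opts}]
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```

Because the worker is back in its receive loop when the exit signal arrives, terminate/2 runs with reason :shutdown before the job supervisor's timeout expires.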

I am not happy with either of them.

Questions:

  1. How can I trigger supervision tree termination from a worker process and avoid deadlocks?
  2. If terminating a supervision tree from a worker is not the best practice, what is the recommended way?
  3. Any recommendations on how to redesign the supervision tree to make graceful termination easier?
mkorszun asked Nov 14 '18 10:11


1 Answer

Just call it in an async task: Task.async(fn -> Process.exit(Sup1, :shutdown) end). This terminates Sup1, and all of its children shut down with it.
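
A hedged variant of the task approach is sketched below. Two things are swapped in deliberately. Task.start/1 replaces Task.async/1, since the result is never awaited. DynamicSupervisor.stop/3 replaces Process.exit/2, because Process.exit/2 expects a pid rather than a module name, and a supervisor traps exits, so an exit signal from a process that is neither its parent nor one of its children may not stop it. The TaskStop.Worker module, the :all_done message, and the sup1 and notify options are hypothetical.

```elixir
defmodule TaskStop.Worker do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    Process.flag(:trap_exit, true)
    {:ok, Map.new(opts)}
  end

  @impl true
  def handle_info(:all_done, state) do
    sup1 = state.sup1
    # Fire and forget: the task is not linked to this worker and is never
    # awaited, so the worker stays responsive while Sup1 shuts down.
    {:ok, _task} = Task.start(fn -> DynamicSupervisor.stop(sup1) end)
    {:noreply, state}
  end

  def handle_info(_other, state), do: {:noreply, state}

  @impl true
  def terminate(reason, state) do
    # Stand-in for the real cleanup steps.
    send(state.notify, {:cleaned_up, reason})
    :ok
  end
end
```

If Sup1 is itself supervised with restart: :permanent, its parent will start it again after the stop.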

EDIT:

If you need a prettier solution, it depends on what else you need. In most cases, I create a Bootstrapper worker that does initialization and some other work. You could easily add other features to it.

So, considering the above and roughly speaking, I would add another DynamicSupervisor in a layer above (AppSupervisor) so that it can start the Bootstrapper and pass self() to it (or register it under a local name to avoid this injection). On start, the Bootstrapper worker starts Sup1 (your dynamic supervisor) and waits for other messages, e.g. :terminate_sup1, which shuts down the Sup1 process. Later, any of the workers below can shut down Sup1 by casting the :terminate_sup1 message to the Bootstrapper. This also leaves the door open to starting Sup1 again when another message is sent to the Bootstrapper worker.
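
One possible reading of this design is sketched below. It is an interpretation rather than the answerer's code: the Boot.Bootstrapper module, the :start_sup1 message, the sup1/0 helper, and the choice to start Sup1 as a temporary child of the upper DynamicSupervisor are all assumptions.

```elixir
defmodule Boot.Bootstrapper do
  use GenServer

  # `parent` is the upper-layer DynamicSupervisor that was injected.
  def start_link(parent), do: GenServer.start_link(__MODULE__, parent, name: __MODULE__)

  # Casts do not block, so workers below Sup1 can call these safely.
  def terminate_sup1, do: GenServer.cast(__MODULE__, :terminate_sup1)
  def start_sup1, do: GenServer.cast(__MODULE__, :start_sup1)
  def sup1, do: GenServer.call(__MODULE__, :sup1)

  @impl true
  def init(parent) do
    # Defer the start until init/1 has returned: the parent is still busy
    # starting this process and could not serve start_child/2 yet.
    {:ok, %{parent: parent, sup1: nil}, {:continue, :start_sup1}}
  end

  @impl true
  def handle_continue(:start_sup1, state), do: {:noreply, do_start(state)}

  @impl true
  def handle_cast(:terminate_sup1, %{sup1: pid} = state) when is_pid(pid) do
    # Blocks only the Bootstrapper, which is outside the tree being stopped.
    :ok = DynamicSupervisor.terminate_child(state.parent, pid)
    {:noreply, %{state | sup1: nil}}
  end

  def handle_cast(:start_sup1, %{sup1: nil} = state), do: {:noreply, do_start(state)}
  def handle_cast(_other, state), do: {:noreply, state}

  @impl true
  def handle_call(:sup1, _from, state), do: {:reply, state.sup1, state}

  defp do_start(state) do
    # :temporary, so the parent never restarts Sup1 behind the Bootstrapper's back.
    spec = Supervisor.child_spec({DynamicSupervisor, strategy: :one_for_one}, restart: :temporary)
    {:ok, pid} = DynamicSupervisor.start_child(state.parent, spec)
    %{state | sup1: pid}
  end
end
```

A worker that detects the end of the work would then call Boot.Bootstrapper.terminate_sup1() and return to its loop, where it can handle the :shutdown signal and run its cleanup.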

Furthermore, if you just need to shut down Sup1, go with the Task. But if you need control, put it into a single worker process that decides when Sup1 is up or down.

Milan Jaric answered Sep 30 '22 08:09