Erlang supervisor dynamic change to restart intensity

My question is, can one modify the restart intensity thresholds of an already running supervisor, apart from in a release upgrade scenario, and if so, how?

This has never come up for me before, but I'm running a supervisor that initially has no children; another process starts the children by way of supervisor:start_child/2. My supervisor's init/1 looks like this:

init([]) ->
    RestartSt = {simple_one_for_one, 10, 10},
    ChSpec = [{foo, {foo,start_link,[]}, transient, 1000, worker, [foo]}],
    {ok, {RestartSt, ChSpec}}.
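
For context, here is a minimal sketch of how such a supervisor module might be wired up. It assumes a worker module foo that exports start_link/0; the module name foo_sup and the start_child/0 wrapper are made up for illustration and are not part of my actual code.

```erlang
-module(foo_sup).
-behaviour(supervisor).

-export([start_link/0, start_child/0]).
-export([init/1]).

%% Start the supervisor, registered locally under the module name.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Hypothetical wrapper: ask the running supervisor for one more child.
%% For simple_one_for_one, the second argument is a list of extra
%% arguments appended to the child spec's argument list (none here),
%% so the child is started with foo:start_link().
start_child() ->
    supervisor:start_child(?MODULE, []).

%% Same restart settings and child spec as in the question.
init([]) ->
    RestartSt = {simple_one_for_one, 10, 10},
    ChSpec = [{foo, {foo, start_link, []}, transient, 1000, worker, [foo]}],
    {ok, {RestartSt, ChSpec}}.
```

The supervisor starts with no children; each call to foo_sup:start_child() adds one.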

At the time of supervisor start, the likely number of children is unknown; certainly it could vary dramatically from 10, to 10,000, or more.

A restart intensity of say 20 is generous enough for 10 children, but for say 10,000 children I would like to be able to increase it... and decrease it as the number of children drops due to normal terminations.

Michael asked Oct 18 '15 22:10


1 Answer

There's no API for doing this, so I believe you're stuck with the upgrade approach unless you want to propose a new API for this to the OTP team by submitting a pull request providing a complete patch with code changes, new tests, and documentation changes.

There's also a really dirty hack way of doing this that involves manipulating internal supervisor state, and so it's absolutely not something I would recommend for a production system but I think it's still interesting to look at. A supervisor stores restart intensity in its internal loop state. You can see this state by calling sys:get_state/1,2 on a supervisor process. For example, here's the state of a supervisor in the Yaws web server:

1> rr(supervisor).
[child,state]
2> sys:get_state(yaws_sup).
#state{name = {local,yaws_sup},
       strategy = one_for_all,
       children = [#child{pid = <0.67.0>,name = yaws_sup_restarts,
                          mfargs = {yaws_sup_restarts,start_link,[]},
                          restart_type = transient,shutdown = infinity,
                          child_type = supervisor,
                          modules = [yaws_sup_restarts]},
                   #child{pid = <0.42.0>,name = yaws_server,
                          mfargs = {yaws_server,start_link,
                                                [{env,true,false,false,false,false,false,"default"}]},
                          restart_type = permanent,shutdown = 120000,
                          child_type = worker,
                          modules = [yaws_server]},
                   #child{pid = <0.39.0>,name = yaws_trace,
                          mfargs = {yaws_trace,start_link,[]},
                          restart_type = permanent,shutdown = 5000,
                          child_type = worker,
                          modules = [yaws_trace]},
                   #child{pid = <0.36.0>,name = yaws_log,
                          mfargs = {yaws_log,start_link,[]},
                          restart_type = permanent,shutdown = 5000,
                          child_type = worker,
                          modules = [yaws_log]}],
       dynamics = undefined,intensity = 0,period = 1,restarts = [],
       module = yaws_sup,args = []}

The initial rr command retrieves the record definitions from supervisor so we can see the field names when we get the state from yaws_sup, otherwise we would just get a tuple full of anonymous values.

The retrieved state shows the intensity in this case to be 0. We can change it using sys:replace_state/2,3:

3> sys:replace_state(yaws_sup, fun(S) -> S#state{intensity=2} end).
#state{name = {local,yaws_sup},
       strategy = one_for_all,
       children = [#child{pid = <0.67.0>,name = yaws_sup_restarts,
                          mfargs = {yaws_sup_restarts,start_link,[]},
                          restart_type = transient,shutdown = infinity,
                          child_type = supervisor,
                          modules = [yaws_sup_restarts]},
                   #child{pid = <0.42.0>,name = yaws_server,
                          mfargs = {yaws_server,start_link,
                                                [{env,true,false,false,false,false,false,"default"}]},
                          restart_type = permanent,shutdown = 120000,
                          child_type = worker,
                          modules = [yaws_server]},
                   #child{pid = <0.39.0>,name = yaws_trace,
                          mfargs = {yaws_trace,start_link,[]},
                          restart_type = permanent,shutdown = 5000,
                          child_type = worker,
                          modules = [yaws_trace]},
                   #child{pid = <0.36.0>,name = yaws_log,
                          mfargs = {yaws_log,start_link,[]},
                          restart_type = permanent,shutdown = 5000,
                          child_type = worker,
                          modules = [yaws_log]}],
       dynamics = undefined,intensity = 2,period = 1,restarts = [],
       module = yaws_sup,args = []}

The fun we pass as the second argument to sys:replace_state/2 takes the current state record as its argument and returns a copy with the intensity field set to 2. The sys:replace_state/2,3 functions return the new state, and as you can see near the end of the result here, intensity is now 2 instead of 0.

As the sys:replace_state/2,3 documentation explains, these functions are intended only for debugging purposes, so using them to do this in a production system is definitely not something I recommend. The second argument to replace_state here shows that this approach requires knowledge of the details of the internal state record of supervisor, which we obtained here via the rr shell command, so if that record ever changes, this code may stop working. Even more fragile would be treating the supervisor state record as a tuple and counting on the intensity field to be in a particular tuple position so you can change its value. Therefore, if you really want this functionality of changing a supervisor's restart intensity, you're best off in the long run proposing to the OTP team that it be added; if you're going to take that route, I recommend first proposing the idea on the erlang-questions mailing list to gauge interest.
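
To make the fragility of the tuple-based variant concrete, here is a rough sketch of what it might look like outside the shell. It assumes the field order visible in the output above (name, strategy, children, dynamics, intensity, ...), which puts intensity at tuple position 6 once the record tag is counted. That position is an assumption about supervisor internals, not a documented interface, and the module and function names are made up for illustration.

```erlang
-module(sup_intensity_hack).
-export([set_intensity/2, patch/2]).

%% ASSUMPTION: the supervisor state is the tuple
%% {state, Name, Strategy, Children, Dynamics, Intensity, ...},
%% so intensity sits at position 6. Nothing guarantees this.
-define(INTENSITY_POS, 6).

%% Debug-only hack: overwrite the intensity of a running supervisor.
set_intensity(Sup, NewIntensity)
  when is_integer(NewIntensity), NewIntensity >= 0 ->
    _ = sys:replace_state(Sup, fun(S) -> patch(S, NewIntensity) end),
    ok.

%% Replace the value at the assumed position, but only if the state
%% looks like what we expect; otherwise return it untouched.
patch(State, New)
  when is_tuple(State),
       tuple_size(State) >= ?INTENSITY_POS,
       element(1, State) =:= state,
       is_integer(element(?INTENSITY_POS, State)) ->
    setelement(?INTENSITY_POS, State, New);
patch(State, _New) ->
    State.
```

The guards only catch a state that no longer looks like the expected tuple. If a future supervisor version put some other integer field at that position, this code would silently overwrite the wrong value, which is exactly why I wouldn't rely on it.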

Steve Vinoski answered Oct 21 '22 23:10