
Erlang: is two stage init safe?

When using gen_server, sometimes I need to do a "two stage init" or "detached init", which goes like this:

a) in the gen_server callback module's init/1, only part of the initialization is done

b) after that, the process sends itself a message with self() ! init_stage2

c) init/1 returns {ok, PartiallyInitializedState}

d) at some point in the future, handle_info/2 is called to process the init_stage2 message sent in b), thus completing the initialization.

My main concern is that, if a gen server call / cast / info is made between c) and d), is it possible for that request to be processed with the PartiallyInitializedState?

According to 10.8 "Is the order of message reception guaranteed?" (quoted below), this is possible, if I understand it right. However, I cannot produce a failure, that is, a request that arrives between c) and d) and is processed with the partially initialized state.

Yes, but only within one process.

If there is a live process and you send it message A and then message B, it's guaranteed that if message B arrived, message A arrived before it.

On the other hand, imagine processes P, Q and R. P sends message A to Q, and then message B to R. There is no guarantee that A arrives before B. (Distributed Erlang would have a pretty tough time if this was required!)
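
The pairwise guarantee in the quoted passage can be observed with a small sketch. The module name pair_order_demo and the function run/1 are made up for illustration; the sketch only covers the case of one sender and one receiver, which is the case the quote says is ordered.

```erlang
%% Hypothetical demo module, not part of the question's code.
-module(pair_order_demo).
-export([run/1]).

%% Spawns one sender that sends 1..N to the caller, then reports whether
%% the caller received the numbers in the order they were sent.
run(N) ->
    Parent = self(),
    Tag = make_ref(),   %% unique tag so unrelated messages are ignored
    spawn(fun() ->
                  [Parent ! {Tag, I} || I <- lists:seq(1, N)]
          end),
    Received = [receive {Tag, I} -> I end || _ <- lists:seq(1, N)],
    Received =:= lists:seq(1, N).
```

Calling pair_order_demo:run(1000) should return true, since all messages travel between the same pair of processes. It says nothing about messages coming from two different senders.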

Below is the code I used to try to get a call processed between c) and d). It failed to do so, of course; otherwise I wouldn't be asking this question. (Run it with test:start(20000) if you are interested.)

%% file need_two_stage_init.erl
-module(need_two_stage_init).

-behaviour(gen_server).

-export([start_link/0]).

-export([init/1, terminate/2, code_change/3,
         handle_call/3, handle_cast/2, handle_info/2]).


start_link() ->
    gen_server:start_link(?MODULE, {}, []).


init({}) ->
    self() ! go_to_stage2,
    %% init into stage1
    {ok, stage1}.

handle_call(_Request, _From, Stage) ->
    {reply, Stage, Stage}.

%% upon receiving this directive, go to stage2,
%% in which the gen_server is fully functional
handle_info(go_to_stage2, stage1) ->
    {noreply, stage2}.

%% casts are ignored; _Request avoids an unused-variable warning
handle_cast(_Request, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ignore.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.



%% file test.erl
-module(test).

-export([start/1]).

start(Amount) ->
    start_collector(Amount), %% report the result
    start_many_gens(Amount).

start_collector(Amount) ->
    Pid = spawn(fun() ->
                        io:format("collector started, entering receive loop~n"),
                        loop(Amount)
                end),
    %% register from the caller, so the name exists before any worker uses it
    register(collector, Pid).

loop(0) ->
    io:format("all terminated~n"),
    all_terminated;
loop(N) ->
    %% progress report
    case N rem 5000 == 0 of
        true -> io:format("remaining ~w~n", [N]);
        false -> ignore
    end,
    receive
        {not_ok, _} = Msg ->
            io:format("======= bad things happened: ~p~n", [Msg]),
            loop(N-1);
        {ok, _} ->
            loop(N-1)
    end.


start_one_gens() ->
    {ok, Pid} = need_two_stage_init:start_link(),
    %% report exactly one result per server, so the collector's count stays right
    Result = case gen_server:call(Pid, any) of
                 stage2 -> ok;
                 stage1 -> not_ok
             end,
    gen_server:stop(Pid),
    collector ! {Result, Pid}.


start_many_gens(Amount) ->
    lists:foreach(fun(_) ->
                          spawn(fun start_one_gens/0)
                  end, lists:seq(1, Amount)).

Edit: Reading the documentation quoted above again, I think I misunderstood it. "If there is a live process and you send it message A and then message B, it's guaranteed that if message B arrived, message A arrived before it." It doesn't say who sent A and who sent B. I guess that means it doesn't matter, as long as both were sent to the same process, in which case the two stage init practice is safe. Anyway, it would be nice if some Erlang/OTP guru could clarify this.

(Off topic: saying "Erlang/OTP" feels like those GNU guys forcing you to say "GNU Linux" :-)

Edit 2: Thanks to @Dogbert, the short version of this question can be stated in these two ways:

1) if a process sends a message to itself, is this message guaranteed to reach the mailbox synchronously?

2) or, let A, B and P be three different processes: A sends MsgA to P first, then B sends MsgB to P. Is it guaranteed that MsgA arrives before MsgB?

Not an ID asked Sep 26 '16 08:09



1 Answer

In your case gen_server:start_link/3 will not return until your need_two_stage_init:init/1 returns, and neither will need_two_stage_init:start_link/0. That means go_to_stage2 is already in your mailbox by then. As long as you are not using a registered name, nobody knows your Pid except the process calling gen_server:start_link/3, and even there it is hidden until start_link returns. So you are safe, because nobody can call, cast or send you a message without knowing the Pid.

BTW you can achieve a similar effect by returning {ok, PartiallyInitializedState, 0} and then handling timeout in handle_info/2.
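
A minimal sketch of that timeout variant, assuming a hypothetical module name timeout_init and the same stage1/stage2 atoms as in the question. One caveat, as far as I understand gen_server timeouts: the timeout fires only if no request or message is received first. A request that reaches the mailbox before the server starts waiting would therefore be handled in stage1. With an unregistered server that window is tiny, but it is not quite the same guarantee as the self-sent message.

```erlang
%% Hypothetical module name; a sketch, not the asker's code.
-module(timeout_init).
-behaviour(gen_server).

-export([start_link/0]).
-export([init/1, terminate/2, code_change/3,
         handle_call/3, handle_cast/2, handle_info/2]).

start_link() ->
    gen_server:start_link(?MODULE, {}, []).

init({}) ->
    %% A timeout of 0 asks gen_server to deliver 'timeout' to handle_info/2
    %% as soon as it finds no message waiting in the mailbox.
    {ok, stage1, 0}.

handle_call(_Request, _From, Stage) ->
    {reply, Stage, Stage}.

handle_cast(_Request, Stage) ->
    {noreply, Stage}.

%% The second initialization stage would go in the first clause.
handle_info(timeout, stage1) ->
    {noreply, stage2};
handle_info(_Other, Stage) ->
    {noreply, Stage}.

terminate(_Reason, _Stage) ->
    ok.

code_change(_OldVsn, Stage, _Extra) ->
    {ok, Stage}.
```

After {ok, Pid} = timeout_init:start_link(), a gen_server:call(Pid, any) made a moment later should reply stage2.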

(Off topic: there is history behind the GNU in GNU/Linux. When Linux was the work of Linus and a small community around him, GNU was already a huge, established project with a lot of userspace applications, so they have good reason to be mentioned in the name of an OS that consists largely of their work. Erlang is a language and OTP is a distribution of utilities and modules, but both are the work of the same group of people, so they will probably forgive you.)

ad 1) No, it is not guaranteed; it is simply how it is currently implemented, and that is unlikely to change in the foreseeable future because it is simple and robust. When a process sends a message to a process in the same VM, it copies the message term to a separate heap/environment and then atomically appends the message to the linked list of the receiver's message box. I'm not sure whether the message is copied when a process sends a message to itself. There is a shared-heap implementation that doesn't copy messages, but none of those details change the fact that the message is linked into the receiver's message box before the sending process continues its work.
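
The behaviour described in ad 1) can be checked with a small sketch. The module name self_send_demo is made up for illustration, and the result reflects current implementation behaviour, not a promise I can point to in this answer.

```erlang
%% Hypothetical demo module for illustration only.
-module(self_send_demo).
-export([run/0]).

%% Sends a message to the calling process and immediately inspects the
%% caller's own mailbox to see whether the message is already there.
run() ->
    Ref = make_ref(),
    self() ! {stage2, Ref},
    {messages, Msgs} = erlang:process_info(self(), messages),
    Found = lists:member({stage2, Ref}, Msgs),
    %% remove the demo message again so the caller's mailbox stays clean
    receive {stage2, Ref} -> ok after 0 -> ok end,
    Found.
```

On the VMs I would expect this to be run on, self_send_demo:run() returns true, which matches the claim that the message is in the mailbox before the sender continues.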

ad 2) First of all, how do you know that B sends its message after A sends its message? Think about it; only then can we talk about MsgA and MsgB. No, it is not guaranteed that MsgA arrives before MsgB, especially if A, B and P are each on a different VM, and even more so on different computers. The only way to guarantee that B sends MsgB after A sends MsgA is for A to send a MsgC to B after it sends MsgA to P, and for B to send MsgB only after receiving MsgC. But even then it is not guaranteed that P receives MsgA before MsgB. So in the scenario where A sends MsgA to P and then MsgC to B, and B receives MsgC and then sends MsgB to P, you know MsgA was sent before MsgB, but in rare circumstances P could still receive MsgB before MsgA when A, B and P are on different computers connected by a network. It should never happen when A, B and P are on the same VM, because of the way message sending is implemented.
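
The MsgA / MsgC / MsgB scenario can be sketched on a single node as follows. The module name order_demo and the atoms msg_a, msg_b and msg_c are made up for illustration, and the calling process plays the role of P. This only exercises the same-VM case; it cannot show the reordering that may happen across a network.

```erlang
%% Hypothetical demo module for illustration only.
-module(order_demo).
-export([run/0]).

%% P is the calling process. A sends msg_a to P and then msg_c to B;
%% B waits for msg_c before it sends msg_b to P.
run() ->
    P = self(),
    B = spawn(fun() ->
                      receive msg_c -> P ! msg_b end
              end),
    _A = spawn(fun() ->
                       P ! msg_a,
                       B ! msg_c
               end),
    %% take the two demo messages in the order they arrived in P's mailbox
    First = receive M1 when M1 =:= msg_a; M1 =:= msg_b -> M1 end,
    Second = receive M2 when M2 =:= msg_a; M2 =:= msg_b -> M2 end,
    [First, Second].
```

On one VM I would expect order_demo:run() to return [msg_a, msg_b] every time, in line with the last sentence above.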

Hynek -Pichi- Vychodil answered Oct 20 '22 03:10
