Here is the example: test_for_gen_server.erl
When a process has 10000 messages in its mailbox, it takes 0.043 sec to finish. With 50000 messages it should take about 0.215 sec (scaling linearly), but in reality it takes 2.4 sec, roughly 10 times slower. Why?
Erlang/OTP 18 [erts-7.1] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:true]
Eshell V7.1 (abort with ^G)
1> test_for_gen_server:start_link().
{ok,<0.36.0>}
2> test_for_gen_server:test(10000).
ok
======gen_server: Times:10000 Cost:42863
3> test_for_gen_server:test(10000).
ok
======gen_server: Times:10000 Cost:43096
4> test_for_gen_server:test(10000).
ok
======gen_server: Times:10000 Cost:43223
5> test_for_gen_server:test(50000).
ok
======gen_server: Times:50000 Cost:2504395
6> test_for_gen_server:test(50000).
ok
======gen_server: Times:50000 Cost:2361987
7> test_for_gen_server:test(50000).
ok
======gen_server: Times:50000 Cost:2304715
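(The question's module source is not reproduced here, so below is a minimal sketch of what it plausibly looks like, reconstructed from the start_link/0 and test/1 calls and the printed output; treat the function bodies as assumptions, not the asker's actual code.)
%% Hypothetical reconstruction of the question's module (assumption).
-module(test_for_gen_server).
-behaviour(gen_server).

-export([start_link/0, test/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

test(N) ->
    {A, B, C} = os:timestamp(),
    Start = (A * 1000000 + B) * 1000000 + C,
    %% Fill the server's mailbox with N dummy messages, then a stop marker.
    lists:foreach(fun(_) -> ?MODULE ! a end, lists:seq(1, N)),
    ?MODULE ! {stop, Start, N},
    ok.

init([]) -> {ok, []}.

handle_call(_Request, _From, State) -> {reply, nomatch, State}.
handle_cast(_Msg, State) -> {noreply, State}.

handle_info({stop, Start, N}, State) ->
    {A, B, C} = os:timestamp(),
    Cost = (A * 1000000 + B) * 1000000 + C - Start,
    io:format("======gen_server: Times:~p Cost:~p~n", [N, Cost]),
    {noreply, State};
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.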
Following the comment: in this case it is indeed not caused by the size of the mailbox, because in a gen_server the mailbox message always matches; see the receive loop.
It turns out that the slower execution here is due to the additional complexity of the code, and in particular to the many small allocations that then need to be freed by the garbage collector (so it is rather unrelated to the size of the mailbox, and related instead to how many times the code gets executed).
Below is a slightly modified version of your code; the main difference is that the message queue is filled up after receiving the start message. In addition to your example there are 7 other variations, each a slightly modified or simplified version of your loop. The second loop is based on the flow that you can find in the gen_server code.
-module(test_for_gen_server).
-behaviour(gen_server).

%% APIs
-export([test1/1, test2/1, test3/1, test4/1, test5/1, test6/1, test7/1,
         test8/1, test9/1]).

%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

test1(N) ->
    {ok, Pid} = gen_server:start_link(?MODULE, [], []),
    Pid ! {start, N}.

test2(N) -> Pid = spawn(fun() -> loop2([undefined, 0]) end), Pid ! {start, N}.
test3(N) -> Pid = spawn(fun() -> loop3([undefined, 0]) end), Pid ! {start, N}.
test4(N) -> Pid = spawn(fun() -> loop4([undefined, 0]) end), Pid ! {start, N}.
test5(N) -> Pid = spawn(fun() -> loop5([undefined, 0]) end), Pid ! {start, N}.
test6(N) -> Pid = spawn(fun() -> loop6([undefined, 0]) end), Pid ! {start, N}.
test7(N) -> Pid = spawn(fun() -> loop7([undefined, 0]) end), Pid ! {start, N}.
test8(N) -> Pid = spawn(fun() -> loop8(undefined, 0) end), Pid ! {start, N}.
test9(N) -> Pid = spawn(fun() -> loop9({undefined, 0}) end), Pid ! {start, N}.

%%==============================================================================

init([]) ->
    {ok, []}.

handle_call(_Request, _From, State) ->
    {reply, nomatch, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info({start, N}, _State) ->
    do_test(N),
    {A, B, C} = os:timestamp(),
    Timestamp = (A * 1000000 + B) * 1000000 + C,
    {noreply, [Timestamp, 0]};
handle_info(stop, [Timestamp, Times]) ->
    {A, B, C} = os:timestamp(),
    Timestamp1 = (A * 1000000 + B) * 1000000 + C,
    Cost = Timestamp1 - Timestamp,
    io:format("======gen_server: Times:~p Cost:~p~n", [Times, Cost]),
    {stop, normal, []};
handle_info(_Info, [Timestamp, Times]) ->
    {noreply, [Timestamp, Times + 1]}.

terminate(_Reason, _State) -> ok.
code_change(_OldVer, State, _Extra) -> {ok, State}.

do_test(0) -> self() ! stop;
do_test(N) -> self() ! a, do_test(N - 1).

%%==============================================================================

loop2(State) ->
    Msg = receive
              Input -> Input
          end,
    Reply = {ok, handle_info(Msg, State)},
    handle_common_reply(Reply, Msg, State).

handle_common_reply(Reply, _Msg, _State) ->
    case Reply of
        {ok, {noreply, NState}} -> loop2(NState);
        {ok, {stop, normal, _}} -> ok
    end.

%%==============================================================================

loop3(State) ->
    Msg = receive
              Input -> Input
          end,
    Reply = {ok, handle_info(Msg, State)},
    case Reply of
        {ok, {noreply, NState}} -> loop3(NState);
        {ok, {stop, normal, _}} -> ok
    end.

%%==============================================================================

loop4(State) ->
    Msg = receive
              Input -> Input
          end,
    case handle_info(Msg, State) of
        {noreply, NState} -> loop4(NState);
        {stop, normal, _} -> ok
    end.

%%==============================================================================

loop5(State) ->
    receive
        Input ->
            case handle_info(Input, State) of
                {noreply, NState} -> loop5(NState);
                {stop, normal, _} -> ok
            end
    end.

%%==============================================================================

loop6(State) ->
    receive
        {start, _N} = Msg ->
            {noreply, NState} = handle_info(Msg, State),
            loop6(NState);
        stop = Msg ->
            {stop, normal, []} = handle_info(Msg, State);
        Info ->
            {noreply, NState} = handle_info(Info, State),
            loop6(NState)
    end.

%%==============================================================================

loop7([Timestamp, Times]) ->
    receive
        {start, N} ->
            do_test(N),
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            loop7([NTimestamp, 0]);
        stop ->
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            Cost = NTimestamp - Timestamp,
            io:format("======Times:~p Cost:~p~n", [Times, Cost]);
        _Info ->
            loop7([Timestamp, Times + 1])
    end.

%%==============================================================================

loop8(Timestamp, Times) ->
    receive
        {start, N} ->
            do_test(N),
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            loop8(NTimestamp, 0);
        stop ->
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            Cost = NTimestamp - Timestamp,
            io:format("======Times:~p Cost:~p~n", [Times, Cost]);
        _Info ->
            loop8(Timestamp, Times + 1)
    end.

%%==============================================================================

loop9({Timestamp, Times}) ->
    receive
        {start, N} ->
            do_test(N),
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            loop9({NTimestamp, 0});
        stop ->
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            Cost = NTimestamp - Timestamp,
            io:format("======Times:~p Cost:~p~n", [Times, Cost]);
        _Info ->
            loop9({Timestamp, Times + 1})
    end.
Result:
28> c(test_for_gen_server).
{ok,test_for_gen_server}
29> test_for_gen_server:test1(50000).
{start,50000}
======gen_server: Times:50000 Cost:2285054
30> test_for_gen_server:test2(50000).
{start,50000}
======gen_server: Times:50000 Cost:2170294
31> test_for_gen_server:test3(50000).
{start,50000}
======gen_server: Times:50000 Cost:1520796
32> test_for_gen_server:test4(50000).
{start,50000}
======gen_server: Times:50000 Cost:1526084
33> test_for_gen_server:test5(50000).
{start,50000}
======gen_server: Times:50000 Cost:1510738
34> test_for_gen_server:test6(50000).
{start,50000}
======gen_server: Times:50000 Cost:1496024
35> test_for_gen_server:test7(50000).
{start,50000}
======Times:50000 Cost:863876
36> test_for_gen_server:test8(50000).
{start,50000}
======Times:50000 Cost:5830
47> test_for_gen_server:test9(50000).
{start,50000}
======Times:50000 Cost:640157
You can see how the execution time gets smaller with every change. Notice the difference between test2 and test3, where the only difference in the code is the additional function call. And especially pay attention to the dramatic difference between test7 and test8, where the only difference is that test7 creates and destroys a two-element list on every iteration of the loop.
The last loop can be executed without allocating anything on the heap, using only the VM's virtual registers, so it is the fastest. The other loops allocate a fresh term on the heap each time around, which then has to be periodically freed by the garbage collector.
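If you want to see the GC pressure yourself, one rough way (my own addition, not part of the original test; note that erlang:statistics(garbage_collection) reports VM-wide totals, so run this on an otherwise idle node) is to compare the counters around a run:
{GCs0, Words0, _} = erlang:statistics(garbage_collection),
test_for_gen_server:test7(50000),
timer:sleep(3000),   %% give the spawned loop time to finish
{GCs1, Words1, _} = erlang:statistics(garbage_collection),
io:format("GC runs: ~p, words reclaimed: ~p~n",
          [GCs1 - GCs0, Words1 - Words0]).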
Note: test9 was added just to show that passing arguments between functions as a tuple instead of a list generally gives better performance.
Previous answer left for reference
This is because the receive clause needs to match incoming messages against the patterns that occur in that clause. It takes each message from the mailbox and tries to match it against the patterns; the first one that matches is processed.
So if the queue builds up because messages don't match, it takes longer and longer to process each new incoming message (because the matching always starts from the first message in the queue).
Therefore it's good practice to always flush unknown messages in gen servers, as suggested by Joe Armstrong in his PhD thesis (section 5.8).
This article explains it in more detail: Erlang explained: Selective receive; it's also explained in section 3.4 of Joe Armstrong's thesis mentioned earlier.
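In a gen_server, "flushing unknown messages" usually amounts to a catch-all handle_info clause, so stray messages are consumed instead of accumulating. A minimal sketch (the logging call is just illustrative):
handle_info(Unexpected, State) ->
    %% Consume and log any message we don't recognise, instead of
    %% letting unknown messages accumulate in the mailbox.
    error_logger:warning_msg("Unexpected message: ~p~n", [Unexpected]),
    {noreply, State}.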
Recently I ran into the same problem in Elixir, and finally found the answer in this paper.
In section 3: the memory architecture of Erlang is process-centric. Each process allocates and manages its own memory area, which typically includes a PCB, a private stack, and a private heap.
This causes a disadvantage: high memory fragmentation.
A process cannot utilize the memory (e.g., the heap) of another process even if there are large amounts of unused space in that memory area. This typically implies that processes can allocate only a small amount of memory by default, which in turn usually results in a larger number of calls to the garbage collector.
So a large mailbox can slow the whole system down.
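To inspect this per-process accounting on a live node, the standard erlang:process_info/2 works (Pid here is whatever process you are investigating; memory is in bytes, total_heap_size in words):
Info = erlang:process_info(Pid, [memory,
                                 total_heap_size,
                                 message_queue_len,
                                 garbage_collection]).
%% e.g. [{memory, Bytes}, {total_heap_size, Words},
%%       {message_queue_len, N}, {garbage_collection, GCInfo}]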
Adding to 蛋汤饭's highly useful reply:
as of Erlang/OTP 20 one can use the process flag message_queue_data = off_heap to avoid GC slowdown on large mailboxes. See the documentation for process_flag(Flag :: message_queue_data, MQD) here. See an example in the Erlang standard library:
init(_) ->
    %% The error logger process may receive a huge amount of
    %% messages. Make sure that they are stored off heap to
    %% avoid excessive GCs.
    process_flag(message_queue_data, off_heap),
    {ok, []}.
Alternatively, you can use the {message_queue_data, off_heap} option when creating a process with erlang:spawn_opt/2.
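For example (a minimal sketch; loop/0 stands in for your own receive loop):
%% Spawn a process whose mailbox is stored off the process heap.
Pid = erlang:spawn_opt(fun() -> loop() end,
                       [{message_queue_data, off_heap}]).
For a gen_server, the same option can be passed through the start options, e.g. gen_server:start_link(?MODULE, Args, [{spawn_opt, [{message_queue_data, off_heap}]}]).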
However, in my experience the off_heap option does not help much. What helps is a custom flow-control scheme: let the sender check the receiver's queue length with process_info(ReceiverPid, message_queue_len) and throttle if it exceeds 1000. Check periodically rather than on every send, to avoid killing performance. See also here.
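A minimal sketch of such a scheme (the 1000-message threshold follows the text above; checking only every 100th send and sleeping 10 ms are illustrative choices, not part of the original answer):
%% Send Msg to ReceiverPid, pausing while its mailbox is backed up.
%% SendCount lets us poll the queue length only on every 100th send.
send_with_backpressure(ReceiverPid, Msg, SendCount)
  when SendCount rem 100 =:= 0 ->
    case erlang:process_info(ReceiverPid, message_queue_len) of
        {message_queue_len, QLen} when QLen > 1000 ->
            timer:sleep(10),   %% let the receiver drain, then retry
            send_with_backpressure(ReceiverPid, Msg, SendCount);
        _ ->                   %% short queue (or receiver is gone)
            ReceiverPid ! Msg
    end;
send_with_backpressure(ReceiverPid, Msg, _SendCount) ->
    ReceiverPid ! Msg.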