Here is the example: test_for_gen_server.erl
When a process has 10000 messages in its mailbox, it takes 0.043 sec to finish. With 50000 messages it should take about 0.215 sec (scaling linearly), but in reality it takes 2.4 sec, roughly 10 times slower. Why?
Erlang/OTP 18 [erts-7.1] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:true]
Eshell V7.1 (abort with ^G)
1> test_for_gen_server:start_link().
{ok,<0.36.0>}
2> test_for_gen_server:test(10000).
ok
======gen_server: Times:10000 Cost:42863
3> test_for_gen_server:test(10000).
ok
======gen_server: Times:10000 Cost:43096
4> test_for_gen_server:test(10000).
ok
======gen_server: Times:10000 Cost:43223
5> test_for_gen_server:test(50000).
ok
======gen_server: Times:50000 Cost:2504395
6> test_for_gen_server:test(50000).
ok
======gen_server: Times:50000 Cost:2361987
7> test_for_gen_server:test(50000).
ok
======gen_server: Times:50000 Cost:2304715
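(The question's module source is not reproduced here, so below is a minimal sketch of what it plausibly looks like, reconstructed from the start_link/0 and test/1 calls and the printed output; treat the function bodies as assumptions, not the asker's actual code.)
%% Hypothetical reconstruction of the question's module (assumption).
-module(test_for_gen_server).
-behaviour(gen_server).

-export([start_link/0, test/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

test(N) ->
    {A, B, C} = os:timestamp(),
    Start = (A * 1000000 + B) * 1000000 + C,
    %% Fill the server's mailbox with N dummy messages, then a stop marker.
    lists:foreach(fun(_) -> ?MODULE ! a end, lists:seq(1, N)),
    ?MODULE ! {stop, Start, N},
    ok.

init([]) -> {ok, []}.

handle_call(_Request, _From, State) -> {reply, nomatch, State}.
handle_cast(_Msg, State) -> {noreply, State}.

handle_info({stop, Start, N}, State) ->
    {A, B, C} = os:timestamp(),
    Cost = (A * 1000000 + B) * 1000000 + C - Start,
    io:format("======gen_server: Times:~p Cost:~p~n", [N, Cost]),
    {noreply, State};
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) -> ok.
code_change(_OldVsn, State, _Extra) -> {ok, State}.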
Following the comment: in this case it is indeed not caused by the size of the mailbox, because in a gen_server the mailbox message always matches; see the receive loop.
It turns out that the slower execution here is due to the additional complexity of the code, and in particular to the many small allocations that then need to be freed by the garbage collector (so it is rather unrelated to the size of the mailbox, and related instead to how many times the code gets executed).
Below is a slightly modified version of your code; the main difference is that the message queue is filled up after receiving the start message. In addition to your example there are 7 other variations, each a slightly modified or simplified version of your loop. The second loop is based on the flow that you can find in the gen_server code.
-module(test_for_gen_server).
-behaviour(gen_server).

%% APIs
-export([test1/1, test2/1, test3/1, test4/1, test5/1, test6/1, test7/1,
         test8/1, test9/1]).

%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

test1(N) ->
    {ok, Pid} = gen_server:start_link(?MODULE, [], []),
    Pid ! {start, N}.

test2(N) -> Pid = spawn(fun() -> loop2([undefined, 0]) end), Pid ! {start, N}.
test3(N) -> Pid = spawn(fun() -> loop3([undefined, 0]) end), Pid ! {start, N}.
test4(N) -> Pid = spawn(fun() -> loop4([undefined, 0]) end), Pid ! {start, N}.
test5(N) -> Pid = spawn(fun() -> loop5([undefined, 0]) end), Pid ! {start, N}.
test6(N) -> Pid = spawn(fun() -> loop6([undefined, 0]) end), Pid ! {start, N}.
test7(N) -> Pid = spawn(fun() -> loop7([undefined, 0]) end), Pid ! {start, N}.
test8(N) -> Pid = spawn(fun() -> loop8(undefined, 0) end), Pid ! {start, N}.
test9(N) -> Pid = spawn(fun() -> loop9({undefined, 0}) end), Pid ! {start, N}.

%%==============================================================================

init([]) ->
    {ok, []}.

handle_call(_Request, _From, State) ->
    {reply, nomatch, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info({start, N}, _State) ->
    do_test(N),
    {A, B, C} = os:timestamp(),
    Timestamp = (A * 1000000 + B) * 1000000 + C,
    {noreply, [Timestamp, 0]};
handle_info(stop, [Timestamp, Times]) ->
    {A, B, C} = os:timestamp(),
    Timestamp1 = (A * 1000000 + B) * 1000000 + C,
    Cost = Timestamp1 - Timestamp,
    io:format("======gen_server: Times:~p Cost:~p~n", [Times, Cost]),
    {stop, normal, []};
handle_info(_Info, [Timestamp, Times]) ->
    {noreply, [Timestamp, Times + 1]}.

terminate(_Reason, _State) -> ok.
code_change(_OldVer, State, _Extra) -> {ok, State}.

do_test(0) -> self() ! stop;
do_test(N) -> self() ! a, do_test(N - 1).

%%==============================================================================

loop2(State) ->
    Msg = receive
              Input -> Input
          end,
    Reply = {ok, handle_info(Msg, State)},
    handle_common_reply(Reply, Msg, State).

handle_common_reply(Reply, _Msg, _State) ->
    case Reply of
        {ok, {noreply, NState}} -> loop2(NState);
        {ok, {stop, normal, _}} -> ok
    end.

%%==============================================================================

loop3(State) ->
    Msg = receive
              Input -> Input
          end,
    Reply = {ok, handle_info(Msg, State)},
    case Reply of
        {ok, {noreply, NState}} -> loop3(NState);
        {ok, {stop, normal, _}} -> ok
    end.

%%==============================================================================

loop4(State) ->
    Msg = receive
              Input -> Input
          end,
    case handle_info(Msg, State) of
        {noreply, NState} -> loop4(NState);
        {stop, normal, _} -> ok
    end.

%%==============================================================================

loop5(State) ->
    receive
        Input ->
            case handle_info(Input, State) of
                {noreply, NState} -> loop5(NState);
                {stop, normal, _} -> ok
            end
    end.

%%==============================================================================

loop6(State) ->
    receive
        {start, _N} = Msg ->
            {noreply, NState} = handle_info(Msg, State),
            loop6(NState);
        stop = Msg ->
            {stop, normal, []} = handle_info(Msg, State);
        Info ->
            {noreply, NState} = handle_info(Info, State),
            loop6(NState)
    end.

%%==============================================================================

loop7([Timestamp, Times]) ->
    receive
        {start, N} ->
            do_test(N),
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            loop7([NTimestamp, 0]);
        stop ->
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            Cost = NTimestamp - Timestamp,
            io:format("======Times:~p Cost:~p~n", [Times, Cost]);
        _Info ->
            loop7([Timestamp, Times + 1])
    end.

%%==============================================================================

loop8(Timestamp, Times) ->
    receive
        {start, N} ->
            do_test(N),
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            loop8(NTimestamp, 0);
        stop ->
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            Cost = NTimestamp - Timestamp,
            io:format("======Times:~p Cost:~p~n", [Times, Cost]);
        _Info ->
            loop8(Timestamp, Times + 1)
    end.

%%==============================================================================

loop9({Timestamp, Times}) ->
    receive
        {start, N} ->
            do_test(N),
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            loop9({NTimestamp, 0});
        stop ->
            {A, B, C} = os:timestamp(),
            NTimestamp = (A * 1000000 + B) * 1000000 + C,
            Cost = NTimestamp - Timestamp,
            io:format("======Times:~p Cost:~p~n", [Times, Cost]);
        _Info ->
            loop9({Timestamp, Times + 1})
    end.
Result:
28> c(test_for_gen_server).
{ok,test_for_gen_server}
29> test_for_gen_server:test1(50000).
{start,50000}
======gen_server: Times:50000 Cost:2285054
30> test_for_gen_server:test2(50000).
{start,50000}
======gen_server: Times:50000 Cost:2170294
31> test_for_gen_server:test3(50000).
{start,50000}
======gen_server: Times:50000 Cost:1520796
32> test_for_gen_server:test4(50000).
{start,50000}
======gen_server: Times:50000 Cost:1526084
33> test_for_gen_server:test5(50000).
{start,50000}
======gen_server: Times:50000 Cost:1510738
34> test_for_gen_server:test6(50000).
{start,50000}
======gen_server: Times:50000 Cost:1496024
35> test_for_gen_server:test7(50000).
{start,50000}
======Times:50000 Cost:863876
36> test_for_gen_server:test8(50000).
{start,50000}
======Times:50000 Cost:5830
47> test_for_gen_server:test9(50000).
{start,50000}
======Times:50000 Cost:640157
You can see how the execution time gets smaller with every change. Notice the difference between test2 and test3, where the only difference in the code is the additional function call. And especially pay attention to the dramatic difference between test7 and test8, where the only difference is that test7 creates and destroys a two-element list on every iteration of the loop.
The last loop can be executed without allocating anything on the heap, using only the VM's virtual registers, so it is the fastest. The other loops allocate a fresh term on the heap each time around, which then has to be periodically freed by the garbage collector.
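If you want to see the GC pressure yourself, one rough way (my own addition, not part of the original test; note that erlang:statistics(garbage_collection) reports VM-wide totals, so run this on an otherwise idle node) is to compare the counters around a run:
{GCs0, Words0, _} = erlang:statistics(garbage_collection),
test_for_gen_server:test7(50000),
timer:sleep(3000),   %% give the spawned loop time to finish
{GCs1, Words1, _} = erlang:statistics(garbage_collection),
io:format("GC runs: ~p, words reclaimed: ~p~n",
          [GCs1 - GCs0, Words1 - Words0]).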
Note: test9 was added just to show that passing arguments between functions as a tuple instead of a list generally gives better performance.
Previous answer left for reference
This is because the receive clause needs to match incoming messages against the patterns that occur in that clause. It takes each message from the mailbox and tries to match it against the patterns; the first one that matches is processed.
So if the queue builds up because messages don't match, it takes longer and longer to process each new incoming message (because the matching always starts from the first message in the queue).
Therefore it's good practice to always flush unknown messages in gen servers, as suggested by Joe Armstrong in his PhD thesis (section 5.8).
This article explains it in more detail: Erlang explained: Selective receive; it's also explained in section 3.4 of Joe Armstrong's thesis mentioned earlier.
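In a gen_server, "flushing unknown messages" usually amounts to a catch-all handle_info clause, so stray messages are consumed instead of accumulating. A minimal sketch (the logging call is just illustrative):
handle_info(Unexpected, State) ->
    %% Consume and log any message we don't recognise, instead of
    %% letting unknown messages accumulate in the mailbox.
    error_logger:warning_msg("Unexpected message: ~p~n", [Unexpected]),
    {noreply, State}.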
Recently I ran into the same problem in Elixir, and finally found the answer in this paper.
In section 3: the memory architecture of Erlang is process-centric. Each process allocates and manages its own memory area, which typically includes a PCB, a private stack, and a private heap.
This causes a disadvantage: high memory fragmentation.
A process cannot utilize the memory (e.g., the heap) of another process even if there are large amounts of unused space in that memory area. This typically implies that processes can allocate only a small amount of memory by default, which in turn usually results in a larger number of calls to the garbage collector.
So a large mailbox can slow the whole system down.
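To inspect this per-process accounting on a live node, the standard erlang:process_info/2 works (Pid here is whatever process you are investigating; memory is in bytes, total_heap_size in words):
Info = erlang:process_info(Pid, [memory,
                                 total_heap_size,
                                 message_queue_len,
                                 garbage_collection]).
%% e.g. [{memory, Bytes}, {total_heap_size, Words},
%%       {message_queue_len, N}, {garbage_collection, GCInfo}]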
Adding to 蛋汤饭's highly useful reply:
as of Erlang/OTP 20 one can use the process flag message_queue_data = off_heap to avoid GC slowdown on large mailboxes. See the documentation for process_flag(Flag :: message_queue_data, MQD) here. See an example in the Erlang standard library:
init(_) ->
    %% The error logger process may receive a huge amount of
    %% messages. Make sure that they are stored off heap to
    %% avoid excessive GCs.
    process_flag(message_queue_data, off_heap),
    {ok, []}.
Alternatively, you can use the {message_queue_data, off_heap} option when creating a process with erlang:spawn_opt/2.
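For example (a minimal sketch; loop/0 stands in for your own receive loop):
%% Spawn a process whose mailbox is stored off the process heap.
Pid = erlang:spawn_opt(fun() -> loop() end,
                       [{message_queue_data, off_heap}]).
For a gen_server, the same option can be passed through the start options, e.g. gen_server:start_link(?MODULE, Args, [{spawn_opt, [{message_queue_data, off_heap}]}]).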
However, in my experience the off_heap option does not help much. What helps is a custom flow-control scheme: let the sender check the receiver's queue length with process_info(ReceiverPid, message_queue_len) and throttle if it exceeds 1000. Check periodically rather than on every send, to avoid killing performance. See also here.
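A minimal sketch of such a scheme (the 1000-message threshold follows the text above; checking only every 100th send and sleeping 10 ms are illustrative choices, not part of the original answer):
%% Send Msg to ReceiverPid, pausing while its mailbox is backed up.
%% SendCount lets us poll the queue length only on every 100th send.
send_with_backpressure(ReceiverPid, Msg, SendCount)
  when SendCount rem 100 =:= 0 ->
    case erlang:process_info(ReceiverPid, message_queue_len) of
        {message_queue_len, QLen} when QLen > 1000 ->
            timer:sleep(10),   %% let the receiver drain, then retry
            send_with_backpressure(ReceiverPid, Msg, SendCount);
        _ ->                   %% short queue (or receiver is gone)
            ReceiverPid ! Msg
    end;
send_with_backpressure(ReceiverPid, Msg, _SendCount) ->
    ReceiverPid ! Msg.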