
Strange error message when stopping app using lager and poolboy

Tags: erlang

I've created a simple application using poolboy with an almost empty worker, but when I stop the application, I see the following error printed by lager:

10:50:26.363 [error] Supervisor {<0.236.0>,poolboy_sup} had child test_worker started with test_worker:start_link([]) at undefined exit with reason shutdown in context shutdown_error

What causes this error and how can I fix this?

Supervisor:

-module(test_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).


start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    ChildSpecs = [pool_spec()],
    {ok, {{one_for_one, 1000, 3600}, ChildSpecs}}.

pool_spec() ->
    Name = test_pool,
    PoolArgs = [{name, {local, Name}},
                {worker_module, test_worker},
                {size, 10},
                {max_overflow, 20}],
    poolboy:child_spec(Name, PoolArgs, []).

Worker:

-module(test_worker).
-behaviour(gen_server).
-behaviour(poolboy_worker).

-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2,
         handle_info/2, terminate/2, code_change/3]).

-record(state, {}).

start_link([]) ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, #state{}}.

handle_call(_Request, _From, State) ->
    {reply, _Reply = ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

The rest of the application is pretty standard.

Erlang: R16B02

Poolboy: 1.0.1

Lager: latest version from master at the time of writing the question (822062478a223313dce30e5a45e30a50a4b7dc4e)

Fred K asked Oct 14 '13 06:10


2 Answers

The message you are seeing is not an error in your code but a supervisor report that lager formats and logs at the error level. This report seems to be caused by a bug in poolboy.

You can either:

  • Fix the bug and submit a patch to poolboy developers.
  • Safely ignore the report.
  • Manually terminate your workers on exit.

What is supposed to happen when you stop an OTP application is that the supervision tree is used to terminate all processes, preferably gracefully. The default way to do it is to send supervised processes a shutdown signal, and if this doesn't work after a while, to brutally kill them. You never get any report when everything goes smoothly.
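As a rough illustration of where that shutdown behaviour is configured, here is a child specification in the tuple format used by R16-era supervisors. The module name my_worker is hypothetical and only serves the example; the fourth element is the shutdown setting (a timeout in milliseconds, or the atom brutal_kill).

```erlang
-module(shutdown_spec_demo).
-export([child_spec/0]).

%% Returns an old-style child spec tuple:
%% {Id, StartMFA, Restart, Shutdown, Type, Modules}.
%% my_worker is a hypothetical module name used for illustration only.
child_spec() ->
    {my_worker,
     {my_worker, start_link, []},
     permanent,
     5000,        %% send exit(Child, shutdown), wait up to 5000 ms, then kill
     worker,
     [my_worker]}.
```

With a Shutdown value of 5000, the supervisor sends the shutdown signal, waits up to five seconds, and only then kills the child unconditionally.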

Two Erlang subtleties are needed to understand the bug:

  1. Processes can be linked, which means that when one process terminates abnormally (i.e. with a reason other than normal), all linked processes that do not trap exits are terminated with the same reason. This primitive is the basis of OTP supervision.
  2. A process can trap exit signals (or trap exits), which means it receives exit signals as regular messages instead of being terminated. This includes normal, which would not terminate it anyway, but excludes kill, which terminates it unconditionally.
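The two points above can be sketched with a small module. This is only an illustration: the module name link_demo and the function run/0 are made up for the example and have nothing to do with poolboy.

```erlang
-module(link_demo).
-export([run/0]).

%% Spawn a linked child, make it exit with reason shutdown, and observe
%% the exit signal as an ordinary message because the caller traps exits.
run() ->
    Old = process_flag(trap_exit, true),
    Child = spawn_link(fun() ->
                               receive stop -> exit(shutdown) end
                       end),
    Child ! stop,
    Result = receive
                 {'EXIT', Child, Reason} -> Reason
             after 1000 ->
                 timeout
             end,
    %% Restore the caller's previous trap_exit setting.
    process_flag(trap_exit, Old),
    Result.
```

Without the process_flag(trap_exit, true) call, the caller would itself be terminated with reason shutdown instead of receiving the message.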

Links combined with trapping exits are often used to monitor the termination of processes, with the additional benefit of terminating the monitored processes when the monitoring process terminates. For example, if a supervisor terminates, its children should be terminated too. An asymmetric monitor mechanism (erlang:monitor/2) also exists.
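For comparison, a minimal sketch of the asymmetric mechanism: the observer is notified with a 'DOWN' message, but no exit signal travels in either direction. The module name monitor_demo is made up for the example.

```erlang
-module(monitor_demo).
-export([run/0]).

%% Spawn a monitored (not linked) process that exits with reason shutdown
%% and return the reason reported in the 'DOWN' message.
run() ->
    {Pid, Ref} = spawn_monitor(fun() -> exit(shutdown) end),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> Reason
    after 1000 ->
        timeout
    end.
```

The caller does not need to trap exits here, and its own termination would not affect the monitored process.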

Here, your supervisor (the supervisor behaviour with the test_sup callback module) is terminated with the reason shutdown, as it should be. The supervisor behaviour traps exits, and when it receives the shutdown signal, it tries to terminate its children according to their shutdown strategy. Here, you use the default strategy, which is to send children the shutdown signal as a first attempt. So your supervisor sends the shutdown signal to its only child.

Poolboy introduces its magic here: the child of your supervisor is actually a gen_server with the poolboy callback module. It should shut down the pool and terminate gracefully.

This process is linked to the pool supervisor, but also to the workers. The probable rationale for this surprising implementation choice is that a crash of the pool (the poolboy gen_server) should terminate the workers. However, this is the source of the bug, and an asymmetric monitor would probably make more sense. Since the supervisor is already linked to the poolboy gen_server, a termination of the poolboy process will eventually lead to a termination of the workers anyway.

The consequence of linking to the workers is that they also get the shutdown exit signal that was initially directed to the poolboy process, and they are terminated. This termination is considered abnormal by the workers' supervisor (the one using the poolboy_sup callback module) since it did not send the signal itself. As a result, the supervisor reports the shutdown, which is what lager logs here.

The fact that poolboy traps exits does not prevent the propagation of the shutdown signal. The process is not terminated immediately when the signal arrives; instead it receives it as a message. gen_server intercepts this message, calls the terminate/2 callback function and then terminates with shutdown, eventually propagating the signal to all linked processes.

If avoiding links to workers is not an option, one way to fix this bug would be to unlink all workers in the terminate/2 callback.
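The idea can be sketched with a stand-alone gen_server. This is not poolboy's code: the module unlink_demo, its state record and the run/0 helper are all hypothetical and only show the unlink-in-terminate pattern under the assumption that the owner keeps a list of worker pids.

```erlang
-module(unlink_demo).
-behaviour(gen_server).

-export([start_link/1, run/0]).
-export([init/1, handle_call/3, handle_cast/2,
         handle_info/2, terminate/2, code_change/3]).

%% Hypothetical state: the pids this process has linked to.
-record(state, {workers = [] :: [pid()]}).

start_link(Workers) ->
    gen_server:start_link(?MODULE, Workers, []).

init(Workers) ->
    %% Trap exits so that terminate/2 runs when the parent sends shutdown.
    process_flag(trap_exit, true),
    [link(W) || W <- Workers],
    {ok, #state{workers = Workers}}.

handle_call(_Request, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.
handle_info(_Info, State) -> {noreply, State}.

%% Unlink before exiting so the shutdown reason is not propagated
%% to the workers through the links.
terminate(_Reason, #state{workers = Workers}) ->
    [unlink(W) || W <- Workers],
    ok.

code_change(_OldVsn, State, _Extra) -> {ok, State}.

%% Demo: shut the owner down and check that the worker survives.
run() ->
    Old = process_flag(trap_exit, true),
    Worker = spawn(fun() -> receive stop -> ok end end),
    {ok, Owner} = start_link([Worker]),
    exit(Owner, shutdown),
    receive
        {'EXIT', Owner, shutdown} -> ok
    after 1000 ->
        exit(owner_timeout)
    end,
    Alive = is_process_alive(Worker),
    Worker ! stop,
    process_flag(trap_exit, Old),
    Alive.
```

In this sketch the worker outlives the owner, so whichever supervisor is responsible for the worker remains the one that shuts it down.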

Paul Guyot answered Oct 26 '22 00:10


How do you stop the application? Perhaps it should be stopped with application:stop/1 rather than by stopping the supervisor directly? See:

http://www.erlang.org/doc/apps/kernel/application.html#stop-1

Ivan Uemlianin answered Oct 26 '22 01:10