I received the following message in the Erlang console on the first@localhost node:
=ERROR REPORT==== 1-Jan-2011::23:19:28 ===
** Node 'second@localhost' not responding **
** Removing (timedout) connection **
My question is: what is the timeout in this case? How much time passes before this event is triggered? How can I prevent this "horror"? So far the only way I have found to get back to normal operation is to restart the node, but what is the right way?
Thank you, and Happy New Year!
Grepping for the "not responding" string in the Erlang source code, you can see that the message is generated in the dist_util module of the kernel application (in the con_loop function):
    {error, not_responding} ->
        error_msg("** Node ~p not responding **~n"
                  "** Removing (timedout) connection **~n",
                  [Node]),
Within the module, the following documentation is present, explaining the logic behind ticks and not responding nodes:
%%
%% Send a TICK to the other side.
%%
%% This will happen every 15 seconds (by default)
%% The idea here is that every 15 secs, we write a little
%% something on the connection if we haven't written anything for
%% the last 15 secs.
%% This will ensure that nodes that are not responding due to
%% hardware errors (Or being suspended by means of ^Z) will
%% be considered to be down. If we do not want to have this
%% we must start the net_kernel (in erlang) without its
%% ticker process, In that case this code will never run
%% And then every 60 seconds we also check the connection and
%% close it if we havn't received anything on it for the
%% last 60 secs. If ticked == tick we havn't received anything
%% on the connection the last 60 secs.
%% The detection time interval is thus, by default, 45s < DT < 75s
%% A HIDDEN node is always (if not a pending write) ticked if
%% we haven't read anything as a hidden node only ticks when it receives
%% a TICK !!
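To put numbers on that comment: the 60 seconds is the kernel application's net_ticktime parameter, and a tick is sent every quarter of it. The small module below is only a sketch of that arithmetic, assuming the default of four ticks per net_ticktime period. The module name tick_window and its functions are made up for illustration; net_kernel:get_net_ticktime/0 is the real call.

```erlang
-module(tick_window).
-export([window/0, window/1]).

%% Detection window for the net_ticktime currently in effect on this node.
%% get_net_ticktime/0 returns `ignored` on a node that is not distributed,
%% or {ongoing_change_to, T} while a change is in progress.
window() ->
    case net_kernel:get_net_ticktime() of
        T when is_integer(T) -> window(T);
        Other -> {error, Other}
    end.

%% A tick is sent every T/4 seconds, and the connection is dropped after
%% roughly T seconds of silence, give or take one tick interval.
window(T) when is_integer(T), T > 0 ->
    Tick = T / 4,
    {T - Tick, T + Tick}.
```

tick_window:window(60) gives {45.0, 75.0}, which matches the "45s < DT < 75s" in the comment. If the defaults are too tight for your setup, the value can be raised at startup with erl -sname first -kernel net_ticktime 120, or at runtime with net_kernel:set_net_ticktime(120). It is safest to use the same value on all connected nodes.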
Hope this helps a bit.
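As for recovering without a restart: a dropped connection can be set up again as soon as the other node is reachable, so one option is to watch for nodedown and keep pinging. This is a rough sketch rather than the one right way. The node_watch module, its functions and the 5-second retry interval are my own assumptions; net_kernel:monitor_nodes/1 and net_adm:ping/1 are standard calls.

```erlang
-module(node_watch).
-export([start/1, stop/1]).

-define(RETRY_MS, 5000).  % assumed retry interval, tune to taste

%% Start a watcher process for Node, e.g. 'second@localhost'.
start(Node) when is_atom(Node) ->
    spawn(fun() ->
                  %% Ask for {nodeup, N} / {nodedown, N} messages.
                  ok = net_kernel:monitor_nodes(true),
                  loop(Node)
          end).

stop(Pid) ->
    Pid ! stop,
    ok.

loop(Node) ->
    receive
        {nodedown, Node} ->
            error_logger:warning_msg("~p is down, retrying~n", [Node]),
            self() ! retry,
            loop(Node);
        retry ->
            %% ping/1 tries to set up the connection again.
            case net_adm:ping(Node) of
                pong -> ok;   % reconnected; a nodeup message follows
                pang -> erlang:send_after(?RETRY_MS, self(), retry)
            end,
            loop(Node);
        {nodeup, Node} ->
            error_logger:info_msg("~p is back~n", [Node]),
            loop(Node);
        stop ->
            ok;
        _Other ->
            loop(Node)
    end.
```

On first@localhost you would call node_watch:start('second@localhost'). If the other node is really hung or suspended, the pings keep returning pang until it comes back, so this only helps with the connection, not with whatever made the node stop responding.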