I'm playing with distributed Erlang applications.
Configuration and ideas are taken from:
http://www.erlang.org/doc/pdf/otp-system-documentation.pdf, section 9.9 "Distributed Applications".
Configuration files:
[{kernel,
  [{distributed, [{wd, 5000, ['n1@a2-X201', {'n2@a2-X201', 'n3@a2-X201'}]}]},
   {sync_nodes_mandatory, ['n2@a2-X201', 'n3@a2-X201']},
   {sync_nodes_timeout, 5000}]},
 {sasl,
  [%% All reports go to this file
   {sasl_error_logger, {file, "/tmp/wd_n1.log"}}]}].
[{kernel,
  [{distributed, [{wd, 5000, ['n1@a2-X201', {'n2@a2-X201', 'n3@a2-X201'}]}]},
   {sync_nodes_mandatory, ['n1@a2-X201', 'n3@a2-X201']},
   {sync_nodes_timeout, 5000}]},
 {sasl,
  [%% All reports go to this file
   {sasl_error_logger, {file, "/tmp/wd_n2.log"}}]}].
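These two files are for nodes n1 and n2 (judging by the sasl log file names). The config for n3 is not shown here; by analogy with the OTP documentation example it should presumably look like this (same distributed spec, with the other two nodes listed as mandatory):

[{kernel,
  [{distributed, [{wd, 5000, ['n1@a2-X201', {'n2@a2-X201', 'n3@a2-X201'}]}]},
   {sync_nodes_mandatory, ['n1@a2-X201', 'n2@a2-X201']},
   {sync_nodes_timeout, 5000}]},
 {sasl,
  [%% All reports go to this file
   {sasl_error_logger, {file, "/tmp/wd_n3.log"}}]}].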
Now start Erlang in 3 separate terminals:
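I start the nodes roughly like this (the config file names wd_n1.config etc. and the ebin path are placeholders for whatever you use locally; -sname because the host name has no domain part):

$ erl -sname n1 -config wd_n1 -pa ebin
$ erl -sname n2 -config wd_n2 -pa ebin
$ erl -sname n3 -config wd_n3 -pa ebin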
Start the application on each of the Erlang nodes:

* application:start(wd).
(n1@a2-X201)1> application:start(wd).
=INFO REPORT==== 19-Jun-2011::15:42:51 ===
wd_plug_server starting...
    PluginId: 4
    Path: "/home/a2/src/erl/data/SIG"
    FileMask: "(?i)(.*)\\.SIG$"
ok
(n2@a2-X201)1> application:start(wd).
ok
(n2@a2-X201)2>
(n3@a2-X201)1> application:start(wd).
ok
(n3@a2-X201)2>
At the moment everything is OK. As described in the Erlang documentation, the application is running at node n1@a2-X201.
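A quick sanity check (not part of the original transcript): application:which_applications/0 lists the applications actually running at the local node, so at this point wd should appear in the list only in n1's shell.

%% run in each of the three shells; wd shows up only on the node
%% where the distributed application is actually running (n1 here)
application:which_applications().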
Now kill node n1. The application is migrated to n2:
(n2@a2-X201)2>
=INFO REPORT==== 19-Jun-2011::15:46:28 ===
wd_plug_server starting...
    PluginId: 4
    Path: "/home/a2/src/erl/data/SIG"
    FileMask: "(?i)(.*)\\.SIG$"
Continuing our game: kill node n2. Once more the system works fine, and we now have our application at node n3:
(n3@a2-X201)2>
=INFO REPORT==== 19-Jun-2011::15:48:18 ===
wd_plug_server starting...
    PluginId: 4
    Path: "/home/a2/src/erl/data/SIG"
    FileMask: "(?i)(.*)\\.SIG$"
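For what it's worth, each time the application moves like this, OTP calls the application callback module's start/2 on the new node. A minimal sketch of such a callback (wd_app and wd_sup are assumed names, not taken from this post; note that the {failover, Node} start type is only passed if the app defines start_phases, otherwise a failover start arrives as plain normal):

-module(wd_app).
-behaviour(application).
-export([start/2, stop/1]).

%% StartType is normal | {takeover, FromNode} | {failover, FromNode}.
start(StartType, _Args) ->
    error_logger:info_msg("wd starting, StartType = ~p~n", [StartType]),
    wd_sup:start_link().   %% top supervisor of the wd app (assumed name)

stop(_State) ->
    ok.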
Now restore nodes n1 and n2:
Erlang R14B (erts-5.8.1) [source] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.8.1  (abort with ^G)
(n1@a2-X201)1>

Eshell V5.8.1  (abort with ^G)
(n2@a2-X201)1>
Nodes n1 and n2 are back.
It looks like I now have to restart the application manually:
* Let's do it at node n2 first:
(n2@a2-X201)1> application:start(wd).
The call at n2 does not return immediately. Now at node n1:

(n1@a2-X201)1> application:start(wd).
=INFO REPORT==== 19-Jun-2011::15:55:43 ===
wd_plug_server starting...
    PluginId: 4
    Path: "/home/a2/src/erl/data/SIG"
    FileMask: "(?i)(.*)\\.SIG$"
ok
(n1@a2-X201)2>
It works. And node n2 has also returned ok:
Eshell V5.8.1  (abort with ^G)
(n2@a2-X201)1> application:start(wd).
ok
(n2@a2-X201)2>
At node n3 we see:
=INFO REPORT==== 19-Jun-2011::15:55:43 ===
    application: wd
    exited: stopped
    type: temporary
In general everything looks OK, as described in the documentation, except for the delay in starting the application at node n2 (presumably application:start(wd) at n2 blocks until the nodes have agreed where the application should run, which happens once n1 starts it).
Now kill node n1 once more:
(n1@a2-X201)2>
User switch command
 --> q
[a2@a2-X201 releases]$
Oops ... everything hangs. The application was not restarted at another node.
Actually, while I was writing this post I realized that sometimes everything is OK and sometimes I have this problem.
Any ideas why there could be problems when restoring the "primary" node and then killing it one more time?
As explained over at Learn You Some Erlang (scroll to the bottom), distributed applications only work well when started as part of a release, not when you start them manually with application:start.
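For reference, "started as part of a release" means generating a boot script that includes wd and booting each node with it, rather than calling application:start/1 by hand. A minimal sketch (the file names here are assumptions, and the version numbers are placeholders that must match the installed OTP):

%% wd_rel-1.rel
{release, {"wd_rel", "1"}, {erts, "5.8.1"},
 [{kernel, "2.14.1"},
  {stdlib, "1.17.1"},
  {sasl, "2.1.9.2"},
  {wd, "1.0"}]}.

Build the boot script once:

1> systools:make_script("wd_rel-1", [local]).

Then start each node from it, keeping the same per-node kernel configuration:

$ erl -sname n1 -boot wd_rel-1 -config wd_n1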