
Mnesia can't connect to another node

I am setting up a RabbitMQ cluster and ran into an issue at one step in the process. It's straight out of the RabbitMQ clustering guide.

root@celery:~# rabbitmqctl status
Status of node celery@celery ...
[{pid,20410},
 {running_applications,[{rabbit,"RabbitMQ","2.5.1"},
                        {os_mon,"CPO  CXC 138 46","2.2.4"},
                        {sasl,"SASL  CXC 138 11","2.1.8"},
                        {mnesia,"MNESIA  CXC 138 12","4.4.12"},
                        {stdlib,"ERTS  CXC 138 10","1.16.4"},
                        {kernel,"ERTS  CXC 138 10","2.13.4"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:30] [hipe] [kernel-poll:true]\n"},
 {memory,[{total,25296704},
          {processes,9680280},
          {processes_used,9662720},
          {system,15616424},
          {atom,1099393},
          {atom_used,1082732},
          {binary,89768},
          {code,11606637},
          {ets,726848}]}]
...done.
root@celery:~# rabbitmqctl cluster_status
Cluster status of node celery@celery ...
[{nodes,[{disc,[celery@celery]}]},{running_nodes,[celery@celery]}]
...done.
root@celery:~# rabbitmqctl stop_app
Stopping node celery@celery ...
...done.
root@celery:~# rabbitmqctl reset
Resetting node celery@celery ...
...done.
root@celery:~# rabbitmqctl cluster worker1@worker1
Clustering node celery@celery with [worker1@worker1] ...
Error: {failed_to_cluster_with,[worker1@worker1],
                               "Mnesia could not connect to some nodes."}

What are the possible reasons one node wouldn't be able to connect to another?

Here's the guide I'm following: http://www.rabbitmq.com/clustering.html

Shakakai asked Aug 04 '11 21:08


3 Answers

I ran into a similar problem while setting up RabbitMQ in Docker.

The main cause was the /var/lib/rabbitmq/mnesia/rabbit/cluster_nodes.config configuration file: it listed a node that could not be connected to.

Mnesia is a distributed, soft real-time database management system written in the Erlang programming language.

There are several ways to repair this problem:

  1. Fix the configuration file so that it uses the correct cluster node name. From the log we can see that our node name is rabbit@cb43449d5d72:
// log info 
...
rabbitmq    |   Starting broker...2019-11-27 16:18:22.621 [info] <0.304.0>
rabbitmq    |  node           : rabbit@cb43449d5d72
...

// This is the wrong configuration file:
$ cat ./mnesia/rabbit/cluster_nodes.config
{[rabbit@cb43449d5d72,rabbit@dc3288264c34],[rabbit@dc3288264c34]}.

// Update it with the correct node name, then restart the RabbitMQ server:
$ cat ./mnesia/rabbit/cluster_nodes.config
{[rabbit@cb43449d5d72],[rabbit@cb43449d5d72]}.
  2. The simplest way is to remove the mnesia directory (this discards the node's existing data) and configure the correct node name, for example rabbit@my-rabbit, with a matching /etc/hosts entry such as 127.0.0.1 my-rabbit. After that, you should see the following configuration details:
$ find . -name cluster_nodes.config
./mnesia/rabbit/cluster_nodes.config
./mnesia/rabbit@my-rabbit/cluster_nodes.config

$ cat ./mnesia/rabbit@my-rabbit/cluster_nodes.config
{['rabbit@my-rabbit'],['rabbit@my-rabbit']}.
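
To spot a stale entry like the one above without reading the file by eye, something along these lines might help. It is only a sketch: the function names are made up for illustration, and it assumes the file always has the two-list `{[...],[...]}.` shape shown in this answer. It does not interpret what each list means; it just reports any node name that is not in the set you expect.

```python
import re


def parse_cluster_nodes(text):
    """Split a '{[...],[...]}.' term into its two lists of node names.

    Assumption: the file contains exactly two bracketed lists, as in the
    examples in this answer. Quoted atoms ('rabbit@my-rabbit') are unquoted.
    """
    lists = re.findall(r"\[([^\]]*)\]", text)
    if len(lists) != 2:
        raise ValueError("expected exactly two bracketed node lists")

    def split(body):
        return [n.strip().strip("'") for n in body.split(",") if n.strip()]

    return split(lists[0]), split(lists[1])


def unexpected_nodes(text, expected_nodes):
    """Return node names found in the file that are not in expected_nodes."""
    first, second = parse_cluster_nodes(text)
    expected = set(expected_nodes)
    found = []
    for node in first + second:
        if node not in expected and node not in found:
            found.append(node)
    return found


if __name__ == "__main__":
    # The broken file from this answer; only rabbit@cb43449d5d72 really exists.
    broken = "{[rabbit@cb43449d5d72,rabbit@dc3288264c34],[rabbit@dc3288264c34]}."
    print(unexpected_nodes(broken, ["rabbit@cb43449d5d72"]))
```

For the broken file above this reports rabbit@dc3288264c34, which is the entry that had to be removed.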
lupguo answered Nov 02 '22 06:11


I jumped into the #rabbitmq channel on freenode. Here's the discussion that followed:

14:29 shakakai: hey all, i'm having a little issue with clustering rabbitmq http://stackoverflow.com/questions/6948624/mnesia-cant-connect-to-another-node
14:30 shakakai: has anyone run into that problem before?
14:30 daysmen has left IRC (Read error: Connection reset by peer)
14:30 antares_: shakakai: make sure that epmd is running on every node
14:30 antares_: shakakai: and that port it uses (4369) is open in your firewall
14:31 |Blaze|: shakakai: is your dns correct?  Can you ping worker1 from celery and celery from worker1
14:31 shakakai: |Blaze|: hmm...i'll check
14:31 daysmen has joined ([email protected])
14:32 shakakai: |Blaze|: this is where I'm a little confused, the rabbitmq nodename is worker1@worker1 but the fqdn to ping the box is "ping worker1.mydomain.com"
14:33 |Blaze|: can you "ping worker1"
14:34 shakakai: |Blaze|: no
14:34 |Blaze|: k, you'll need to fix that
14:34 hyperboreean has left IRC (Ping timeout: 250 seconds)
14:37 shakakai: |Blaze|: gotcha, so I setup a hosts file and i should be good
14:37 |Blaze|: yup
14:37 |Blaze|: in both directions

TL;DR

Make sure you can ping the host in each rabbit nodename (the part after the @) from each of the boxes you are clustering, in both directions. If you can't, set up a hosts file entry for each one.
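
If you'd rather script the two checks from the IRC discussion (does the host part of the nodename resolve, and is the epmd port 4369 reachable), a rough sketch could look like the following. The function names are mine, not part of RabbitMQ, and it uses a plain TCP connect rather than an ICMP ping, so it is an approximation of "can you ping worker1". It only tests from the machine it runs on, so run it on every box.

```python
import socket
import sys

EPMD_PORT = 4369  # the epmd port mentioned in the IRC discussion


def host_of(node_name):
    """Return the host part of a node name such as 'worker1@worker1'."""
    name, sep, host = node_name.partition("@")
    if not sep or not name or not host:
        raise ValueError(f"not a node name: {node_name!r}")
    return host


def can_resolve(host):
    """True if the local resolver (DNS or the hosts file) knows this host."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def port_open(host, port=EPMD_PORT, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node(node_name):
    """Run both checks for one node name and return the results as a dict."""
    host = host_of(node_name)
    resolves = can_resolve(host)
    return {
        "host": host,
        "resolves": resolves,
        "epmd_reachable": resolves and port_open(host),
    }


if __name__ == "__main__":
    # Node names come from the command line, e.g. celery@celery worker1@worker1
    for node in sys.argv[1:]:
        print(node, check_node(node))
```

Saved under a name of your choosing and run with the node names as arguments on both celery and worker1, a result of "resolves": False points at the hosts file or DNS, while "epmd_reachable": False with a resolving host points at epmd or the firewall.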

Shakakai answered Nov 02 '22 08:11


There are several things to check before you can get the cluster to work well:

0) Ensure you are running the exact same RabbitMQ version on each node.
1) Set up the network until you are able to ping each server from every other server.
2) Cookies: you need the exact same Erlang cookie in the .erlang.cookie file on each server.

One useful trick is to run this command on one node to see whether it can reach another one from RabbitMQ:

rabbitmqctl eval 'net_adm:ping(rabbit@othernode).'

This should print pang if it's not OK, or pong if it's OK. Be careful not to forget the dot near the end of the eval expression.

I got it working fine after several hours of unsuccessful trials.

3) Bear in mind that there may be an issue when restarting a node of a cluster if this node was not the last one to be stopped: it won't start until the last node that was stopped has been restarted. When all of the above (0 to 2) are correct, 3 may well be the root cause of your problem...
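
If you want to run the net_adm:ping trick against several nodes, it can be wrapped in a small script. This is only a sketch under a few assumptions: rabbitmqctl is on the PATH and can be run by the current user, the helper names are made up, and the output is assumed to contain the word pong or pang somewhere. The node name is always wrapped in single quotes so that names with a hyphen stay valid Erlang atoms.

```python
import re
import subprocess
import sys


def ping_expression(node_name):
    """Build the Erlang expression for rabbitmqctl eval (note the final dot)."""
    if not re.fullmatch(r"[A-Za-z0-9_.-]+@[A-Za-z0-9_.-]+", node_name):
        raise ValueError(f"unexpected node name: {node_name!r}")
    return f"net_adm:ping('{node_name}')."


def interpret(output):
    """Return True for pong, False for pang, None if neither word appears."""
    words = output.lower().split()
    if "pong" in words:
        return True
    if "pang" in words:
        return False
    return None


def ping_from_local_node(node_name, timeout=30):
    """Run rabbitmqctl eval on this machine and interpret what it prints.

    Raises FileNotFoundError if rabbitmqctl is not installed here.
    """
    result = subprocess.run(
        ["rabbitmqctl", "eval", ping_expression(node_name)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return interpret(result.stdout)


if __name__ == "__main__":
    # Node names come from the command line, e.g. rabbit@othernode
    for node in sys.argv[1:]:
        print(node, ping_from_local_node(node))
```

A result of None means the output contained neither word, which usually deserves a look at what rabbitmqctl actually printed.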

Hope this helps, cheers, jb

Jean-Baptiste DUPONT answered Nov 02 '22 07:11