I'm setting up an Elasticsearch (5.0.1) cluster.
It has three master-eligible nodes:
el-m01
el-m02
el-m03
The cluster fails to assemble, and every master-eligible node logs the following NotMasterException:
[2016-11-21T15:24:13,274][INFO ][o.e.d.z.ZenDiscovery ] [el-m01] failed to send join request to master [{el-m02}{bBhsu3fJSj-MyiWJGhQmog}{_IzdeUd4Sv6g-rhemGjEVQ}{192.168.110.118}{192.168.110.118:9300}{rack=r1}], reason [RemoteTransportException[[el-m02][192.168.110.118:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{el-m02}{bBhsu3fJSj-MyiWJGhQmog}{_IzdeUd4Sv6g-rhemGjEVQ}{192.168.110.118}{192.168.110.118:9300}{rack=r1}] not master for join request]; ], tried [3] times
Enabling debug logging showed the following:
The master election happens and succeeds. However, while every node has chosen a master, no node thinks that it is the master itself.
What is happening here?
Here is the situation: because all the master nodes were created by cloning a VM, every node has the same node ID.
This can be verified with the following command, which lists all node IDs:
GET /_cat/nodes?v&h=id,ip,name&full_id=true
Note that since your cluster hasn't formed, each node needs to be queried individually (the URL is quoted so that the shell does not interpret the & characters), e.g.:
curl '192.168.110.111:9200/_cat/nodes?v&h=id,ip,name&full_id=true'
curl '192.168.110.112:9200/_cat/nodes?v&h=id,ip,name&full_id=true'
(...)
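The per-node check above can be automated. Below is a minimal Python sketch, assuming the plain-text output of the _cat/nodes request shown above (a header line, then one whitespace-separated row per node). The HOSTS list and the function names are hypothetical and only serve the illustration.

```python
import urllib.request
from collections import defaultdict

# Hypothetical host list: replace with the HTTP addresses of your own nodes.
HOSTS = ["192.168.110.111:9200", "192.168.110.112:9200"]


def parse_cat_nodes(text):
    """Parse the text output of _cat/nodes?v&h=id,ip,name into (id, ip, name) tuples."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = []
    for line in lines[1:]:  # the first line is the header printed because of `v`
        parts = line.split(None, 2)
        if len(parts) == 3:
            rows.append((parts[0], parts[1], parts[2].strip()))
    return rows


def find_duplicate_ids(rows_by_host):
    """Return {node_id: sorted [(name, ip), ...]} for ids shared by more than one node."""
    nodes_by_id = defaultdict(set)
    for rows in rows_by_host.values():
        for node_id, ip, name in rows:
            nodes_by_id[node_id].add((name, ip))
    return {i: sorted(n) for i, n in nodes_by_id.items() if len(n) > 1}


def fetch_cat_nodes(host, timeout=5):
    """Fetch the node listing from a single node over HTTP."""
    url = "http://%s/_cat/nodes?v&h=id,ip,name&full_id=true" % host
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def main():
    # Query every node individually, since the cluster has not formed.
    rows_by_host = {h: parse_cat_nodes(fetch_cat_nodes(h)) for h in HOSTS}
    for node_id, nodes in find_duplicate_ids(rows_by_host).items():
        print("node id %s is shared by: %s" % (node_id, nodes))
```

Calling main() prints one line per node ID that is reported by more than one node; no output means the IDs seen were unique.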
This is bad: the node IDs need to be unique.
To solve this, you need to delete the contents of the data directory (/var/lib/elasticsearch) on every node. This will delete all data in Elasticsearch, and will also reset the node IDs.
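As a rough sketch of that cleanup step, the Python helper below empties a data directory while keeping the directory itself. The function name clear_data_dir and its dry_run flag are made up for this example; it assumes Elasticsearch has been stopped on the node first and that you really can afford to lose the data.

```python
import shutil
from pathlib import Path


def clear_data_dir(data_dir, dry_run=True):
    """Remove everything inside data_dir, keeping the directory itself.

    Returns the names of the top-level entries that were (or, with
    dry_run=True, would be) removed.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise NotADirectoryError(str(root))
    entries = sorted(root.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # real directory: remove it recursively
            else:
                entry.unlink()  # plain file or symlink
    return [entry.name for entry in entries]
```

For instance, clear_data_dir("/var/lib/elasticsearch") only lists what would be deleted; passing dry_run=False performs the deletion. The dry-run default is a deliberate choice, since the operation is destructive.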
To avoid having this problem in the first place, you can :
The Elasticsearch data directory ($ES_HOME/data or, in the case of an RPM install, e.g. /var/lib/elasticsearch) contains a node ID that is randomly generated when Elasticsearch is first started. If this directory is copied to multiple instances that are expected to form a cluster, the following error should be received:
failed to send join request to master [..] IllegalArgumentException [..] found existing node [..] with the same id but is a different node instance
However, when minimum_master_nodes is not met, an error less indicative of the problem is received:
failed to send join request to master [..] NotMasterException [..] not master for join request
GitHub issue: https://github.com/elastic/elasticsearch/issues/32904
The issue can be resolved by deleting the contents of the data directory, and data directories shouldn't be copied in the first place.