
Elasticsearch cluster configuration is not discovering any nodes under both unicast and multicast

I've been trying to use the lovely ansible-elasticsearch project to set up a nine-node Elasticsearch cluster.

Each node is up and running, but they are not communicating with each other. The master nodes think there are zero data nodes, and the data nodes are not connecting to the master nodes.

They all have the same cluster.name. I have tried with multicast enabled (discovery.zen.ping.multicast.enabled: true) and with it disabled (the same setting set to false, plus discovery.zen.ping.unicast.hosts: ["host1","host2",..."host9"]), but in either case the nodes are not communicating.

They have network connectivity to one another - verified via telnet over port 9300.

Sample output:

    $ curl host1:9200/_cluster/health
    {"error":{"root_cause":[{"type":"master_not_discovered_exception","reason":"waited for [30s]"}],"type":"master_not_discovered_exception","reason":"waited for [30s]"},"status":503}

I cannot think of any more reasons why they wouldn't connect - looking for any more ideas of what to try.

Edit: I finally resolved this issue. The settings that worked were publish_host set to "_non_loopback:ipv4_" and unicast discovery with discovery.zen.ping.unicast.hosts set to ["host1:9300","host2:9300","host3:9300"], listing only the dedicated master nodes. I have a minimum master node count of 2.
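For anyone landing here later, the working settings would look roughly like this in elasticsearch.yml. This is only a sketch: the full key names (network.publish_host, discovery.zen.minimum_master_nodes) are my assumption of what the short names above map to in the zen-discovery settings, and the cluster name and host names are placeholders.

```yaml
# elasticsearch.yml (sketch; identical on every node)
cluster.name: my-cluster   # placeholder; must match on all nodes

# Advertise a non-loopback IPv4 address to the other nodes
# (assumed full key for "publish_host")
network.publish_host: "_non_loopback:ipv4_"

# Unicast discovery, listing only the dedicated master nodes
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1:9300", "host2:9300", "host3:9300"]

# Minimum master node count of 2 (a majority of the three masters)
discovery.zen.minimum_master_nodes: 2
```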

Dave asked Dec 19 '15 00:12


1 Answer

The only reasons I can think of that would cause this behavior are:

  1. Connectivity issues - Ping is not a good tool to check that nodes can connect to each other, because it does not test the transport port. Use telnet and try connecting from host1 to host2 on port 9300.

  2. Your elasticsearch.yml is set to bind to 127.0.0.1 or to the wrong host. If you're not sure, bind to 0.0.0.0 to see whether that solves your connectivity issues, then change it to bind only to internal addresses so that Elasticsearch is not exposed directly to the internet.

  3. Your publish_host is incorrect - This usually happens when you run ES inside a Docker container, for example. Make sure that publish_host is set to an address that the other hosts can reach.
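If telnet is not installed on the nodes, the check from point 1 can be scripted. Below is a minimal sketch using only the Python standard library; the function names can_connect and check_hosts are made up for illustration, and 9300 is assumed to be the transport port as in the question.

```python
import socket


def can_connect(host, port=9300, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False


def check_hosts(hosts, port=9300, timeout=3.0):
    """Map each host name to whether its port accepted a TCP connection."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Run something like check_hosts(["host1", "host2", "host3"]) from each node in turn. A True result only shows that the port is reachable from that machine; a node can still advertise the wrong publish_host (point 3), so this does not rule that out.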

Or Weinberger answered Sep 30 '22 23:09