
Recovering from Consul "No Cluster leader" state

Tags: mesos, consul

I have:

  • one mesos-master on which I configured a Consul server;
  • one mesos-slave on which I configured a Consul client; and
  • one bootstrap server for Consul.

When I start the agents, I see the following errors:

2016/04/21 19:31:31 [ERR] agent: failed to sync remote state: rpc error: No cluster leader
2016/04/21 19:31:44 [ERR] agent: coordinate update error: rpc error: No cluster leader

How do I recover from this state?

asked Apr 21 '16 by deen


People also ask

How do I stop a Consul agent?

Stop the Consul agent by using the consul leave command. This gracefully stops the agent, causing it to leave the Consul datacenter and shut down.
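For example, on the node you want to remove (a minimal sketch; it assumes the local agent's HTTP API is reachable at its default address):

consul leave

The agent on that node then coordinates its own graceful departure from the datacenter before shutting down.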

What if a Consul server goes down?

If the server cannot be recovered, you need to bring up a new server using the deployment guide. In the case of an unrecoverable server failure in a single server cluster and there is no backup procedure, data loss is inevitable since data was not replicated to any other servers.

How do I remove Consul?

Issue the consul-k8s uninstall command to remove Consul on Kubernetes. You can specify the installation name, namespace, and data retention behavior using the applicable options. By default, the uninstall preserves the secrets and PVCs that are provisioned by Consul on Kubernetes.
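For instance (a sketch; the release name, namespace, and flags below are assumptions to adapt to your installation):

consul-k8s uninstall -namespace consul -name consul
consul-k8s uninstall -namespace consul -name consul -wipe-data

The first form keeps the PVCs and secrets; the second also deletes them.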

How do I know if my Consul is running?

The easiest way to view initial health status is by visiting the Consul Web UI at http://localhost:8500/ui . Click through to a specific service such as the counting service. The status of the service on each node will be displayed.
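Besides the UI, you can check from the command line (assuming the agent's HTTP API is on the default localhost:8500):

consul members
curl http://localhost:8500/v1/status/leader

consul members lists the agents in the datacenter with their status, and the status/leader endpoint returns the address of the current Raft leader; an empty response means there is no leader, which is the state described in the question above.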


1 Answer

As of Consul 0.7 things work differently from Keyan P's answer. raft/peers.json (in the Consul data dir) has become a manual recovery mechanism. It doesn't exist unless you create it, and when Consul starts it loads the file and then deletes it from the filesystem so it won't be read on future starts. There are instructions in raft/peers.info. Note that if you delete raft/peers.info, Consul won't read raft/peers.json, but it will still delete it, and it will recreate raft/peers.info. The log indicates separately when the file is read and when it is deleted.

Assuming you've already tried the bootstrap or bootstrap_expect settings, that file might help. The Outage Recovery guide in Keyan P's answer is a helpful link. You create raft/peers.json in the data dir and start Consul, and the log should indicate that it's reading/deleting the file and then it should say something like "cluster leadership acquired". The file contents are:

[ { "id": "<node-id>", "address": "<node-ip>:8300", "non_voter": false } ]

where <node-id> can be found in the node-id file in the data dir.
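Putting it together, a rough recovery sequence might look like this (a sketch under assumptions, not the author's exact steps; the /opt/consul data dir and the systemd unit name are placeholders for your own setup):

systemctl stop consul                    # stop the agent on every server
cat /opt/consul/node-id                  # note each server's node ID
vi /opt/consul/raft/peers.json           # list every server, using the format above
systemctl start consul                   # restart and watch the log for "cluster leadership acquired"

Write the same peers.json on each server so they all agree on the member list before you restart them.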

answered Sep 21 '22 by Mike Placentra