I have:
When I hit start I am seeing the following error:
2016/04/21 19:31:31 [ERR] agent: failed to sync remote state: rpc error: No cluster leader
2016/04/21 19:31:44 [ERR] agent: coordinate update error: rpc error: No cluster leader
How do I recover from this state?
Stop the agent: use the consul leave command to stop the Consul agent. This gracefully stops the agent, causing it to leave the Consul datacenter and shut down.
If the server cannot be recovered, you need to bring up a new server using the deployment guide. If the only server in a single-server cluster fails unrecoverably and there is no backup procedure, data loss is inevitable because the data was not replicated to any other server.
Issue the consul-k8s uninstall command to remove Consul on Kubernetes. You can specify the installation name, namespace, and data retention behavior using the applicable options. By default, the uninstall preserves the secrets and PVCs that are provisioned by Consul on Kubernetes.
The easiest way to view initial health status is by visiting the Consul Web UI at http://localhost:8500/ui. Click through to a specific service, such as the counting service, to see the status of that service on each node.
As of Consul 0.7, things work differently from Keyan P's answer. raft/peers.json (in the Consul data dir) has become a manual recovery mechanism. It doesn't exist unless you create it; when Consul starts, it loads the file and then deletes it from the filesystem so it won't be read on future starts. There are instructions in raft/peers.info. Note that if you delete raft/peers.info, Consul won't read raft/peers.json but will delete it anyway, and it will recreate raft/peers.info. The log indicates separately when it reads the file and when it deletes it.
Assuming you've already tried the bootstrap or bootstrap_expect settings, that file might help. The Outage Recovery guide in Keyan P's answer is a helpful link. You create raft/peers.json in the data dir and start Consul. The log should indicate that it's reading and deleting the file, and then it should say something like "cluster leadership acquired". The file contents are:
[
  {
    "id": "<node-id>",
    "address": "<node-ip>:8300",
    "non_voter": false
  }
]
where <node-id> can be found in the node-id file in the data dir.
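If you'd rather not hand-edit the file, here is a minimal Python sketch that builds it for a single server. It is not part of Consul: the helper names (build_peers, write_peers_json), the example data dir, and the default port of 8300 taken from the snippet above are my assumptions, so adjust them to your setup. It reads the node-id file from the data dir and writes raft/peers.json with the same three fields shown above.

```python
import json
from pathlib import Path


def build_peers(node_id: str, address: str) -> list:
    """Return the single-server peers.json structure shown above."""
    return [{"id": node_id, "address": address, "non_voter": False}]


def write_peers_json(data_dir: str, node_ip: str, port: int = 8300) -> Path:
    """Read <data_dir>/node-id and write <data_dir>/raft/peers.json.

    Hypothetical helper: data_dir and node_ip must match your own agent.
    """
    data = Path(data_dir)
    # The node ID is stored as plain text in the node-id file.
    node_id = (data / "node-id").read_text().strip()
    raft_dir = data / "raft"
    raft_dir.mkdir(parents=True, exist_ok=True)
    target = raft_dir / "peers.json"
    peers = build_peers(node_id, f"{node_ip}:{port}")
    target.write_text(json.dumps(peers, indent=2) + "\n")
    return target
```

Something like write_peers_json("/opt/consul", "10.0.0.5") (both values are placeholders) would create the file. Then start Consul and watch the log for the read/delete messages.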