I am trying to get my head around how to do application leader election using Consul. I am using the LeaderElectionUtil from the Java consul-client.
I can elect a leader, and all nodes agree on the leader, but if the leader application dies, the other nodes seem unaware and keep getting the dead leader when calling getLeaderInfoForService, i.e. no new leader election takes place.
The Leader Election Guide (https://www.consul.io/docs/guides/leader-election.html) mentions:
"Note that the session by default makes use of only the gossip failure detector. That is, the session is considered held by a node as long as the default Serf health check has not declared the node unhealthy. Additional checks can be specified if desired."
So from this I am assuming that I need to add an application-level health check (TTL etc.) to the session, so that the session is invalidated when the application fails. Is this the correct idea, and if so, is there any way to do this via the Java client? I am OK with ditching the LeaderElectionUtil and writing my own code to elect a leader, but it seems that even in the SessionClient there is no way to create a session with a health check associated with it.
Or maybe there is a better way to achieve this (application-level failure detection for leader re-election)? I am kind of stuck, so any pointers would be appreciated.
So I solved it in case anyone else comes across this problem.
I couldn't use the LeaderElectionUtil but I created my own class to do the same sort of thing, but in the createSession method I added a TTL of 10s.
private String createSession(String serviceName) {
    final Session session = ImmutableSession.builder()
            .name(serviceName)
            .ttl("10s")
            .build();
    return client.sessionClient().createSession(session).getId();
}
In order for this to work you will need to have a background thread that calls renewSession on the session at least once every 10 seconds.
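As a rough illustration of that background renewal, here is a minimal sketch using only the JDK's ScheduledExecutorService. The class name SessionRenewer and its constructor parameters are hypothetical and not part of consul-client; the renewal itself is passed in as a Runnable, which in this setup would presumably wrap the renewSession call mentioned above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a renewal action at a fixed interval until closed. */
class SessionRenewer implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "session-renewer");
                t.setDaemon(true); // do not keep the JVM alive just for renewals
                return t;
            });

    SessionRenewer(Runnable renewAction, long interval, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                renewAction.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs,
                // so swallow it here and try again on the next tick.
            }
        }, interval, interval, unit);
    }

    /** Stops the renewals; the session then expires once its TTL runs out. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With a 10s TTL, something like new SessionRenewer(() -> client.sessionClient().renewSession(sessionId), 5, TimeUnit.SECONDS) would renew at half the TTL, which leaves some slack for a slow or failed request. The variable sessionId is assumed to hold the id returned by createSession.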
I'm trying to implement the same requirement: I have a Java service that needs to elect a leader, and I don't have service health checks configured in Consul.
Using LeaderElectionUtil from consul-client is problematic because of all the reasons noted above. Unfortunately, it is also not possible to customize LeaderElectionUtil, because all of its internal workings are done in private methods (it should have used protected ones and let users override, for example, the session creation).
I've tried implementing "Service Registration" as documented in the "Basic Usage - Example 1" in the consul-client README, but calling AgentClient.pass() always throws an exception for me.
So my solution is exactly what you've specified - have a session with a TTL and renew it as long as the service is alive.
Here's my implementation, which requires the user to also register a callback that is used to check if the service is still valid for renewal, just in case:
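import java.util.LinkedList;
import java.util.function.Supplier;

// consul-client imports; package names assume the com.orbitz.consul artifact
import com.orbitz.consul.Consul;
import com.orbitz.consul.model.session.ImmutableSession;
import com.orbitz.consul.model.session.Session;

public class SessionHolder implements Runnable {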
    private static final String TTL_TEMPLATE = "%ss";
    private Consul client;
    private String id;
    private LinkedList<Supplier<Boolean>> liveChecks = new LinkedList<>();
    private long ttl;
    private boolean shutdown = false;
    public SessionHolder(Consul client, String service, long ttl) {
        this.client = client;
        this.ttl = ttl;
        final Session session = ImmutableSession.builder()
                .name(service)
                .ttl(String.format(TTL_TEMPLATE, ttl))
                .build();
        id = client.sessionClient().createSession(session).getId();
        Thread upkeep = new Thread(this);
        upkeep.setDaemon(true);
        upkeep.start();
    }
    public String getId() {
        return id;
    }
    public void registerKeepAlive(Supplier<Boolean> liveCheck) {
        liveChecks.add(liveCheck);
    }
    @Override
    public synchronized void run() {
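        // don't start renewing immediately; wait half the TTL first.
        // Multiply before dividing: with integer math, ttl / 2 * 1000 is 0 for a
        // 1s TTL, and wait(0) blocks until notified instead of timing out.
        try {
            wait(ttl * 1000 / 2);
        } catch (InterruptedException e) {}
        while (!isShutdown()) {
            // renew only if every registered live check still passes
            if (liveChecks.isEmpty() || liveChecks.stream().allMatch(Supplier::get)) {
                client.sessionClient().renewSession(getId());
            }
            try {
                wait(ttl * 1000 / 2);
            } catch (InterruptedException e) {
                // go on, try again
            }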
        }
    }
    public synchronized boolean isShutdown() {
        return shutdown;
    }
    public synchronized void close() {
        shutdown = true;
        notify();
        client.sessionClient().destroySession(getId());
    }
}
Then electing a leader is more or less as simple as:
if (consul.keyValueClient().acquireLock(getServiceKey(service), currentNode, sessionHolder.getId()))
    return true; // I'm the leader
One thing to remember is that if the session terminates without cleaning up properly (which is what SessionHolder.close() does above), the lock-delay feature of Consul will prevent a new leader from being elected for about 15 seconds (the default, which unfortunately consul-client does not offer an API to modify).
To solve this, in addition to making sure that properly terminating services clean up after themselves as demonstrated above, I also make sure that the service holds the leader position for the minimal amount of time needed and releases the leadership when it is no longer using it, by calling consul.keyValueClient().releaseLock(). For example, I have a clustered service where we elect a leader to read data updates from an external RDBMS (which are then distributed in the cluster directly instead of each node reloading all the data). As this is done via polling, each node will try to get elected before polling, and if elected it will poll the database, disseminate the updates and resign. If it crashes after that, lock-delay will not prevent another node from polling.
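The "get elected, do the work, resign" cycle described above could be sketched roughly as follows. This is only an illustration: ShortLivedLeadership and runIfLeader are hypothetical names, and the lock operations are passed in as plain functional interfaces so the sketch does not depend on consul-client. In practice tryAcquire would presumably wrap the acquireLock call shown earlier, and release would wrap releaseLock.

```java
import java.util.function.BooleanSupplier;

class ShortLivedLeadership {
    /**
     * Tries to become leader, runs the task if elected, and always resigns afterwards.
     * Returns true if this node was the leader for this round.
     */
    static boolean runIfLeader(BooleanSupplier tryAcquire, Runnable release, Runnable task) {
        if (!tryAcquire.getAsBoolean()) {
            return false; // another node holds the lock; skip this round
        }
        try {
            task.run(); // e.g. poll the database and disseminate the updates
        } finally {
            release.run(); // resign even if the task throws
        }
        return true;
    }
}
```

Releasing in a finally block means a failing task still gives up the lock explicitly, so only a hard crash while the lock is held should run into the lock-delay.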