To perform the leader election, Kubernetes documentation suggests deploying a sidecar in the set of candidate pods.
https://kubernetes.io/blog/2016/01/simple-leader-election-with-kubernetes/
The sidecar takes the following steps to elect a leader.
There are a few issues with this method.
If the current leader hangs and cannot update the endpoint in time, one of the other sidecars will acquire leadership. However, it takes some time for the previous leader to realize that its leadership has been revoked. During this brief period, the two coexisting leaders can corrupt a shared resource or data.
This is also mentioned in its source code:
This implementation does not guarantee that only one client is acting as a leader (a.k.a. fencing).
So, what is the proper method to elect a leader with Kubernetes?
I found this useful with regard to the leader election architecture:
The election result and the leader's identity are maintained on the Endpoints/ConfigMap Kubernetes object in the form of an annotation. The candidates and the leader query and update this object on each heartbeat of the leader election. The sidecar container in each pod runs this logic, and the main container can query the sidecar to check whether leadership has shifted to it. This passive way of learning about a change is what causes the delay in this approach.
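To make the heartbeat and takeover timing concrete, here is a minimal in-memory sketch of such a lock record. It does not talk to the Kubernetes API; `LeaderRecord`, `try_acquire_or_renew`, the pod names and the 15-second lease are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaderRecord:
    """In-memory stand-in for the leader annotation on the lock object."""
    holder: str
    renew_time: float       # last heartbeat written by the holder, in seconds
    lease_duration: float   # how long the record stays valid without renewal


def try_acquire_or_renew(record: Optional[LeaderRecord], me: str,
                         now: float, lease_duration: float) -> LeaderRecord:
    """One heartbeat: renew our own lease, or take over an expired one."""
    if record is None or record.holder == me:
        return LeaderRecord(me, now, lease_duration)
    if now - record.renew_time > record.lease_duration:
        # The previous holder missed its heartbeats, so the lease is up for grabs.
        return LeaderRecord(me, now, lease_duration)
    return record  # someone else still holds a valid lease


# pod-a acquires at t=0, then hangs and stops renewing.
record = try_acquire_or_renew(None, "pod-a", now=0.0, lease_duration=15.0)
# pod-b polls at t=10: the lease is still valid, so nothing changes.
record = try_acquire_or_renew(record, "pod-b", now=10.0, lease_duration=15.0)
holder_at_10 = record.holder
# pod-b polls again at t=20: the lease has expired, so pod-b takes over.
record = try_acquire_or_renew(record, "pod-b", now=20.0, lease_duration=15.0)
holder_at_20 = record.holder
print(holder_at_10, holder_at_20)
```

In this model the hung pod-a only learns it lost leadership the next time it reads the record, which is the passive delay described above.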
How about holding the leader and election state among the candidates themselves rather than in a common lock? Instead of querying the lock, the candidates can check among themselves who has joined the flock and is healthy. This makes the consensus and election logic internal to the candidates, so they are independent of any external locking mechanism. Load on the Kubernetes API server is another factor to consider: multiple deployments with large replica sets will consume a lot of API server resources querying and updating the Endpoints/ConfigMap object.
Create a library that provides an object constructed with the network addresses of its peers and its own network address. Each Democracy object in the network is assigned a random weight.
The Democracy object starts sending ping requests to its peers and checks for their acknowledgements over UDP unicast. When a new object is created, it signals its presence to all peers in the network, and an added event is thrown. Hence, every new Democracy object starts with the added event, through which it discovers the peers present in the network. The new node adds each peer to its local cache and checks its health once the acknowledgement is confirmed. When a peer stops responding to ping requests, it is removed from the local peer cache and the removed event is thrown. The object provides listeners on events such as:
Every candidate in the Democracy network checks for the presence of a leader. If no leader is present, the peer with the highest weight is selected as leader, which throws an elected event on the chosen node. If there are two leaders in the peer network, the leader with the lower weight resigns, throwing the resigned event.
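The peer cache and election rules described above can be sketched as a small in-memory model. This is not the library's actual API: the `Candidate` class, its method names, the 3-second timeout and the fixed weights are hypothetical, pings are plain method calls instead of UDP packets, and weights are assumed to be distinct.

```python
import random
from typing import Callable, Dict, List, Optional


class Candidate:
    """Hypothetical in-memory model of one peer (not the real library API)."""

    def __init__(self, node_id: str, weight: Optional[float] = None,
                 timeout: float = 3.0) -> None:
        self.node_id = node_id
        # Random weight per node, as described above; fixed in the demo below.
        self.weight = random.random() if weight is None else weight
        self.timeout = timeout          # seconds of silence before removal
        self.is_leader = False
        self.peers: Dict[str, dict] = {}
        self.listeners: Dict[str, List[Callable[[str], None]]] = {}

    def on(self, event: str, callback: Callable[[str], None]) -> None:
        self.listeners.setdefault(event, []).append(callback)

    def _emit(self, event: str, node_id: str) -> None:
        for callback in self.listeners.get(event, []):
            callback(node_id)

    def receive_ping(self, peer_id: str, weight: float,
                     is_leader: bool, now: float) -> None:
        """Record a heartbeat from a peer; first contact throws 'added'."""
        is_new = peer_id not in self.peers
        self.peers[peer_id] = {"weight": weight, "is_leader": is_leader,
                               "last_seen": now}
        if is_new:
            self._emit("added", peer_id)

    def tick(self, now: float) -> None:
        """Drop silent peers, then apply the election and resignation rules."""
        silent = [p for p, info in self.peers.items()
                  if now - info["last_seen"] > self.timeout]
        for peer_id in silent:
            del self.peers[peer_id]
            self._emit("removed", peer_id)

        leader_weights = [info["weight"] for info in self.peers.values()
                          if info["is_leader"]]
        if self.is_leader:
            # Two leaders: the one with the lower weight resigns.
            if any(w > self.weight for w in leader_weights):
                self.is_leader = False
                self._emit("resigned", self.node_id)
        elif not leader_weights:
            # No leader anywhere: the highest weight wins.
            if all(self.weight > info["weight"] for info in self.peers.values()):
                self.is_leader = True
                self._emit("elected", self.node_id)


events = []
a, b = Candidate("a", weight=0.9), Candidate("b", weight=0.4)
for node in (a, b):
    for name in ("added", "removed", "elected", "resigned"):
        node.on(name, lambda peer, n=node.node_id, e=name: events.append((n, e, peer)))

# Round 1: both nodes exchange pings; "a" has the higher weight and is elected.
a.receive_ping("b", b.weight, b.is_leader, now=0.0)
b.receive_ping("a", a.weight, a.is_leader, now=0.0)
a.tick(now=0.0)
b.tick(now=0.0)

# "a" hangs: "b" stops hearing from it, removes it, and elects itself.
b.tick(now=10.0)

# "a" recovers still believing it is leader; after one exchange "b" resigns.
a.receive_ping("b", b.weight, b.is_leader, now=11.0)
b.receive_ping("a", a.weight, a.is_leader, now=11.0)
a.tick(now=11.0)
b.tick(now=11.0)
print(events)
```

Note that in this sketch both nodes still consider themselves leader between t=10 and t=11, until the next ping exchange. The window is bounded by the ping interval rather than by a lease on an external lock.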
These are proactive events; the application can choose what to do in response to them. To implement this on Kubernetes, we need to construct the object with the source address and the peers; the trick is where to get them from. Again, the Endpoints object comes to our help: it maintains the name and network address of each pod among which the election has to happen. The Endpoints object is created when we create a Kubernetes Service object whose selector matches the pod label.
When a pod comes up with the label selected by the Kubernetes Service object, it can discover some information about itself, such as the pod name assigned to it. This can be done by passing the metadata name to the container in the deployment YAML.
Once the pod is ready and associated with a service, its network address is listed in the Endpoints object. The Endpoints object contains an array in which each element holds the pod name and the network address associated with it. This array can be parsed to form the Democracy object.
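As a rough sketch of that parsing step, the function below takes an Endpoints object as a plain dictionary (the shape returned by the API as JSON) and splits it into the pod's own address and its peers. The function name `split_self_and_peers`, the service name, pod names, IPs and port are made up for illustration, and the pod name is assumed to have been passed into the container from `metadata.name`.

```python
from typing import List, Tuple


def split_self_and_peers(endpoints: dict, my_pod_name: str,
                         port: int) -> Tuple[str, List[str]]:
    """Return (own "ip:port", list of peer "ip:port") from an Endpoints dict."""
    source = None
    peers: List[str] = []
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            target = address.get("targetRef") or {}
            host = f"{address['ip']}:{port}"
            if target.get("kind") == "Pod" and target.get("name") == my_pod_name:
                source = host      # this entry is the pod we are running in
            else:
                peers.append(host)
    if source is None:
        raise LookupError(f"pod {my_pod_name!r} is not listed in the Endpoints object")
    return source, peers


# Hypothetical Endpoints object for a service selecting three pods.
endpoints = {
    "kind": "Endpoints",
    "metadata": {"name": "election-svc"},
    "subsets": [{
        "addresses": [
            {"ip": "10.0.0.11", "targetRef": {"kind": "Pod", "name": "app-0"}},
            {"ip": "10.0.0.12", "targetRef": {"kind": "Pod", "name": "app-1"}},
            {"ip": "10.0.0.13", "targetRef": {"kind": "Pod", "name": "app-2"}},
        ],
        "ports": [{"port": 12345, "protocol": "UDP"}],
    }],
}

source, peers = split_self_and_peers(endpoints, "app-1", 12345)
print(source, peers)
```

The resulting `source` and `peers` values are what would be handed to the Democracy object's constructor. A pod that is not yet ready will not find itself under `addresses`, so a real implementation would need to retry.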