
Difference between ensemble and quorum in zookeeper

I am new to ZooKeeper. I have configured it on a single machine, but I came across the words "ensemble" and "quorum" in the ZooKeeper documentation.

Can anyone please tell me the difference between these?

  • Ensemble
  • Quorum
user3797438 asked Aug 07 '14 05:08


1 Answer

This answer is for those who still have trouble understanding Ensemble and Quorum. An Ensemble is simply a cluster of ZooKeeper servers, whereas Quorum defines the rule for forming a healthy Ensemble. The rule is given by the formula Q = 2N+1, where Q is the number of nodes required to form a healthy Ensemble that can tolerate N failed nodes. The formula is worked through in the example that follows.
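The formula can be checked with a few lines of Python. This is only an illustrative sketch: the function name `min_ensemble_size` is made up for this example and is not part of ZooKeeper.

```python
def min_ensemble_size(failures: int) -> int:
    """Smallest ensemble size Q that tolerates `failures` failed nodes (Q = 2N + 1).

    Hypothetical helper for illustration only; not a ZooKeeper API.
    """
    if failures < 0:
        raise ValueError("failures must be non-negative")
    return 2 * failures + 1


# Print the ensemble size needed for 0, 1 and 2 tolerated failures.
for n in range(3):
    print(f"tolerate {n} failure(s) -> at least {min_ensemble_size(n)} node(s)")
```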

Before I start with an example, I want to define two things:
Cluster: A group of connected nodes/servers (from now on I will say node), with one node as Leader/Master and the rest as Followers/Slaves.
Healthy Ensemble: A cluster with exactly one active Leader at any given point in time, and hence fault tolerant.

Let me explain with an example that is commonly used when defining Ensemble and Quorum.

  1. Let's say you have 1 ZooKeeper node. There is nothing to worry about here, since we need more than 1 node to form a cluster.
  2. Now take 2 nodes. There is no problem forming a cluster, but there is a problem forming a healthy Ensemble. Say the connection between these 2 nodes is lost: each node will think the other is down, and if both then tried to act as Leader, the result would be inconsistency, as they can't communicate with each other. The Quorum rule prevents this by requiring a majority, which for 2 nodes means both of them, so a cluster of 2 nodes can't afford even a single failure. So what is the use of such a cluster? Nobody is saying you can't make a cluster of 2 nodes; the point is that it is the same as having a single node, as neither tolerates even a single failure. Hope this is clear.
  3. Now take 3 nodes. There is no problem forming a cluster or a healthy Ensemble, as this can tolerate 1 failure according to the formula above: 3 = 2N+1 => N = (3-1)/2 = 1. When a second failure occurs (either a connection or a node failure), no node will be elected as Leader, so the Ensemble won't serve any write/update/delete requests, and the state of the client cluster remains consistent across the ZooKeeper nodes. Leader election can't succeed unless a majority of nodes is available and connected, where majority m = ⌊n/2⌋+1 and n is the number of nodes in the Ensemble. So here, the 1st election happens with 3 nodes (as it's a 3-node cluster). After the 1st failure, the remaining 2 nodes can still conduct an election, as they meet the majority m = ⌊3/2⌋+1 = 2. After the 2nd failure they no longer have a majority: only one node is available for election, but the majority required is still m = ⌊3/2⌋+1 = 2.
  4. Now take 4 nodes. There is no problem forming a cluster or a healthy Ensemble, but having 4 nodes is the same as having 3, because both tolerate only 1 failure. Let's derive it from the Quorum formula: 4 = 2N+1 => N = ⌊(4-1)/2⌋ = ⌊1.5⌋ = 1.
  5. Now take 5 nodes. There is no problem forming a cluster or a healthy Ensemble, as this can tolerate 2 failures according to the formula above: 5 = 2N+1 => N = (5-1)/2 = 2.
  6. Now take 6 nodes. There is no problem forming a cluster or a healthy Ensemble, but having 6 nodes is the same as having 5, because both tolerate only 2 failures. Let's derive it from the Quorum formula: 6 = 2N+1 => N = ⌊(6-1)/2⌋ = ⌊2.5⌋ = 2.
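The six cases above can be reproduced with a short Python sketch. The helper names `majority` and `tolerated_failures` are hypothetical, chosen for this illustration, and the sketch assumes a fixed ensemble size with a simple majority quorum.

```python
def majority(ensemble_size: int) -> int:
    """Number of nodes that must be up and connected to form a quorum."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of node failures the ensemble survives: N = floor((Q - 1) / 2)."""
    return (ensemble_size - 1) // 2


# Walk through ensembles of 1 to 6 nodes, as in the list above.
for n in range(1, 7):
    print(f"{n} node(s): majority = {majority(n)}, "
          f"tolerated failures = {tolerated_failures(n)}")
```

Each even size tolerates exactly as many failures as the odd size just below it, which is the reason odd ensemble sizes are usually preferred.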


Conclusion:

  1. To form a fault-tolerant Quorum we need at least 3 nodes, as a 2-node cluster can't handle even a single failure.
  2. It's good to form an Ensemble with an odd number of nodes, as n nodes (n even) tolerate the same number of failures as n-1 nodes (odd).
  3. It's not good to have too many nodes, as extra nodes add latency. The suggested production cluster size is 5: if one server is down for maintenance, the Ensemble can still handle one more failure.
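As a rough check of the last point, the sketch below tests whether an ensemble still has a majority after some nodes go away. `has_quorum` is a hypothetical helper written for this example, not a ZooKeeper API, and it assumes a simple majority of the configured ensemble size.

```python
def has_quorum(ensemble_size: int, alive: int) -> bool:
    """True if `alive` nodes form a majority of an ensemble of `ensemble_size`."""
    return alive >= ensemble_size // 2 + 1


# 5-node ensemble: one server in maintenance plus one unexpected failure.
print(has_quorum(5, 5 - 2))  # True: 3 of 5 is still a majority

# 3-node ensemble in the same situation.
print(has_quorum(3, 3 - 2))  # False: 1 of 3 is not a majority
```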
Ajax1986 answered Sep 19 '22 09:09
