From the ZooKeeper FAQ:
Reliability: A single ZooKeeper server (standalone) is essentially a coordinator with no reliability (a single serving node failure brings down the ZK service). A 3 server ensemble (you need to jump to 3 and not 2 because ZK works based on simple majority voting) allows for a single server to fail and the service will still be available. So if you want reliability go with at least 3. We typically recommend having 5 servers in "online" production serving environments. This allows you to take 1 server out of service (say planned maintenance) and still be able to sustain an unexpected outage of one of the remaining servers w/o interruption of the service.
With a 3-server ensemble, if one server is taken out of rotation and one server has an unexpected outage, then there is still one remaining server that should ensure no interruption of service. Then why the need for 5 servers? Or is it more than just interruption of service that is being considered?
Update:
Thanks to @sbridges for pointing out that it has to do with maintaining a quorum. ZK defines a quorum as a strict majority, floor(N/2) + 1 (the same as ceil(N/2) when N is odd), where N is the original number of servers in the ensemble (and not just the currently available set).
Now, a Google search for "ZK quorum" finds this in the HBase book chapter on ZK:
In ZooKeeper, an even number of peers is supported, but it is normally not used because an even sized ensemble requires, proportionally, more peers to form a quorum than an odd sized ensemble requires. For example, an ensemble with 4 peers requires 3 to form a quorum, while an ensemble with 5 also requires 3 to form a quorum. Thus, an ensemble of 5 allows 2 peers to fail and still maintain quorum, and thus is more fault tolerant than the ensemble of 4, which allows only 1 down peer.
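The arithmetic in that quote can be checked with a short sketch. It assumes a quorum is a strict majority of the configured ensemble, floor(N/2) + 1; the function names `quorum_size` and `tolerated_failures` are made up for illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the configured ensemble (assumed rule)."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a quorum can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print quorum size and fault tolerance for small ensembles.
    for n in range(1, 8):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this rule, going from 3 to 4 servers raises the quorum from 2 to 3 without raising the number of tolerated failures, which is why odd sizes are the usual choice.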
And this paraphrasing of Wikipedia in Edward J. Yoon's blog:
Ordinarily, this is a majority of the people expected to be there, although many bodies may have a lower or higher quorum.
A three-node ZooKeeper ensemble will support one failure without loss of service, which is probably fine for most users and arguably the most common deployment topology. However, to be safe, use five nodes in your ensemble.
The odd number of servers allows ZooKeeper to perform majority elections for leadership. At any given time, there can be up to n failed servers in an ensemble of 2n + 1 servers and the ZooKeeper cluster will keep quorum. If quorum is lost at any time, the ZooKeeper cluster will go down.
Generally, a typical Kafka cluster will be well served by three ZooKeeper nodes. If a Kafka deployment is particularly large, then consider utilizing five ZooKeeper nodes.
A quorum is basically the minimum number of server nodes that must be up and running and available for client requests. Any update done to the ZooKeeper tree by the clients must be persistently stored in this quorum of nodes for a transaction to be completed successfully.
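As a rough illustration of "persistently stored in a quorum", here is a toy model. It is not ZooKeeper's actual replication protocol: the `Server` class and `propose` function are hypothetical, and the sketch simply assumes an update commits only when a strict majority of the configured servers can store it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    """Hypothetical stand-in for one ensemble member."""
    up: bool = True
    log: List[str] = field(default_factory=list)


def propose(servers: List[Server], update: str) -> bool:
    """Store the update on reachable servers; succeed only with a majority.

    The quorum is computed from the configured ensemble size, not from
    the number of servers that happen to be up.
    """
    quorum = len(servers) // 2 + 1
    reachable = [s for s in servers if s.up]
    if len(reachable) < quorum:
        return False  # too few servers to persist the update; reject it
    for s in reachable:
        s.log.append(update)
    return True


if __name__ == "__main__":
    ensemble = [Server(), Server(), Server(up=False)]
    print(propose(ensemble, "create /app"))   # 2 of 3 servers are up
    ensemble[1].up = False
    print(propose(ensemble, "set /app"))      # only 1 of 3 servers is up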
ZooKeeper requires that you have a quorum of servers up, where a quorum is a strict majority of the ensemble: floor(N/2) + 1, which equals ceil(N/2) for odd N. For a 3-server ensemble, that means 2 servers must be up at any time; for a 5-server ensemble, 3 servers need to be up at any time.
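The scenario from the question (one server in planned maintenance plus one unexpected outage) can be worked through with a small sketch. It assumes the strict-majority rule, floor(N/2) + 1, which matches ceil(N/2) for odd N; `has_quorum` and `survives` are illustrative names, not ZooKeeper functions.

```python
def has_quorum(ensemble_size: int, servers_up: int) -> bool:
    """True if the servers that are up form a strict majority of the ensemble."""
    return servers_up >= ensemble_size // 2 + 1


def survives(ensemble_size: int, in_maintenance: int, unexpected_failures: int) -> bool:
    """True if the service stays available with the given servers out."""
    servers_up = ensemble_size - in_maintenance - unexpected_failures
    return has_quorum(ensemble_size, servers_up)


if __name__ == "__main__":
    for size in (3, 5):
        print(f"{size} servers, 1 in maintenance + 1 outage: "
              f"available={survives(size, 1, 1)}")
```

With 3 servers, one in maintenance and one failed leaves a single server, which is not a majority of 3, so the service stops. With 5 servers the same situation leaves 3 up, which is still a quorum.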
Basically, ZooKeeper will work just fine as long as the active servers are in the MAJORITY compared to the failed ones. An even ensemble size (2, 4, 6, etc.) is not recommended because the ensemble can end up with failed = active, in which case neither half holds a majority.
Both 3 and 4 servers tolerate only 1 failure, so why would we want to use 4 ZooKeepers instead of 3?