Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why ZooKeeper needs majority to run?

I've been wondering why ZooKeeper needs a majority of the machines in the ensemble to work at all. Lets say we have a very simple ensemble of 3 machines - A,B,C.

When A fails, new leader is elected - fine, everything works. When another one dies, lets say B, service is unavailable. Does it make sense? Why machine C cannot handle everything alone, until A and B are up again?

Since one machine is enough to do all the work (for example single machine ensemble works fine)...

Is there any particular reason why ZooKeeper is designed in this way? Is there a way to configure ZooKeeper that, for example ensemble is available always when at least one of N is up?

Edit: Maybe there is a way to apply a custom algorithm of leader selection? Or define a size of quorum?

Thanks in advance.

like image 771
Michał Szkudlarek Avatar asked Apr 17 '13 13:04

Michał Szkudlarek


2 Answers

The reason to get a majority vote is to avoid a problem called "split-brain".

Basically in a network failure you don't want the two parts of the system to continue as usual. you want one to continue and the other to understand that it is not part of the cluster.

There are two main ways to achieve that one is to hold a shared resource, for instance a shared disk where the leader holds a lock, if you can see the lock you are part of the cluster if you don't you're out. If you are holding the lock you're the leader and if you don't your not. The problem with this approach is that you need that shared resource.

The other way to prevent a split-brain is majority count, if you get enough votes you are the leader. This still works with two nodes (for a quorum of 3) where the leader says it is the leader and the other node acting as a "witness" also agrees. This method is preferable as it can work in a shared nothing architecture and indeed that is what Zookeeper uses

As Michael mentioned, a node cannot know if the reason it doesn't see the other nodes in the cluster is because these nodes are down or there's a network problem - the safe bet is to say there's no quorum.

like image 157
Arnon Rotem-Gal-Oz Avatar answered Nov 02 '22 00:11

Arnon Rotem-Gal-Oz


Zookeeper is intended to distribute things reliably. If the network of systems becomes segmented, then you don't want the two halves operating independently and potentially getting out of sync, because when the failure is resolved, it won't know what to do. If you have it refuse to operate when it's got less than a majority, then you can be assured that when a failure is resolved, everything will come right back up without further intervention.

like image 29
Michael Kohne Avatar answered Nov 02 '22 00:11

Michael Kohne