Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Avoiding split-brain, votes and quorum [closed]

Tags:

messaging

Suppose you have n processes, n > 2. You want to have agreement amongst them that one is to be active. So they need to vote amonst each other to determine which one is active.

All processes may fail at any time, we want to have one process active if possible, but ...

We must never have two active at the same time, so if they can't be sure it is better to have no-one active. (Ie. we want to avoid split brain)

The only available communication mechanism between them is pub-sub messaging (not point to point).

One or more databases are available, but no one database should be a single point of failure. Ie. it would be very undesirabloe if all the processes were available to work, and the were prevented from doing so by the loss of a single database.

Design? What messages need to be published?

like image 676
djna Avatar asked Jul 09 '09 23:07

djna


1 Answers

Theory:

This is leader election, which is a form of the Consensus Problem, also sometimes called The Two Generals Problem. Under some sets of assumptions (fully async and messages can be lost) it's been proven impossible, and the proof is particularly elegant.

The intuition of this problem is: imagine some algorithm exists that allows consensus to be reached in some fixed number of messages. Since failures are tolerated, we can drop one message from the protocol, and it should still work. We can repeat this process until there are no messages at all, clearly an impossibility.

In practice we overcome this using failure detectors to simulate a synchronous system.

The most widely known algorithm that solves consensus is Paxos, which can tolerate failure of up to half of the participating nodes. Paxos has the reputation of being very difficult to implement as even slight misunderstandings of the details of the protocol destroy it's correctness.

Practical solutions:

While the problem in general is quite difficult, getting working systems up is far easier. There are off the shelf implementations of Paxos or equivalent algorithms available. Apache Zookeeper is the best I'm aware of. For your specific problem, I'm pretty sure it'll be your quickest route. Other Paxos implementations are around, and it also might be possible to build something on network redundancy virtual ip tools like Wackamole. I believe the high end versions of most commercial databases offer quorum features as an (expensive) option.

Also, for many applications it's acceptable to weaken correctness slightly or otherwise adjust the problem to allow much simpler solutions.

For example, if a single point of failure is tolerable because recovery is likely to be quick, then the problem is trivial: just have one special node do the work.

Another approach might be to build the system around idempotent actions, so duplicate processing becomes tolerable.

Lastly you might partition the workload into a pool of non-redundant systems: here failures will delay processing until recovery but only for items at that node, not for the entire workload.

These sorts of compromises are so much simpler that they're often a better choice. One has to weigh the utility of a full solution against the complexity of implementing it and see if there's really value. This is why so many practical systems just use 2 Phase or 3 Phase Commit, even though they block in some scenarios: the decreased availability is tolerable compared to the complexity of a full quorum system.

like image 135
4 revs, 3 users 85% Avatar answered Nov 09 '22 10:11

4 revs, 3 users 85%