Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are three nodes the recommended minimum number of nodes for Couchbase?

For Cassandra, there is a minimum requirement of three nodes to enable writes with strong consistency, assuming a replication factor of one (i.e. two copies of the dataset). This requirement does not seem to be the case for Couchbase, at least I haven't found it stated anywhere. Nevertheless, Couchbase still recomends a minimum of three nodes for a production system.

The only motivation I find is (1) a single node failure in a two-node system will give a single point of failure, and (2) a two node system will need to work harder when scaling up to a third node, than a three node system (I assuming this is because of rebalancing).

None of the motivations seem particularly compelling to me:

Reason (1) feels a bit like saying a two-disk RAID-1 is useless, only a three-disk RAID-6 (one data, two checksum) is acceptable. Nevertheless RAID-1 is quite popular (much more so than three-disk RAID-6:es), and usually considered relatively secure. Presumably the loss of a node will result in fast action from the administrator, so the risk should be short-lived.

Reason (2) seems even more transient to me. Two nodes need to work harder when adding a third than three nodes do when adding a fourth. Still, it is only one time this is a problem, and most applications have daily variations in load where the fitting in of the rebalancing could be done.

So I am wondering if there are any other reasons for avoiding two-node Couchbase clusters, assuming the two nodes are well able to carry the load?

like image 586
00prometheus Avatar asked Oct 30 '14 16:10

00prometheus


1 Answers

The main reason is that auto-failover is disabled with with less than three nodes. This is to prevent a "split-brain" situation. Consider two nodes, node A and node B. If one node isn't reachable (due to a network problem) then:

  • Node A can't reach node B, and there are no other nodes to confer with so he fails over (removing B from the cluster and promoting his own replicas)
  • Node B can't reach node A, and similary there is no other nodes so he fails over (removing A from the cluster).

Any clients which can still see both nodes now have essentially two independent clusters who both think they own the whole dataset.

Essentially in this situation you have violated Consistency in CAP.

For this reason Couchbase will not perform auto-failover with less than three nodes, and as this is a recommended feature to use in a production system, you need at least three nodes in your cluster.

See the Failover considerations chapter in the Couchbase Admin guide for more details.

like image 83
DaveR Avatar answered Nov 09 '22 22:11

DaveR