Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

EACH_QUORUM VS QUORUM

Tags:

cassandra

This is a screenshot from the consistency level table according to Datastax documentation:

enter image description here

What is the difference between EACH_QUORUM and QUORUM? Each and all DC's are the same AFAIK. In the QUORUM row the following is stated:

Some level of failure is possible

Why? If one node is down in each DC? The same applies for EACH_QUORUM right? Why does EACH_QUORUM does not have some level of failure, since it is ALL_QUORUM and not ALL?

Both levels have the same in common (AFAIK):

  • All/each (same right?) DC's needs to be online
  • 51% or more of the nodes need to confirm the read/write.
like image 918
J. Doe Avatar asked Oct 22 '25 13:10

J. Doe


2 Answers

The difference between QUORUM and EACH_QUORUM is as follows.

Assume you have 6 nodes in your cluster - 2 DCs with 3 nodes each and RF=3 for both DCs (all nodes have all data).

The QUORUM and EACH_QUORUM value is the same = 4 (6/2 + 1). However, which nodes can respond varies slightly. EACH_QUORUM has less combinations of what will satisfy the condition.

QUORUM requires 4 nodes to respond but with any combination of nodes. So for example, maybe 3 nodes from the local DC and 1 node from the remote DC respond. That's perfectly fine.

Now, with EACH_QUORUM, each DC must have a quorum respond. What the means is that 2 nodes from each DC must respond in this case, that's it (which 2 nodes in each DC is irrelevant) . 3 nodes from the local DC and 1 node from a remote DC does not qualify as 1 node in the remote dc is not a quorum of that dc.

Let's change the cluster node count to 7 instead of 6. DC1 has 4 nodes, DC2 has 3 nodes. DC1 RF = 4 and DC2 RF = 3 (all nodes have the data again). Here's where the fun begins with the odd number total in the RF.

While I'm not sure about the word "failure", but I can see certain scenarios where this could be problematic.

For QUORUM, 4 nodes need to respond (7/2 + 1 = 4) - any 4 nodes - including the scenario when all nodes from the local/larger DC responds (DC1 in this case). What if the most current data is on DC2? In this scenario, you could end up with undesirable results.

With EACH_QUORUM, 5 nodes would need to respond (Quorum of DC1 = 4/2+1 = 3, Quorum of DC2 = 3/2+1 = 2 ==> total = 5). With this scenario, you're forcing Cassandra to return data from both DCs - and a QUORUM level from each DC which should give you good results.

Again, I'm trying in my head to determine where the additional "failures" could come with QUORUM v.s. EACH_QUORUM and I can't at the top of my head see it. It would seem if anything, EACH_QUORUM with an odd node count, is less flexible in unavailable nodes as a quorum in each DC must respond v.s. any quorum number of nodes from any DC. I can see where QUORUM may give you undesirable results though (explained above).

like image 71
Jim Wartnick Avatar answered Oct 25 '25 01:10

Jim Wartnick


One thing to consider is that QUORUM is related to the Replication Factor(RF) and this will determine the number of nodes that can be offline for each datacenter and allowing to complete the transaction. This means that If one node is down in each DC, this doesn't necessarily will cause inconsistency or failed queries.

For this, use the formula:

NodesNeededForQuorum = ReplicationFactor / 2 + 1

remember to round down the result.

It may be easier to demonstrate the difference with the following scenario: Let's assume that you have 2 DC's with RF of 3 in each datacenter; if you use QUORUM it will require at least 4 nodes from any DC are able to process the query, it could be 2 from each DC, 3 from DC1 and 1 from DC2, or 1 from DC1 and 3 from DC2. With EACH_QUORUM it will also need that 4 nodes are able to answer, but they should be only 2 from each DC.

If you have 3 DC's with RF of 3, QUORUM will be fulfilled with 5 nodes from any DC, while EACH_QUORUM will require 6 nodes (2 from each DC).

Things can be more complex if the RF is different between DC's, and that will depend on the cluster design.

When using EACH_QUORUM please consider the latency while communicating within different DC's, if the network communication is slow, or they are located in distant geographical locations there could be timeouts of the queries.

like image 41
Carlos Monroy Nieblas Avatar answered Oct 24 '25 23:10

Carlos Monroy Nieblas



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!