Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Whats the difference between Paxos and W+R>=N in Cassandra?

Dynamo-like databases (e.g. Cassandra) can enforce consistency by means of quorum, i.e. a number of synchronously written replicas (W) and a number of replicas to read (R) should be chosen in such a way that W+R>N where N is a replication factor. On the other hand, PAXOS-based systems like Zookeeper are also used as a consistent fault-tolerant storage.

What is the difference between these two approaches? Does PAXOS provide guarantees that are not provided by W+R>N schema?

like image 1000
user1128016 Avatar asked Aug 28 '12 09:08

user1128016


People also ask

What is Paxos in Cassandra?

Paxos algorithm at heart is a consensus algorithm. Let's say you have multiple coordinator Cassandra nodes and all of these nodes are proposing multiple updates. The Paxos algorithm ensures that among all the proposed updates at any given time, a single value is chosen and executed in order.

Does Cassandra uses Paxos?

Cassandra uses Paxos to elect central leader by default that causes collision problem. Among existent consensus algorithms Raft separates the key elements of consensus, such as leader election, so enforces a stronger degree of coherency to reduce the number of states that must be considered, such as collision.

Does Paxos use quorum?

The original description of Paxos requires majority Quorum in both the prepare and the accept phases. Some recent work by Heidi Howard and others show that the main requirement of Paxos is to have overlap in the quorums of the prepare and the accept phase.

What consensus algorithm does Cassandra use?

Cassandra supports consensus via the paxos protocol since 2.0.


2 Answers

Yes, Paxos provides guarantees that are not provided by the Dynamo-like systems and their read-write quorums. The difference is how failures are handled and what happens during a write. After a successful write, both kind of systems behave similarly. The data will be saved and available for reading afterwards (until overwritten or deleted) and so on.

The difference appears during a write and after failures. Until you get a successful answer from W nodes when writing something to the eventually consistent systems, then the data may have been written to some nodes and not to others and there is no guarantee that the whole system agrees on the current value. If you try to read the data back at this point, some clients may get the new data back and some the old data back. In other words, the system is not immediately consistent. This is because writes aren't atomic across nodes in these systems. There are usually mechanisms to "heal" an inconsistency like this and "eventually" the system will become consistent again (i.e. reads will once again always return the same value, until something new is written). This is the reason why they are often called "eventually consistent". Inconsistencies can (and will) appear, but they will always be dealt with and reconciled eventually.

With Paxos, writes can be made atomic across nodes and inconsistencies between nodes are therefore possible to avoid. The Paxos algorithm makes it possible to guarantee that non-faulty nodes never disagree on the outcome of a write, at any point in time. Either the write succeeded everywhere or nowhere. There will never be any inconsistent reads at any point (if it's correctly implemented and if all the assumptions hold, of course). This comes at a cost, however. Mainly, the system may need to delay some requests and be unavailable when for example too many nodes (or the communication between them) aren't working. This is necessary to assure that no inconsistent replies are given.

To summarize: the main difference is that the Dynamo-like systems can return inconsistent results during writes or after failures for some time (but will eventually recover from it), whereas Paxos based systems can guarantee that there are never any such inconsistencies by sometimes being unavailable and delaying requests instead.

like image 59
Hampus Avatar answered Sep 24 '22 07:09

Hampus


Paxos is non-trivial to implement, and expensive enough that many systems using it use hints as well, or use it only for leader election, or something. However, it does provide guaranteed consistency in the presence of failures - subject of course to the limits of its particular failure model.

The first quorum based systems I saw assumed some sort of leader or transaction infrastructure that would ensure enough consistency that you could trust that the quorum mechanism worked. This infrastructure might well be Paxos-based.

Looking at descriptions such as https://cloudant.com/blog/dynamo-and-couchdb-clusters/, it would appear that Dynamo is not based on an infrastructure that guarantees consistency for its quorum system - so is it being very clever or cutting corners? According to http://muratbuffalo.blogspot.co.uk/2010/11/dynamo-amazons-highly-available-key.html, "The Dynamo system emphasizes availability to the extent of sacrificing consistency. The abstract reads "Dynamo sacrifices consistency under certain failure scenarios". Actually, later it becomes clear that Dynamo sacrifices consistency even in the absence of failures: Dynamo may become inconsistent in the presence of multiple concurrent write requests since the replicas may diverge due to multiple coordinators." (end quote)

So, it would appear that in the case of quorums as implemented in Dynamo, Paxos provides stronger reliability guarantees.

like image 32
mcdowella Avatar answered Sep 25 '22 07:09

mcdowella