Could you please explain why is Cassandra not linearizable even when quorum based read and writes are used? Linearizability defined as <blockquote> If operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state. </blockquote>

Edit considering Cassandra foreground Read Repair: Writes that fail because only a partial set of replicas are updated could lead to two different readers seeing two different values of data. This is because of the lack of rollbacks in simple quorum-based consistency approaches. This behavior breaks the linearizability guarantees for single-key reads. As described in this discussion, a distributed consensus protocol such as Raft or Paxos is a must-have for such a guarantee. <img src="https://i.stack.imgur.com/Rp0zx.png" alt="enter image description here"> Also, other phenomena such as clock drift and leap second can break the Cassandra session consistency. Earlier Answer (without considering Cassandra foreground read repair): Summary: In Cassandra write may not feel atomic. Some nodes get writes faster than others thus even if we rely on quorum the result depends on the set of nodes that return values and what values they hold at that point. Also, to explain definition of linearizability adding to definition in bold <blockquote> If operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state (but never old state again) . </blockquote> Copying from Martin Klepmann's Data Intensive Applications book Linearizability and quorums Intuitively, it seems as though strict quorum reads and writes should be linearizable in a Dynamo-style model. However, when we have variable network delays, it is possible to have race conditions, as demonstrated in Figure 9-6. <img src="https://i.stack.imgur.com/aaprn.png" alt="enter image description here"> In Figure 9-6, the initial value of x is 0, and a writer client is updating x to 1 by sending the write to all three replicas (n = 3, w = 3). Concurrently, client A reads from a quorum of two nodes (r = 2) and sees the new value 1 on one of the nodes. Also concurrently with the write, client B reads from a different quorum of two nodes, and gets back the old value 0 from both. The quorum condition is met (w + r > n), but this execution is nevertheless not linearizable: B’s request begins after A’s request completes, but B returns the old value while A returns the new value. (It’s once again the Alice and Bob situation from Figure 9-1.) Interestingly, it is possible to make Dynamo-style quorums linearizable at the cost of reduced performance: a reader must perform read repair (see “Read repair and antientropy” on page 178) synchronously, before returning results to the application [23], and a writer must read the latest state of a quorum of nodes before sending its writes [24, 25]. However, Riak does not perform synchronous read repair due to the performance penalty [26]. Cassandra does wait for read repair to complete on quorum reads [27], but it loses linearizability if there are multiple concurrent writes to the same key, due to its use of last-write-wins conflict resolution. Moreover, only linearizable read and write operations can be implemented in this way; a linearizable compare-and-set operation cannot, because it requires a consensus algorithm [28]. In summary, it is safest to assume that a leaderless system with Dynamo-style replication does not provide linearizability. And a bit more explaination about Linearizability vs Serializability: <img src="https://i.stack.imgur.com/vNIA2.png" alt="enter image description here">

EDIT: In hindsight this isn't the best explanation. I recommend reading Anurag's answer below which is much more concise. Since normal Cassandra operations don't observe existing state that it is changing, quorum consistency alone is not considered 'Linearizable'. For example, if you were to to adjust the balance of a bank account, you would need to know the current balance in order to adjust it. Consider a client that executes the following operations: <pre class="prettyprint"><code>A. SELECT balance FROM account WHERE id='x' (assume this returns 5.12) B. UPDATE account SET balance=4.12 WHERE id='x' (subtract 1$ from balance) </code></pre> The problem is, from the perspective of Cassandra, the operation <code>B</code> doesn't effectively 'see' <code>A</code> since it's not considering the existing state of the data or any other operations that may be occurring for that matter. Another client could be updating balance for the same account during the submission of <code>B</code>. Lightweight Transactions in Cassandra 2.0 describes how lightweight transactions provide 'linearizable consistency' by providing constructs that ensure that operations are performed in sequence for a given partition and are not interrupted by others. So instead of my previous example, you can now do: <pre class="prettyprint"><code>A. SELECT balance FROM account WHERE id='x' (assume this returns 5.12) B. UPDATE account SET balance=4.12 WHERE id='x' IF balance=5.12 </code></pre> The use of <code>IF balance=5.12</code> instructs Cassandra to begin a lightweight transaction, which uses a paxos consesus protocol for leadership election and to ensure operations are applied sequentially. If the state of balance does not meet the condition, the update will not be applied (indicated in a successful response with a <code>was_applied</code> boolean column). If C* is not able to achieve this within some timeout (due to contention or some other factors), the operation will fail, will not be applied, and the client will be surfaced a timeout.

Why is Cassandra not linearizable when quorum based read and writes are used

2 Answers

Edit considering Cassandra foreground Read Repair:

Writes that fail because only a partial set of replicas are updated could lead to two different readers seeing two different values of data. This is because of the lack of rollbacks in simple quorum-based consistency approaches. This behavior breaks the linearizability guarantees for single-key reads. As described in this discussion, a distributed consensus protocol such as Raft or Paxos is a must-have for such a guarantee.

enter image description here

Also, other phenomena such as clock drift and leap second can break the Cassandra session consistency.

Earlier Answer (without considering Cassandra foreground read repair):

Summary: In Cassandra write may not feel atomic. Some nodes get writes faster than others thus even if we rely on quorum the result depends on the set of nodes that return values and what values they hold at that point.

Also, to explain definition of linearizability adding to definition in bold

If operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state (but never old state again) .

Copying from Martin Klepmann's Data Intensive Applications book

Linearizability and quorums Intuitively, it seems as though strict quorum reads and writes should be linearizable in a Dynamo-style model. However, when we have variable network delays, it is possible to have race conditions, as demonstrated in Figure 9-6.

enter image description here

In Figure 9-6, the initial value of x is 0, and a writer client is updating x to 1 by sending the write to all three replicas (n = 3, w = 3). Concurrently, client A reads from a quorum of two nodes (r = 2) and sees the new value 1 on one of the nodes. Also concurrently with the write, client B reads from a different quorum of two nodes, and gets back the old value 0 from both.

The quorum condition is met (w + r > n), but this execution is nevertheless not linearizable: B’s request begins after A’s request completes, but B returns the old value while A returns the new value. (It’s once again the Alice and Bob situation from Figure 9-1.)

Interestingly, it is possible to make Dynamo-style quorums linearizable at the cost of reduced performance: a reader must perform read repair (see “Read repair and antientropy” on page 178) synchronously, before returning results to the application [23], and a writer must read the latest state of a quorum of nodes before sending its writes [24, 25]. However, Riak does not perform synchronous read repair due to the performance penalty [26]. Cassandra does wait for read repair to complete on quorum reads [27], but it loses linearizability if there are multiple concurrent writes to the same key, due to its use of last-write-wins conflict resolution.

Moreover, only linearizable read and write operations can be implemented in this way; a linearizable compare-and-set operation cannot, because it requires a consensus algorithm [28].

In summary, it is safest to assume that a leaderless system with Dynamo-style replication does not provide linearizability.

And a bit more explaination about Linearizability vs Serializability:

enter image description here

127

answered Oct 19 '22 11:10

Anurag Sharma

EDIT: In hindsight this isn't the best explanation. I recommend reading Anurag's answer below which is much more concise.

Since normal Cassandra operations don't observe existing state that it is changing, quorum consistency alone is not considered 'Linearizable'.

For example, if you were to to adjust the balance of a bank account, you would need to know the current balance in order to adjust it. Consider a client that executes the following operations:

A. SELECT balance FROM account WHERE id='x' (assume this returns 5.12)
B. UPDATE account SET balance=4.12 WHERE id='x' (subtract 1$ from balance)

The problem is, from the perspective of Cassandra, the operation B doesn't effectively 'see' A since it's not considering the existing state of the data or any other operations that may be occurring for that matter. Another client could be updating balance for the same account during the submission of B.

Lightweight Transactions in Cassandra 2.0 describes how lightweight transactions provide 'linearizable consistency' by providing constructs that ensure that operations are performed in sequence for a given partition and are not interrupted by others. So instead of my previous example, you can now do:

A. SELECT balance FROM account WHERE id='x' (assume this returns 5.12)
B. UPDATE account SET balance=4.12 WHERE id='x' IF balance=5.12

The use of IF balance=5.12 instructs Cassandra to begin a lightweight transaction, which uses a paxos consesus protocol for leadership election and to ensure operations are applied sequentially. If the state of balance does not meet the condition, the update will not be applied (indicated in a successful response with a was_applied boolean column). If C* is not able to achieve this within some timeout (due to contention or some other factors), the operation will fail, will not be applied, and the client will be surfaced a timeout.

answered Oct 19 '22 11:10

Andy Tolbert

Related questions
                            
                                MySQL database data type
                            
                                How to migrate the database from Filemaker to Mysql?
                            
                                The CommandText property has not been properly initialized (MySqlDataReader)
                            
                                Should every MySQL table have an auto-incremented primary key?
                            
                                Interesting tree/hierarchical data structure problem
                            
                                Connect to Sharepoint Database through PHP
                            
                                Draft / Live Content System Database Design
                            
                                What is aspnet_Users and aspnet_Membership? Which one should I use? What is the difference?
                            
                                Counting the number of rows with a value greater than or equal to a value from another column in SQL
                            
                                SQLite define output field separator on a single line command
                            
                                When adding a foreign key constraint, which direction is best practice?
                            
                                How to get values of a column without knowing the column type
                            
                                To get MYSQL query execution time in Query
                            
                                Best way to store Time series data with heavy writing and high aggregation. (~1 billion points)
                            
                                Unable to select Database Foo using Ec2 and RDS
                            
                                Rename file in c#.net when its is uploaded through file upload control and save chang name of file in database
                            
                                Are there any functions in MySQL like dense_rank() and row_number() like Oracle?
                            
                                Which kind of DBs calculate rate per minute statistics?
                            
                                What is the maximum length of private and public RSA keys?
                            
                                PostgreSQL query not using INDEX when RLS (Row Level Security) is enabled

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why is Cassandra not linearizable when quorum based read and writes are used

Tags:

database

cassandra

Anurag Sharma

People also ask

2 Answers

Anurag Sharma

Andy Tolbert

Recent Activity

Donate For Us