
Cassandra Data Replication problem

I have a two-node Cassandra cluster with a replication factor of 2 and AutoBootStrap=true. Startup goes fine and both nodes see each other. Let us call these nodes A and B.

  1. Add a set of keys and columns (let's call this set K1) to Cassandra through node A.
  2. Connect to node A and read back set K1. Do the same on node B. Both reads succeed.
  3. Kill the Cassandra process on node B.
  4. Add set K2 through A.
  5. Connect to node A and read set K2. This succeeds.
  6. Restart the Cassandra process on node B.
  7. Try to read all keys from B: set K1 is present, set K2 is MISSING (even after 30 minutes).
  8. Add set K3 to A/B.
  9. Read all keys from A: returns sets K1, K2, K3.
  10. Read all keys from B: returns sets K1, K3.

B never syncs set K2 (it has been more than 12 hours). Why does node B not see set K2? Does anyone have any idea?


Added info:

OK, this was the problem: the read_consistency_level was set to 1 (ONE) by default.

So when we ask node B for set K2 and it doesn't have it (although it should, given the replication factor of 2), it immediately returns a 'Not found' error.

However, if we set the read consistency to QUORUM or ALL, B is forced to ask A, which returns the correct value, and B then syncs up that key (saves it locally).
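To make the difference between ONE and QUORUM concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described above, not Cassandra's actual code: the `Replica` class and the `read` function are hypothetical names invented for illustration, and real read repair is more involved (and differs between versions). The only part taken from Cassandra itself is the quorum formula, floor(RF / 2) + 1, which for a replication factor of 2 gives 2, so QUORUM and ALL coincide on this cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical stand-in for a node: maps key -> (timestamp, value)."""
    name: str
    data: dict = field(default_factory=dict)


def quorum(replication_factor: int) -> int:
    # Quorum size: floor(RF / 2) + 1
    return replication_factor // 2 + 1


def read(key, coordinator, replicas, consistency):
    """Toy read: consult `consistency` replicas, starting with the coordinator.

    The newest timestamp wins, and every consulted replica that is missing
    the newest version gets it written back (a simplified read repair).
    """
    ordered = [coordinator] + [r for r in replicas if r is not coordinator]
    consulted = ordered[:consistency]
    versions = [r.data[key] for r in consulted if key in r.data]
    if not versions:
        return None  # 'Not found'
    newest = max(versions, key=lambda version: version[0])
    for replica in consulted:
        if replica.data.get(key) != newest:
            replica.data[key] = newest
    return newest[1]


a, b = Replica("A"), Replica("B")
a.data["k2"] = (1, "v2")  # written through A while B was down

print(read("k2", b, [a, b], 1))          # B only asks itself
print(read("k2", b, [a, b], quorum(2)))  # B must also ask A
print(read("k2", b, [a, b], 1))          # B now has the key locally
```

In this model the first read at ONE comes back empty, the QUORUM read returns the value and repairs B, and the final read at ONE succeeds because B now holds its own copy.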

This leads to another problem: when node B comes up, it does not sync all the data from node A, even after a long time. If node A now goes down, how can we access that unsynced data? (I just tested, and we can't.)

I guess there must be a way to force syncing the data. The INFO output in the terminal shows that a hinted handoff of 15 rows from A to B occurred when B came up, but B does not have those rows locally (we still can't read them from B with consistency level ONE). What's going on here?

Rajan asked Sep 30 '10 03:09


1 Answer

There are three ways Cassandra syncs updates that happened while a node was down:

  1. Hinted handoff. Requires that the failure detector on A recognize that B is down before you write K2. See http://wiki.apache.org/cassandra/HintedHandoff
  2. Read repair. Requires that B be up when K2 is requested for the repair to happen. See http://wiki.apache.org/cassandra/ReadRepair
  3. Anti-entropy repair. Requires manual invocation ("nodetool repair"). See http://wiki.apache.org/cassandra/AntiEntropy
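
The condition on hinted handoff can be illustrated with a small toy model in Python. This is a sketch under stated assumptions, not Cassandra's implementation: `Node`, `write`, `replay_hints` and `full_repair` are hypothetical names, the `believed_down` set stands in for the failure detector's view, and `full_repair` is a crude stand-in for what "nodetool repair" achieves (the real repair process works differently). Actual behaviour also varies between Cassandra versions.

```python
class Node:
    """Hypothetical replica: a key/value store plus hints held for other nodes."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = {}  # target node name -> list of (key, value)


def write(coordinator, replicas, key, value, believed_down):
    """Toy write: a hint is stored only for replicas the coordinator believes are down."""
    for replica in replicas:
        if replica.name in believed_down:
            coordinator.hints.setdefault(replica.name, []).append((key, value))
        elif replica.up:
            replica.data[key] = value
        # Otherwise the replica is down but not yet detected as down:
        # in this toy model it simply misses the write and no hint is kept.


def replay_hints(coordinator, target):
    """Deliver the hints stored for `target` once it is back up."""
    for key, value in coordinator.hints.pop(target.name, []):
        target.data[key] = value


def full_repair(source, target):
    """Crude stand-in for a manual repair: copy over keys the target lacks."""
    for key, value in source.data.items():
        target.data.setdefault(key, value)


a, b = Node("A"), Node("B")
b.up = False

# Written before A's failure detector has noticed that B is down: no hint.
write(a, [a, b], "k2_early", "x", believed_down=set())
# Written after the failure detector has marked B as down: hint stored on A.
write(a, [a, b], "k2_late", "y", believed_down={"B"})

b.up = True
replay_hints(a, b)
print(sorted(b.data))  # ['k2_late']

full_repair(a, b)
print(sorted(b.data))  # ['k2_early', 'k2_late']
```

In this model only the write made after B was marked down reaches B through hints; the earlier write stays missing until a read repair or a manual repair brings it over.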
jbellis answered Oct 16 '22 19:10