 

Cassandra Replicas Down during nodetool repair?

I am developing an automated script for nodetool repair which would execute every weekend on all 6 of our Cassandra nodes: 3 in DC1 and 3 in DC2. I just want to understand the worst-case scenario: what happens if connectivity between DC1 and DC2 is lost, or a couple of replicas go down, before or during a nodetool repair? It could be a network issue, a network upgrade (which usually happens on weekends), or something else. I understand that nodetool repair computes a Merkle tree for each range of data on that node and compares it with the versions on the other replicas. So if there is no connectivity between replicas, how does nodetool repair behave? Will it really repair the nodes? Do I have to rerun nodetool repair after all nodes are up and connectivity is restored? Will there be any side effects from this event? I googled it but couldn't find much detail. Any insight would be helpful.
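
For context, the plan is roughly a cron entry per node along these lines (paths and times are just placeholders, not the final script):

    # /etc/cron.d/cassandra-repair -- illustrative only
    # Run a full repair on this node every Saturday at 02:00 and keep the output
    0 2 * * 6  cassandra  /usr/bin/nodetool repair > /var/log/cassandra/weekly-repair.log 2>&1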

Thanks.

Sachin Bhansali asked Nov 03 '22 19:11


2 Answers

Let's say you are using vnodes, which by default means that each node has 256 ranges, but the idea is the same.
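
If you want to double-check this on your own nodes, something like the following works (the config path is the usual package default, adjust if yours differs):

    # num_tokens controls how many token ranges (vnodes) each node owns; 256 in many versions
    grep '^num_tokens' /etc/cassandra/cassandra.yaml

    # nodetool ring lists every token owned by each node in the cluster
    nodetool ring | head -20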

If the network problem happens after nodetool repair has already started, you will see in the logs that some ranges were repaired successfully and others were not. The error will say that the range repair failed because node "192.168.1.1 is dead", or something like that.

If the network error happens before nodetool repair starts, all the ranges will fail with the same error.

In both cases you will need to run another nodetool repair after the network problem is solved.
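
A minimal sketch of that check-and-rerun step, assuming the default log location and that the "failed" wording matches your Cassandra version's log messages:

    # Look for repair sessions that failed during the last run
    if grep -q "Repair session .* failed" /var/log/cassandra/system.log; then
        echo "Some ranges were not repaired; rerunning repair now that connectivity is back"
        nodetool repair
    fi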

I don't know the amount of data you have on those 6 nodes, but in my experience, if the cluster can handle it, it is better to keep the once-a-week repair but stagger it so each node is repaired on a different day of the week. For instance, you can repair node 1 on Sunday, node 2 on Monday, and so on. If you have a small amount of data, or the inserts/updates during a day are not too many, you can even run a repair once a day. Once the cluster is already repaired, running nodetool repair more frequently takes much less time to finish; but again, if you have too much data it may not be possible.
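
As a sketch, the staggered schedule can simply be a different cron day on each node (times and log path are only examples):

    # crontab on node 1 -- repair every Sunday at 01:00
    0 1 * * 0  nodetool repair > /var/log/cassandra/repair.log 2>&1

    # crontab on node 2 -- repair every Monday at 01:00
    0 1 * * 1  nodetool repair > /var/log/cassandra/repair.log 2>&1

    # ...and so on for the remaining nodes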

Regarding side effects, you will only notice a difference in the data if you read at consistency level ONE: if a query happens to be served by an "unrepaired" node, the data it returns can differ from what the "repaired" nodes hold. You can mitigate this by increasing the consistency level to TWO, for instance, but then again, if 2 nodes are "unrepaired" and your query is resolved using those 2 nodes, you will see a difference again. There is a trade-off here: the surest way to avoid this difference is to make the consistency level equal to the replication factor, which brings another problem: the moment 1 node is down those queries can no longer be satisfied and you'll start receiving timeouts.
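
For example, you can see the difference by changing the consistency level in cqlsh (the keyspace and table names below are placeholders):

    # At consistency ONE a single "unrepaired" replica can answer the read on its own
    cqlsh -e "CONSISTENCY ONE; SELECT * FROM my_ks.my_table WHERE id = 1;"

    # At QUORUM (2 of 3 replicas with RF=3) at least one more replica must agree
    cqlsh -e "CONSISTENCY QUORUM; SELECT * FROM my_ks.my_table WHERE id = 1;"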

Hope it helps!

Sergio Ayestarán answered Nov 13 '22 03:11


There are multiple repair options available; you can choose one depending on your application's usage. If you are using DSE Cassandra, then I would recommend scheduling the OpsCenter Repair Service, which does incremental repair with a time-to-completion set below gc_grace_seconds.
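
To know how large that window is, you can check gc_grace_seconds per table (Cassandra 3.x+ schema tables; keyspace and table names are placeholders):

    # The default gc_grace_seconds is 864000 (10 days); repairs should complete within it
    cqlsh -e "SELECT gc_grace_seconds FROM system_schema.tables WHERE keyspace_name = 'my_ks' AND table_name = 'my_table';"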

The following are the different options for doing a repair:

  1. Default (none): Repairs all 3 partition ranges stored on the node it is run on: the 1 range it owns as primary and the 2 it holds as a replica. A total of 5 nodes get involved: 2 nodes each fix 1 partition range, 2 nodes each fix 2 partition ranges, and 1 node fixes 3 partition ranges.
  2. -par: Does the above operation in parallel.
  3. -pr: Fixes only the primary partition range of the node it is run on. If you are using a write consistency of EACH_QUORUM, use the -local option as well to reduce cross-DC traffic.

I would suggest going with option 3 if you are already live in production, to avoid any performance impact from the repair.
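
A sketch of option 3, run on one node at a time (the exact spelling of the DC-local flag varies between Cassandra versions, so check nodetool help repair on your build):

    # Repair only the primary partition ranges owned by this node
    nodetool repair -pr

    # Additionally keep the repair traffic within the local datacenter
    nodetool repair -pr -local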

If you want to read about repair in more detail, please have a look here.

Akshay answered Nov 13 '22 03:11