I have a Cassandra datacenter which I'd like to run a full repair on. The datacenter is used for analytics/batch processing and I'm willing to sacrifice latencies to speed up a full repair (nodetool repair
). Writes to the datacenter is moderate.
What are my options to make the full repair faster? Some ideas:
streamthroughput
?compactionthroughput
temporarily. Not sure I'd want to that, though...Additional information:
cassandra.yaml
for this.Incremental repair consumes less time and resources because it skips SSTables that are already marked as repaired. Incremental repair works equally well with any compaction scheme — Size-Tiered Compaction (STCS), Date-Tiered Compaction(DTCS), Time-Window Compaction(TWCS), or Leveled Compaction (LCS).
Repair synchronizes the data between nodes by comparing their respective datasets for their common token ranges, and streaming the differences for any out of sync sections between the nodes. It compares the data with merkle trees, which are a hierarchy of hashes.
Repairs one or more tables. The repair command repairs one or more nodes in a cluster, and provides options for restricting repair to a set of nodes, see Repairing nodes. Performing an anti-entropy node repair on a regular basis is important, especially in an environment that deletes data frequently.
Full repairs are run sequentially by default. The state and differences of the nodes' datasets are stored in binary trees. Recreating these is the main factor here. According to this datastax blog entry, "Every time a repair is carried out, the tree has to be calculated, each node that is involved in the repair has to construct its merkle tree from all the sstables it stores making the calculation very expensive."
The only way I see to significantly increase the speed of a full repair is to run it in parallel or repair subrange by subrange. Your tag implies that you run Cassandra 2.0.
1) Parallel full repair
nodetool repair -par, or --parallel, means carry out a parallel repair.
According to the nodetool documentation for Cassandra 2.0
Unlike sequential repair (described above), parallel repair constructs the Merkle tables for all nodes at the same time. Therefore, no snapshots are required (or generated). Use a parallel repair to complete the repair quickly or when you have operational downtime that allows the resources to be completely consumed during the repair.
2) Subrange repair nodetool accepts start and end token parameters like so
nodetool repair -st (start token) -et (end token) $keyspace $columnfamily
For simplicity sake, check out this python script that calculates tokens for you and executes the range repairs: https://github.com/BrianGallew/cassandra_range_repair
Let me point out two alternative options:
A) Jeff Jirsa pointed to incremental repairs.
These are available starting with Cassandra 2.1. You will need to perform certain migration steps before you can use nodetool like this:
nodetool repair -inc, or --incremental means do an incremental repair.
B) OpsCenter Repair Service
For the couple of clusters at my company itembase.com, we use the repair service in DataStax OpsCenter which is executing and managing small range repairs as a service.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With