
Cassandra Compaction vs Repair vs Cleanup

After posting a question and reading a couple of articles, I still do not understand how these three operations relate to each other:

  • Cassandra compaction tasks
  • nodetool repair
  • nodetool cleanup

Can a repair run while a compaction is running, or a cleanup while a compaction is running? Is cleanup an operation that needs to be executed weekly, like repair? Why does repair need to be executed manually instead of being part of Cassandra's default behavior?

What are the ground rules for healthy cluster maintenance?

Reshef asked Jun 07 '16 16:06



1 Answer

A cleanup is a compaction that simply removes data outside the node's token range(s). A repair includes a "validation compaction" that builds a Merkle tree to compare with the other nodes, so part of nodetool repair is itself a compaction.
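To make the Merkle tree idea concrete, here is a toy sketch in Python. It is not how Cassandra implements validation compaction: the bucket count, the first-byte "partitioner", and all function names are made up for illustration. It only shows the principle that two replicas can compare a single root hash and then narrow down to the ranges that differ.

```python
import hashlib


def leaf_hashes(rows, buckets=4):
    """Hash rows into a fixed number of buckets (stand-ins for token sub-ranges)."""
    digests = [hashlib.sha256() for _ in range(buckets)]
    for key in sorted(rows):
        # Toy partitioner (assumption): bucket by the first byte of the key's hash.
        bucket = hashlib.sha256(key.encode()).digest()[0] % buckets
        digests[bucket].update(f"{key}={rows[key]};".encode())
    return [d.hexdigest() for d in digests]


def merkle_root(leaves):
    """Combine leaf hashes pairwise until a single root hash remains."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def mismatched_buckets(rows_a, rows_b, buckets=4):
    """Return the indexes of buckets whose contents differ between two replicas."""
    a = leaf_hashes(rows_a, buckets)
    b = leaf_hashes(rows_b, buckets)
    if merkle_root(a) == merkle_root(b):
        return []  # roots match, nothing to stream
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Only the buckets returned by mismatched_buckets would need to be exchanged between the replicas, which is the reason for building a tree instead of comparing every row.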

Can a repair run while a compaction is running, or a cleanup while a compaction is running?

There is a shared pool of compaction threads used by normal compactions, repairs, cleanups, scrubs, etc. Its size is the concurrent_compactors setting in cassandra.yaml, which defaults to a value derived from the number of cores and data directories: https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/config/DatabaseDescriptor.java#L572
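As a rough illustration, the default in the 2.1 branch, as I read the linked source and the cassandra.yaml comments, is the smaller of the core count and the number of data directories, clamped between 2 and 8. The Python function below is only a sketch of that arithmetic (the function name is made up, and other Cassandra versions may compute it differently):

```python
def default_concurrent_compactors(cores, data_dirs):
    """Approximate the 2.1 default: min(cores, data dirs), clamped to [2, 8]."""
    return min(8, max(2, min(cores, data_dirs)))


# A 16-core node with a single data directory still gets the minimum of 2.
print(default_concurrent_compactors(cores=16, data_dirs=1))
```

So on a typical single-data-directory node, only a couple of these tasks can hold a compaction thread at once unless you raise concurrent_compactors explicitly.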

Is cleanup an operation that needs to be executed weekly, like repair?

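No, really only after topology changes (for example, after adding a node, when the existing nodes still hold data for token ranges they no longer own).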

Why does repair need to be executed manually instead of being part of Cassandra's default behavior?

It's manual because the requirements can differ a lot depending on your data and gc_grace settings. https://issues.apache.org/jira/browse/CASSANDRA-10070 is bringing it inside Cassandra, though, so in the future it will be automatic.

What are the ground rules for healthy cluster maintenance?

I would (opinion) say:

  • Regular backups (depending on requirements and acceptable data loss, this can be anything from weekly/daily to continuous with incremental backups).
    • This is just as much for "internal" mistakes ("Oops, I deleted a customer") as for outages. Even with strong multi-DC replication you want some minimum backups.
  • Make sure a repair completes for all tables that have deletes at least once within the gc_grace period of those tables.
  • Metric and log storage are pretty important if you want to be able to debug issues.
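
The repair rule above is easy to check mechanically. Here is a minimal Python sketch, assuming you record somewhere when the last repair finished for each table; the function names and the dictionary layout are hypothetical, and only the 864000-second default for gc_grace_seconds comes from Cassandra itself.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


def repair_deadline(last_repair, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Latest time by which the next repair should have completed."""
    return last_repair + timedelta(seconds=gc_grace_seconds)


def overdue_tables(last_repairs, now, gc_grace=None):
    """List tables whose last completed repair is older than their gc_grace.

    last_repairs maps table name -> datetime the last repair finished.
    gc_grace optionally maps table name -> gc_grace_seconds override.
    """
    gc_grace = gc_grace or {}
    overdue = []
    for table, finished in sorted(last_repairs.items()):
        seconds = gc_grace.get(table, DEFAULT_GC_GRACE_SECONDS)
        if now >= repair_deadline(finished, seconds):
            overdue.append(table)
    return overdue
```

In practice you would alert well before the deadline, since the repair itself has to finish inside the window, not just start.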
Chris Lohfink answered Oct 05 '22 19:10
