We have a big Cassandra cluster of 18 servers (each server holds about 5 TB of data).
We added new nodes following this documentation: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html
After adding the new servers, we started cleaning up the data (nodetool cleanup).
The documentation advises: "After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove the keys no longer belonging to those nodes. Wait for cleanup to complete on one node before doing the next."
But in our case cleanup takes about 2 to 3 days per server. My question is: can I run cleanup on multiple servers at once, say 2 or 3?
Or could that lead to data loss?
Some more info:
We use Cassandra 2.0.13 with vnodes. We also store files as blobs in Cassandra.
Replication factor = 3
You can drop or truncate tables. This is quite efficient because no tombstones are written. Cassandra just creates a snapshot of the table when you run the command, and the disk space is released when you clear the snapshot.
You should run nodetool cleanup whenever you scale-out (expand) your cluster, and new nodes are added to the same DC. The scale out process causes the token ring to get re-distributed. As a result, some of the nodes will have replicas for tokens that they are no longer responsible for (taking up disk space).
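To illustrate why existing nodes end up holding data they no longer own, here is a minimal Python sketch of a token ring. It is a deliberate simplification and not how Cassandra is implemented: it assumes one token per node (no vnodes), replication factor 1, and small integer tokens. The node names A to D, the token values, and the helper names owner and stale_keys are all made up for the example.

```python
from bisect import bisect_left

def owner(ring, key_token):
    """Return the node owning key_token in a simplified ring.

    ring is a sorted list of (token, node) pairs. The owner is the first
    node whose token is >= the key's token, wrapping around to the start.
    """
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, key_token)
    return ring[i % len(ring)][1]

def stale_keys(old_ring, new_ring, node, key_tokens):
    """Keys that `node` owned before the expansion but no longer owns."""
    return [
        k for k in key_tokens
        if owner(old_ring, k) == node and owner(new_ring, k) != node
    ]

# Hypothetical three-node ring, then a fourth node D joins at token 50.
old_ring = [(0, "A"), (100, "B"), (200, "C")]
new_ring = sorted(old_ring + [(50, "D")])

keys = [10, 40, 60, 90, 150, 250]
stale = stale_keys(old_ring, new_ring, "B", keys)
print(stale)  # [10, 40]
```

In this toy ring, node B still has keys 10 and 40 on disk after D joins, even though D now owns them. That leftover data is what nodetool cleanup removes.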
Scrub automatically discards broken data and removes any tombstoned rows that have exceeded gc_grace period of the table. If partition key values do not match the column data type, the partition is considered corrupt and the process automatically stops.
You can take a live node out of the cluster by running nodetool decommission on it, or remove a dead node by running nodetool removenode from any other machine. Either way, the ranges the old node was responsible for are assigned to other nodes, and the appropriate data is replicated there.
Cleanup doesn't involve any other nodes, so it is safe to run in parallel. However, you may want to run it on one node at a time to reduce the performance impact, since cleanup can use a lot of disk I/O.
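If you want a middle ground between one node at a time and all nodes at once, a small driver script can cap how many cleanups run concurrently. The following is only a sketch under some assumptions: nodetool is on the PATH of the machine running the script, each node's JMX port is reachable from it, and the function names and the max_parallel parameter are invented for this example. The only nodetool invocation used is nodetool -h <host> cleanup.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_cleanup(host):
    """Run `nodetool -h <host> cleanup` and return its exit code.

    Blocks until cleanup on that node has finished.
    """
    return subprocess.run(["nodetool", "-h", host, "cleanup"]).returncode

def cleanup_cluster(hosts, max_parallel=2, runner=run_cleanup):
    """Clean up all hosts, with at most max_parallel running at once.

    `runner` is injectable so the scheduling can be tried out without
    touching a real cluster. Returns a dict of host -> exit code.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = list(pool.map(runner, hosts))
    return dict(zip(hosts, codes))

# Example call (hypothetical host names, not executed here):
#   results = cleanup_cluster(["cass01", "cass02", "cass03"], max_parallel=2)
#   failed = [h for h, code in results.items() if code != 0]
```

Starting with max_parallel=2 and watching disk I/O and read latency before raising it seems a reasonable way to trade total cleanup time against load on the cluster.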