 

Best way to shrink a Cassandra cluster

Tags: cassandra

There is a fair amount of documentation on how to scale up a Cassandra cluster, but is there a good resource on how to "unscale" Cassandra and remove nodes from the cluster? Is it as simple as turning off a node, letting the cluster sync up again, and repeating?

The use case is a site that expects high spikes of traffic, climbing from a few thousand daily hits to hundreds of thousands over a few days. The site will be "ramped up" beforehand by starting multiple instances of the web server, Cassandra, etc. After the torrent of requests subsides, the goal is to turn off the instances that are no longer used, rather than pay for servers that are just sitting around.

mguymon asked Nov 09 '12 01:11





2 Answers

If you just shut the nodes down and rebalance the cluster, you risk losing data that exists only on the removed nodes and hasn't been replicated yet.

A cluster can be shrunk safely with nodetool. First, run:

nodetool drain

... on the node being removed, to stop accepting writes and flush memtables, then:

nodetool decommission

... to move the node's data to other nodes. Then shut the node down and, on some other node, run:

nodetool removetoken

... to remove the node from the cluster completely (this command is named removenode in Cassandra 1.2 and later). Detailed documentation can be found here: http://wiki.apache.org/cassandra/NodeTool

From my experience, I recommend removing nodes one at a time, not in batches. It takes more time, but it is much safer in case of network outages or hardware failures.
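The one-at-a-time approach could be scripted roughly as follows. This is only a sketch under some assumptions: it covers just the decommission step (the one that streams data off the node), it assumes nodetool is on the PATH and that "nodetool -h <host> decommission" returns only after streaming has finished, and the function names and host addresses are made up for illustration.

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def decommission_commands(hosts: Sequence[str]) -> List[List[str]]:
    """Build one `nodetool decommission` invocation per host, in order."""
    return [["nodetool", "-h", host, "decommission"] for host in hosts]


def shrink_cluster(
    hosts: Sequence[str],
    run: Optional[Callable[[List[str]], object]] = None,
) -> List[List[str]]:
    """Decommission hosts strictly one after another.

    `run` executes a single command; by default it shells out and raises
    on a non-zero exit code, so a failure stops the loop before the next
    node is touched.
    """
    if run is None:
        def run(cmd: List[str]) -> object:
            return subprocess.run(cmd, check=True)

    completed = []
    for cmd in decommission_commands(hosts):
        run(cmd)  # assumed to block until the node has left the ring
        completed.append(cmd)
    return completed


if __name__ == "__main__":
    # Hypothetical addresses; print the plan instead of executing it.
    for cmd in decommission_commands(["10.0.0.5", "10.0.0.6"]):
        print(" ".join(cmd))
```

Passing a custom run function (for example, one that only logs the command) gives a dry run, which is a cheap way to check the order before touching a live cluster.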

Wildfire answered Oct 03 '22 00:10



When you remove nodes, you may have to rebalance the cluster by moving some nodes to new tokens. In a planned downscale, you need to:

1 - Minimize the number of moves.

2 - If you have to move a node, minimize the amount of data transferred.
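To make the first goal concrete, here is a rough sketch that counts how many token moves a given shrink would need. It assumes one token per node (no vnodes) and RandomPartitioner's token range of 0 to 2**127, with balanced tokens spaced evenly as i * 2**127 / N; the function names are invented for this example.

```python
from typing import List, Sequence, Tuple

RING_SIZE = 2 ** 127  # assumed RandomPartitioner token range


def balanced_tokens(node_count: int) -> List[int]:
    """Evenly spaced tokens for a ring with `node_count` nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def plan_shrink(
    current_tokens: Sequence[int], new_count: int
) -> Tuple[List[int], List[int]]:
    """Compare the current ring with the balanced ring of the new size.

    Returns (kept, missing): `kept` are current tokens that already sit
    on a target position, `missing` are target positions that some
    remaining node would have to be moved to.
    """
    targets = set(balanced_tokens(new_count))
    current = set(current_tokens)
    kept = sorted(current & targets)
    missing = sorted(targets - current)
    return kept, missing


if __name__ == "__main__":
    ring = balanced_tokens(6)
    for size in (3, 4):
        kept, missing = plan_shrink(ring, size)
        print(f"6 -> {size} nodes: {len(missing)} move(s) needed")
```

Under these assumptions, halving a balanced six-node ring needs no moves at all (remove every other node), while going from six to four nodes leaves two target positions unoccupied, so two nodes would have to move. Picking a target size that divides the current size is therefore one way to keep the number of moves down.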

There's an article about cluster balancing that may be helpful: Balancing Your Cassandra Cluster

Also, the beginning of this video covers the add-node and remove-node operations and the best strategies for minimizing the impact on the cluster in each of them.

Hopefully, these two references will give you enough information to plan your downscale.

lstern answered Oct 02 '22 23:10
