
Cassandra node almost out of space, but nodetool cleanup is increasing disk use?

Tags:

cassandra

One of our nodes was at 95% disk use and we added another node to the cluster to hopefully rebalance but the disk space didn't drop on the node. I tried doing nodetool cleanup assuming that excess keys were on the node, but the disk space is increasing! Will cleanup actually reduce the size?

Constance Eustace asked Jun 09 '15 22:06



1 Answer

Yes, it will, but you have to be careful: cleanup runs as a compaction, and that compaction writes temporary files and tmp link files, so disk use goes up until the cleaned-up, compacted table has been written.

So I would go into your data directory and figure out what your keyspace sizes are using

du -h -s *  

Then clean up the smaller keyspaces one at a time (you can specify a keyspace in the command with nodetool cleanup <keyspace>) until you have some headroom. To get an idea of how much space is being freed, tail the log and grep for cleaned compactions:

tail <system.log location> | grep 'eaned'

I'd recommend you don't try to clean up a keyspace that is more than half the size of your remaining disk space. Hopefully that is possible.
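The half-of-free-space rule can be checked before running anything. The following is only a rough sketch: cleanup_candidates is a hypothetical helper name, and it assumes input in the "size in KB, then name" format that du -sk * prints. It lists, smallest first, the keyspaces that are at most half the size of the free space you pass in.

```shell
#!/bin/sh
# Hypothetical helper (not part of nodetool).
# stdin: lines of "size_kb<TAB>name", as printed by `du -sk *`
# $1:    free disk space in KB
# Prints, smallest first, the names whose size is at most half the free space.
cleanup_candidates() {
    free_kb=$1
    sort -n | awk -v free="$free_kb" '$1 * 2 <= free { print $2 }'
}

# Example use from inside the data directory (left commented out):
#   free_kb=$(df -Pk . | awk 'NR == 2 { print $4 }')   # "Available" column
#   du -sk * | cleanup_candidates "$free_kb"
```

Each name it prints could then be passed to nodetool cleanup one at a time, re-checking free space between runs since the numbers change as space is released.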

If you don't have enough space, you'll have to shut down the node, attach a bigger disk, copy the data files over to the bigger disk, repoint the yaml to the new data directories, then restart the node. This is useful when your SSDs are expensive and small while spinning disks are cheaper and bigger.
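As a rough outline of that move, here is a dry-run sketch that only prints the commands and executes nothing. Several things in it are assumptions rather than part of the answer: the helper name print_move_plan, the service name cassandra, the use of rsync for the copy, and the nodetool drain step before stopping (added so memtables are flushed first). The yaml setting that lists data directories is data_file_directories in cassandra.yaml.

```shell
#!/bin/sh
# Dry run only: prints the steps, runs none of them.
# Service name, rsync, and the drain step are assumptions; adjust for your install.
print_move_plan() {
    old_dir=$1   # current data directory
    new_dir=$2   # directory on the bigger disk
    echo "nodetool drain"
    echo "sudo service cassandra stop"
    echo "sudo rsync -a $old_dir/ $new_dir/"
    echo "# edit cassandra.yaml: point data_file_directories at $new_dir"
    echo "sudo service cassandra start"
}

# Example (paths are placeholders):
#   print_move_plan /var/lib/cassandra/data /mnt/bigdisk/cassandra/data
```

Review the printed plan against your own layout before running any of it by hand.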

Kip Diskin answered Sep 19 '22 19:09