
How to find the total space occupied by a cassandra keyspace?

Tags: cassandra

I am trying to find the total physical size occupied by a Cassandra keyspace.

I have a message generator which dumps a lot of messages into Cassandra. I want to find out the total physical size of the messages in the Cassandra table.

When I run du -h /mnt/data/keyspace, Linux reports only 12 KB. I am sure that the data size is much greater than that. The rest of the data must either still be in memtables or be undergoing compaction.

How do I find the total space occupied in Cassandra by that keyspace?

I tried the

     nodetool cfstats <keyspace>

But it only gives me figures for that particular node, and some of those bytes are still in memtables. What I actually want is the total size of the keyspace that has been written to disk across all nodes in the cluster. Is there a command to find this?

Thanks for the help.

Knight71 asked Apr 28 '15 09:04


2 Answers

What is Compaction?

SSTables are immutable: once a memtable is flushed to disk, the resulting SSTable remains unchanged until it is deleted (expired) or compacted. Compaction is the process of combining SSTables together. This is important when your workload is update-heavy and you may have several instances of a CQL row stored in your SSTables (see "SSTables per read" in nodetool cfhistograms). When you go to read that row, you may have to scan across multiple SSTables to find the latest version of the data (in C*, last write wins). When we compact, we may take up additional space on disk (especially with size-tiered compaction, which may take up to 50% of your data size while compacting; this is a theoretical maximum), so it is important to keep free disk space. However, compaction will not take data away from your keyspace directory, so it does not explain where your data is.

Then where did my data go?

You're right in your suspicion that data that has not yet been flushed to disk must be sitting in memtables. This data will make it to disk as soon as your commitlog fills up (default 1 GB in 2.0 or 8 GB in 2.1) or as soon as your memtables grow past the limit set by memtable_total_space_in_mb.

If you want to see your data in sstables, you can flush it manually:

nodetool flush

and your memtables will be written to your keyspace (KS) directory in the form of SSTables. Or just be patient and wait until you hit either the commitlog or the memtable threshold.
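Once the data has been flushed, the on-disk size of the keyspace on that node can be totalled with a short script. This is a minimal Python sketch, not an official tool: the function name keyspace_size_bytes and its skip_dirs parameter are my own, and skipping the snapshots and backups subdirectories is an assumption about what you want to count. It sums file sizes, so the figure can differ slightly from du, which reports allocated disk blocks.

```python
import os


def keyspace_size_bytes(keyspace_dir, skip_dirs=("snapshots", "backups")):
    """Sum the sizes of regular files under keyspace_dir on this node.

    skip_dirs is an illustrative default (an assumption): directories with
    these names are not descended into, so snapshot and incremental-backup
    files are left out of the total.
    """
    total = 0
    for root, dirs, files in os.walk(keyspace_dir):
        # Prune in place so os.walk does not descend into skipped directories.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            path = os.path.join(root, name)
            # Count only regular files; ignore symlinks.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

For example, keyspace_size_bytes("/mnt/data/keyspace") would total the directory from the question. The result covers one node only, so it has to be run on every node and the results added up to get a cluster-wide figure.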

But aren't cassandra writes durable?

Yes, your memtable data is also stored in the commitlog. If your machine loses power, for example, the data that has been written is still persisted to disk, and the commitlog data will be replayed on startup!

phact answered Oct 03 '22 05:10


I use nodetool status <keyspace>. The Load column value is roughly the same as the value I get using df -h (my Cassandra installations are on different partitions than the system).
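To get one number for the whole cluster, the Load values can be added up. Here is a rough Python sketch that parses text already captured from nodetool status. The function name total_load_bytes is my own, and the row layout (a two-letter state such as UN, then address, then a load value and a unit) is an assumption based on the output I have seen; it may differ between versions.

```python
import re

# Unit strings as printed in the Load column; treated as binary multiples.
_UNITS = {
    "bytes": 1,
    "KB": 1024, "KiB": 1024,
    "MB": 1024 ** 2, "MiB": 1024 ** 2,
    "GB": 1024 ** 3, "GiB": 1024 ** 3,
    "TB": 1024 ** 4, "TiB": 1024 ** 4,
}

# Assumed row shape: "<state> <address> <load value> <load unit> ..."
_ROW = re.compile(r"^[UD][NLJM]\s+(\S+)\s+([\d.]+)\s+(bytes|[KMGT]i?B)\s")


def total_load_bytes(status_output):
    """Sum the Load column over all node rows in `nodetool status` text."""
    total = 0.0
    for line in status_output.splitlines():
        match = _ROW.match(line.strip())
        if match:  # header lines and rows with an unknown load are skipped
            total += float(match.group(2)) * _UNITS[match.group(3)]
    return total
```

The text could come from something like subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout. Keep in mind that the sum counts every replica, so it scales with the replication factor, and as far as I know Load covers everything stored on the node rather than a single keyspace.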

Popinou answered Oct 03 '22 04:10