I am trying to find the total physical size occupied by a Cassandra keyspace.
I have a message generator that writes a large number of messages to Cassandra, and I want to find the total physical size of those messages in the Cassandra table.
When I run du -h /mnt/data/keyspace, Linux reports only 12 KB. I am sure the data is much larger than that, so the rest must either still be in memtables or be going through compaction.
How do I find the total space occupied in Cassandra for that keyspace?
I tried nodetool cfstats <keyspace>, but it only reports figures for that particular node, and it also counts bytes that are still in memtables. What I actually want is the total size of the keyspace that has been written to disk, across all nodes in the cluster. Is there a command for this?
Thanks for the help.
SSTables are immutable: once a memtable is flushed to disk, the resulting SSTable remains unchanged until it is deleted (expired) or compacted. Compaction is the process of merging SSTables together. This matters when your workload is update-heavy and several versions of a CQL row may be stored across your SSTables (see "SSTables per read" in nodetool cfhistograms). When you read that row, Cassandra may have to scan multiple SSTables to find the latest version of the data (in C*, last write wins).
Compaction can temporarily take up additional disk space (especially size-tiered compaction, which may need up to 50% of your data size while compacting; this is a theoretical maximum), so it is important to keep free disk space. However, compaction does not take data away from your keyspace directory, so it is not where your missing data is.
You're right to suspect that data not yet flushed to disk is sitting in memtables. It will be written to disk as soon as your commitlog fills up (default 1 GB in 2.0 or 8 GB in 2.1) or as soon as your memtables grow too large (see memtable_total_space_in_mb).
If you want to see your data in SSTables, you can flush it manually:
nodetool flush
Your memtables will then be written to your keyspace directory as SSTables. Otherwise, just be patient and wait until you hit either the commitlog or the memtable threshold.
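Once the data has been flushed, the on-disk size can be measured directly on each node. The following is only a rough sketch, not an official Cassandra tool: keyspace_disk_bytes is a hypothetical helper name, it assumes GNU find (for -printf), and it assumes that snapshot and incremental-backup files live under directories named snapshots and backups, which are skipped because they are hard links to SSTables that would otherwise be counted twice.

```shell
#!/bin/sh
# Hypothetical helper: print the total size in bytes of all regular files
# under a keyspace directory, skipping snapshots/ and backups/ subtrees.
# Assumes GNU find, which provides -printf.
keyspace_disk_bytes() {
    find "$1" -type f \
        ! -path '*/snapshots/*' ! -path '*/backups/*' \
        -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Example on a live node (the path is the one from the question):
#   nodetool flush
#   keyspace_disk_bytes /mnt/data/keyspace
```

This only covers the node it runs on, so it would have to be repeated on every node and the results added up.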
Yes, your memtable data is also stored in the commitlog. If your machine loses power, for example, the data that has been written is still persisted on disk, and the commitlog will be replayed on startup.
I use nodetool status <keyspace>. The Load column value is roughly the same as the value I get from df -h (my Cassandra installations are on different partitions from the system).
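To get a single cluster-wide figure from that output, the Load column can be summed over all node rows. This is a minimal sketch under a few assumptions: sum_status_load is a hypothetical helper name, node rows are assumed to start with a two-letter state such as UN or DN, and Load is assumed to be printed as a number followed by a unit (bytes, KB, MB, GB, TB, or the KiB-style spelling). As far as I know, Load covers everything stored on a node rather than a single keyspace, and it includes replicas, so treat the result as an upper bound for one keyspace.

```shell
#!/bin/sh
# Hypothetical helper: read `nodetool status` output on stdin and print
# the sum of the Load column in bytes.
sum_status_load() {
    awk '
        # Node rows start with a state code (UN, DN, UL, ...); rows whose
        # load is unknown ("?") are skipped by the numeric check on $3.
        $1 ~ /^[UD][NLJM]$/ && $3 ~ /^[0-9.]+$/ {
            unit = toupper(substr($4, 1, 1))
            mult = 1
            if (unit == "K") mult = 1024
            else if (unit == "M") mult = 1024 ^ 2
            else if (unit == "G") mult = 1024 ^ 3
            else if (unit == "T") mult = 1024 ^ 4
            total += $3 * mult
        }
        END { printf "%.0f\n", total }
    '
}

# Example on a live cluster:
#   nodetool status | sum_status_load
```

Because the displayed values are rounded, the total is approximate.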