
Cassandra nodetool "compactionstats" meaning of displayed values

I cannot find documentation on nodetool's "compactionstats" command.

While using nodetool compactionstats, what do the numerical values on the completed and total columns mean? My column family has a total data size of about 360 GB but my compaction status displays:

pending tasks: 7
compaction type  keyspace   column family   completed      total           unit   progress
Compaction       Test       Message         161257707087   2475323941809   bytes  6.51%

I can see "completed" increasing slowly (and the progress along with it ;-).

But how is this "total" computed? Why is it 2.5 TB when I have only 360 GB of data?

Christopher Frank asked Jan 14 '14 15:01



1 Answer

You must have compression enabled. "total" is the total number of uncompressed bytes in the set of SSTables that are being compacted together. If you grep the Cassandra log file for lines containing "Compacting", you will find the SSTables that are part of each compaction. If you sum their on-disk sizes and multiply by the inverse of the column family's compression ratio, you will get pretty close to the total. By default this can be a bit difficult to verify on a multi-core system, because the number of simultaneous compactions defaults to the number of cores.
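The arithmetic described above can be sketched in Python. This is only a rough illustration, not part of Cassandra: the function name and the sample sizes are hypothetical, and the compression ratio is assumed to be expressed as compressed size divided by uncompressed size (so 0.25 means the data shrinks to a quarter of its original size).

```python
def estimate_uncompressed_total(on_disk_sizes, compression_ratio):
    """Estimate the "total" column for one compaction.

    on_disk_sizes: compressed (on-disk) sizes in bytes of the SSTables
        taking part in the compaction.
    compression_ratio: compressed size / uncompressed size (assumed
        convention); must be positive.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    # Multiplying by the inverse of the ratio is the same as dividing by it.
    return sum(on_disk_sizes) / compression_ratio


# Hypothetical example: three SSTables, 100 GB on disk in total, ratio 0.25.
sizes = [50 * 10**9, 30 * 10**9, 20 * 10**9]
estimate = estimate_uncompressed_total(sizes, 0.25)
print(f"estimated total: {estimate:.0f} bytes")
```

With several compactions running at once, each one has its own set of SSTables, so the sizes would need to be summed per compaction rather than across the whole log.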

You can also verify this answer by looking at the code:

AbstractCompactionIterable: getCompactionInfo() uses the bytesRead and totalBytes fields of that class. totalBytes is final and is computed in the constructor by summing getLengthInBytes() over each file that is part of the compaction.

The scanners vary, but the length in bytes returned by CompressedRandomAccessReader is the uncompressed size of the file.
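As a rough model of that bookkeeping (written in Python for illustration, not a transcription of the Cassandra source), the class below sums one uncompressed length per input file into a fixed total and reports progress against it. The names CompactionProgress, uncompressed_lengths and record_read are hypothetical; only the two figures at the end come from the question.

```python
class CompactionProgress:
    """Toy model: the total is fixed at construction, bytes_read grows."""

    def __init__(self, uncompressed_lengths):
        # One uncompressed length in bytes per input file, summed once.
        self.total_bytes = sum(uncompressed_lengths)
        self.bytes_read = 0

    def record_read(self, n):
        # Called as the compaction consumes n more uncompressed bytes.
        self.bytes_read += n

    def percent(self):
        if self.total_bytes == 0:
            return 0.0
        return 100.0 * self.bytes_read / self.total_bytes


# Figures from the question: "completed" and "total", both in bytes.
progress = CompactionProgress([2475323941809])
progress.record_read(161257707087)
print(f"{progress.percent():.2f}%")  # 6.51%
```

Because both figures count uncompressed bytes, the percentage is consistent even though neither matches the on-disk size.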

Martin Serrano answered Oct 12 '22 12:10
