Looking at the data folder, I noticed the following files for a CF. A few questions: what is each file for? What do the numbers 1 ... 6 mean? Which of 1 to 6 contains the final (complete) data of the CF?
<cf name>-g-1-Compacted
<cf name>-g-1-Data.db
<cf name>-g-1-Filter.db
<cf name>-g-1-Index.db
<cf name>-g-1-Statistics.db
...
<cf name>-g-6-Compacted
<cf name>-g-6-Data.db
<cf name>-g-6-Filter.db
<cf name>-g-6-Index.db
<cf name>-g-6-Statistics.db
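These files are the SSTables and the metadata related to them. Here is a brief description of each component, paraphrased from the comments in the Cassandra source (io/sstable/Component.java):
Data.db: the base data of the SSTable; the other components can be regenerated from it.
Index.db: an index of the row keys, with pointers to their positions in the data file.
Filter.db: a serialized Bloom filter over the row keys in the SSTable.
Statistics.db: statistical metadata about the contents of the SSTable.
Compacted: a zero-length marker file, created when the SSTable is ready to be deleted.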
SSTables that have a *-Compacted file are marked for deletion. They are cleaned up asynchronously, either when the JVM performs a garbage collection or when Cassandra detects that the system is low on disk space.
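To see which SSTables in a listing are still live, you can look for a Compacted marker next to each generation (the number in each file name). The following is a minimal sketch, assuming the `<cf>-<version>-<generation>-<component>` naming seen in the listing above; the function name `sstable_generations` and the column family name `users` are hypothetical. It only reads file names and deletes nothing.

```python
from collections import defaultdict


def sstable_generations(filenames):
    """Split a column family's data-directory listing into live and
    compacted (pending deletion) generations.

    Assumes names of the form <cf>-<version>-<generation>-<component>,
    e.g. "users-g-6-Data.db" or "users-g-1-Compacted" (hypothetical CF name).
    """
    components = defaultdict(set)
    for name in filenames:
        # Split from the right so a hyphen inside the CF name does no harm.
        parts = name.rsplit("-", 3)
        if len(parts) != 4 or not parts[2].isdigit():
            continue  # not an SSTable component file; ignore it
        _cf, _version, generation, component = parts
        components[int(generation)].add(component)

    # A generation with a Compacted marker is waiting to be deleted.
    compacted = {g for g, comps in components.items() if "Compacted" in comps}
    # Every other generation that has a Data.db file is still live.
    live = {g for g, comps in components.items() if "Data.db" in comps} - compacted
    return sorted(live), sorted(compacted)


listing = [
    "users-g-1-Compacted", "users-g-1-Data.db", "users-g-1-Index.db",
    "users-g-5-Data.db", "users-g-5-Index.db",
    "users-g-6-Data.db", "users-g-6-Index.db",
]
live, compacted = sstable_generations(listing)
print(live, compacted)  # [5, 6] [1]
```

Generation 1 still has its Data.db on disk, but the marker means Cassandra no longer reads it; only generations 5 and 6 hold live data.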
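The number is the generation of the SSTable: larger numbers are newer. (The "g" in front of it is the SSTable format version.) No single generation is guaranteed to hold all the data: under normal operation, a column family's data is spread across several SSTables plus whatever is still in memory. To get everything into one file, use nodetool to flush the column family (nodetool flush <keyspace> <cf name>) and then run a major compaction (nodetool compact <keyspace> <cf name>). This produces a single SSTable containing all the data for that column family, as long as you don't write anything else to it.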
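Because the generation is an integer, it has to be compared numerically: sorted as strings, "10" would come before "9". The following is a minimal sketch, assuming the same `<cf>-<version>-<generation>-<component>` naming as the listing in the question; the regex, the function name `data_files_newest_first` and the CF name `users` are hypothetical. It only orders Data.db files by age and does not exclude generations that carry a Compacted marker.

```python
import re

# Hypothetical pattern for names like "users-g-6-Data.db": captures the
# column family, format version, generation number and component.
SSTABLE_NAME = re.compile(
    r"^(?P<cf>.+)-(?P<version>[a-z]+)-(?P<generation>\d+)-(?P<component>.+)$"
)


def data_files_newest_first(filenames):
    """Return the Data.db files in a listing, largest (newest) generation first."""
    data = []
    for name in filenames:
        m = SSTABLE_NAME.match(name)
        if m and m.group("component") == "Data.db":
            # Convert to int so generation 10 sorts ahead of generation 9.
            data.append((int(m.group("generation")), name))
    return [name for _gen, name in sorted(data, reverse=True)]


print(data_files_newest_first(["users-g-9-Data.db", "users-g-10-Data.db", "users-g-2-Data.db"]))
# ['users-g-10-Data.db', 'users-g-9-Data.db', 'users-g-2-Data.db']
```

The newest file is not a complete copy of the column family; a read still has to consult every live SSTable (and the memtable) until a major compaction merges them.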