
Cassandra data file name convention

Tags: cassandra

Looking at the data folder, I noticed the following files for a CF. A few questions: what is each file for? What do the numbers 1 ... 6 mean? Which of 1 to 6 contains the final (complete) data of the CF?

<cf name>-g-1-Compacted
<cf name>-g-1-Data.db
<cf name>-g-1-Filter.db
<cf name>-g-1-Index.db
<cf name>-g-1-Statistics.db

...

<cf name>-g-6-Compacted
<cf name>-g-6-Data.db
<cf name>-g-6-Filter.db
<cf name>-g-6-Index.db
<cf name>-g-6-Statistics.db

asked Feb 02 '23 09:02 by tom


1 Answer

These files are the SSTables and the metadata related to them. Here is a brief description of each file (lifted from the Cassandra source: io/sstable/Component.java):

  • Data.db: the base data for an sstable
  • Index.db: index of the row keys with pointers to their positions in the data file
  • Filter.db: serialized bloom filter for the row keys in the sstable
  • Statistics.db: statistical metadata about the content of the sstable
  • Bitidx.db: a bitmap secondary index: many of these may exist per sstable
  • Compacted: 0-length file that is created when an sstable is ready to be deleted
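
To make the naming pattern concrete, here is a minimal Python sketch that splits a file name of the form shown in the question into its parts. The `SSTableFile` type and `parse_sstable_filename` function are hypothetical helpers, not part of Cassandra. The sketch also assumes that the `g` segment is an on-disk format version marker, which the listing above does not confirm, and it does not handle temporary files or other naming variants.

```python
from typing import NamedTuple


class SSTableFile(NamedTuple):
    """Parts of a name such as '<cf name>-g-1-Data.db' (hypothetical helper type)."""
    cf_name: str
    version: str      # the 'g' segment; assumed here to be a format version
    generation: int   # the number 1 ... 6 in the question
    component: str    # e.g. 'Data.db', 'Index.db', 'Compacted'


def parse_sstable_filename(filename: str) -> SSTableFile:
    # Split from the right so the last three fields are always
    # version, generation and component, whatever the CF name is.
    parts = filename.rsplit("-", 3)
    if len(parts) != 4 or not parts[2].isdigit():
        raise ValueError(f"unexpected SSTable file name: {filename!r}")
    cf_name, version, generation, component = parts
    return SSTableFile(cf_name, version, int(generation), component)


if __name__ == "__main__":
    print(parse_sstable_filename("Users-g-6-Index.db"))
```

Here `Users` is only a placeholder column family name for illustration.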

SSTables that have a *-Compacted file are marked for deletion. Their files will be cleaned up asynchronously when the JVM performs a GC or Cassandra detects that the system is low on disk space.
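
As a rough illustration of that rule, the following Python sketch takes a list of file names from a data folder and returns the generations that still have a Data.db file and no Compacted marker. The function name `live_generations` and the `Users` column family are made up for the example, and the parsing assumes names of the form `<cf name>-g-<generation>-<component>` as in the question.

```python
from collections import defaultdict
from typing import Iterable, List


def live_generations(filenames: Iterable[str]) -> List[int]:
    """Return generations that have a Data.db file but no Compacted marker."""
    components = defaultdict(set)
    for name in filenames:
        # Expected shape: <cf name>-<version>-<generation>-<component>
        parts = name.rsplit("-", 3)
        if len(parts) != 4 or not parts[2].isdigit():
            continue  # skip anything that does not look like an SSTable file
        components[int(parts[2])].add(parts[3])
    return sorted(
        gen for gen, comps in components.items()
        if "Data.db" in comps and "Compacted" not in comps
    )


if __name__ == "__main__":
    files = [
        "Users-g-1-Compacted", "Users-g-1-Data.db", "Users-g-1-Index.db",
        "Users-g-6-Data.db", "Users-g-6-Index.db",
    ]
    print(live_generations(files))
```

In practice the list could come from `os.listdir()` on the column family's data folder.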

The number indicates the generation of an sstable (larger numbers are newer). As to which one has all the data: under normal conditions your data can be spread out across multiple SSTables and in memory. You can use nodetool to flush a column family and then run a major compaction to generate a single SSTable that holds all the data for that column family (assuming you don't write anything else to that column family in the meantime).
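
The flush-then-compact sequence could be scripted along these lines. This is only a sketch: the keyspace name `Keyspace1`, the column family `Users`, and the host are placeholders, and the exact `nodetool` arguments may differ between Cassandra versions, so check `nodetool help` on your installation before running anything.

```python
import subprocess
from typing import List


def flush_and_compact_commands(keyspace: str, column_family: str,
                               host: str = "localhost") -> List[List[str]]:
    """Build the two nodetool invocations: flush first, then major compaction."""
    base = ["nodetool", "-h", host]
    return [
        base + ["flush", keyspace, column_family],    # write memtable contents to disk
        base + ["compact", keyspace, column_family],  # merge the SSTables of the CF
    ]


def run_commands(commands: List[List[str]]) -> None:
    # Runs each command in order and stops at the first failure.
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_commands() on a real node.
    for cmd in flush_and_compact_commands("Keyspace1", "Users"):
        print(" ".join(cmd))
```

After the compaction finishes, the newest generation without a Compacted marker should be the one holding the data, provided nothing else was written in between.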

answered Mar 08 '23 01:03 by psanford