I have a program that reads and writes intensively (roughly equal amounts of reads and writes; of the writes, 4/5 are updates and 1/5 are inserts). Is SizeTiered compaction better than Leveled compaction for this workload?
Also, most of the data has a TTL of 7 days and the rest has a TTL of 1 day. In this case, is the Time Window strategy preferred?
Compaction overview: compaction merges several SSTables into one or more new SSTables that contain only the live data from the input SSTables. Merging several sorted files into a sorted result is an efficient process, and this is the main reason SSTables are kept sorted.
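As a rough illustration of "merge sorted files, keep only live data", here is a minimal Python sketch. It assumes each SSTable is a key-sorted list of hypothetical `Cell` records, that the highest write timestamp wins, and that a `None` value marks a tombstone. Real Cassandra keeps tombstones until `gc_grace_seconds` has passed; this toy version drops them immediately to keep the example short.

```python
import heapq
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    key: str
    timestamp: int          # write timestamp; the highest one wins
    value: Optional[str]    # None marks a tombstone (simplification)


def compact(sstables: list[list[Cell]]) -> list[Cell]:
    """Merge key-sorted SSTables, keeping only the newest live version of each key."""
    # heapq.merge streams the already-sorted inputs in key order.
    merged = heapq.merge(*sstables, key=lambda c: c.key)
    result: list[Cell] = []
    current: Optional[Cell] = None
    for cell in merged:
        if current is None or cell.key != current.key:
            # New key: emit the winner for the previous key unless it is a tombstone.
            if current is not None and current.value is not None:
                result.append(current)
            current = cell
        elif cell.timestamp > current.timestamp:
            current = cell  # last write wins
    if current is not None and current.value is not None:
        result.append(current)
    return result
```

Because every input is already sorted, the merge is a single streaming pass; that is the efficiency the paragraph refers to.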
Leveled compaction creates SSTables of a fixed, relatively small size (160MB by default in current Cassandra versions; early releases used 5MB), grouped into "levels." Within each level, SSTables are guaranteed not to overlap, and each level is ten times as large as the previous one.
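To make the "ten times as large" rule concrete, the following Python sketch computes how many levels a given data size needs. It assumes level L<sub>n</sub> holds about `sstable_mb * fanout**n` (so L1 holds roughly ten SSTables); the parameter names are illustrative, not Cassandra option names.

```python
def level_capacities_mb(data_mb: float, sstable_mb: int = 160, fanout: int = 10) -> list[int]:
    """Capacity of L1, L2, ... until the levels can hold data_mb (simplified LCS model)."""
    capacities: list[int] = []
    capacity = sstable_mb * fanout  # assumption: L1 holds about `fanout` SSTables
    total = 0
    while total < data_mb:
        capacities.append(capacity)
        total += capacity
        capacity *= fanout  # each level is `fanout` times larger than the previous
    return capacities
```

For 100 GB (102400 MB) this gives three levels of roughly 1.6 GB, 16 GB and 160 GB. Since SSTables within a level do not overlap, a read needs to check at most one SSTable per level (plus any SSTables still in L0).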
Cassandra compaction is the process of reconciling the various copies of data spread across distinct SSTables. Cassandra runs compaction as a background activity. Because compaction leaves Cassandra with fewer SSTables and fewer copies of each row to maintain, it improves read performance.
The main compaction strategies are the Size-Tiered Compaction Strategy (STCS), the Leveled Compaction Strategy (LCS), the Time-Window Compaction Strategy (TWCS), and the Incremental Compaction Strategy (ICS, which is a ScyllaDB strategy rather than an Apache Cassandra one).
Cassandra supports different compaction strategies, which control which SSTables are chosen for compaction and how the compacted rows are sorted into new SSTables. Each strategy has its own strengths.
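As one example of how a strategy "chooses which SSTables to compact", here is a simplified Python model of size-tiered bucketing. The defaults mirror STCS's `bucket_low`, `bucket_high`, `min_sstable_size`, `min_threshold` and `max_threshold` options, but the logic is a sketch: it ignores STCS's "hotness" ranking when picking among eligible buckets.

```python
def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5, min_sstable_mb=50):
    """Group SSTable sizes into buckets of similar size (simplified STCS model)."""
    buckets: list[list[float]] = []
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = bucket_low * avg <= size <= bucket_high * avg
            both_small = size < min_sstable_mb and avg < min_sstable_mb  # tiny SSTables share a bucket
            if similar or both_small:
                bucket.append(size)
                break
        else:
            buckets.append([size])  # no similar bucket found: start a new tier
    return buckets


def compaction_candidates(sizes_mb, min_threshold=4, max_threshold=32):
    """Buckets with at least min_threshold SSTables are eligible; take at most max_threshold from each."""
    return [b[:max_threshold] for b in size_tiered_buckets(sizes_mb) if len(b) >= min_threshold]
```

With SSTables of 10, 12, 20, 30, 160, 170, 180 and 1000 MB, only the four small SSTables form an eligible bucket; the 160 to 180 MB tier has to wait for a fourth member.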
Although the Leveled Compaction Strategy solves the serious space-amplification problem of the Size-Tiered Compaction Strategy, it introduces a new problem: write amplification.
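The trade-off can be illustrated with a back-of-envelope model. It is not a measurement, and all parameter names below are assumptions. In STCS, each tier merges about `min_threshold` similar-sized SSTables, so a byte is rewritten roughly once per tier; however, compacting the largest tier can temporarily need free space about equal to its inputs, which is why STCS can transiently need close to double the disk space. In LCS, promoting data into the next level rewrites it together with up to about `fanout` overlapping SSTables, so the worst case is roughly `fanout` rewrites per level.

```python
def steps_needed(data_mb: float, base_mb: float, growth: int) -> int:
    """Smallest n with base_mb * growth**n >= data_mb (an integer loop avoids float log rounding)."""
    n, size = 0, base_mb
    while size < data_mb:
        size *= growth
        n += 1
    return n


def stcs_write_amplification(data_mb: float, flush_mb: float = 100, min_threshold: int = 4) -> int:
    # One initial flush, then roughly one rewrite per size tier.
    return 1 + steps_needed(data_mb, flush_mb, min_threshold)


def lcs_write_amplification(data_mb: float, sstable_mb: float = 160, fanout: int = 10) -> int:
    # One initial flush, then up to ~fanout rewrites per level (worst case).
    return 1 + fanout * steps_needed(data_mb, sstable_mb, fanout)
```

For 100 GB with these assumed defaults, the model gives about 6x for STCS versus up to about 31x for LCS. Real numbers depend heavily on key overlap and workload.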
To achieve this, a separate compaction strategy instance is run for each data directory, on top of the split between repaired and unrepaired data. This means that with 4 data directories, 8 compaction strategy instances will be running (one for repaired and one for unrepaired data per directory). This has benefits beyond just preventing deleted data from reappearing.
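The following Python sketch shows how SSTables could be routed to strategy instances keyed by (data directory, repaired state). Because each instance only compacts its own group, repaired and unrepaired data are never merged together. The `SSTable` record and function names are hypothetical and illustrate the idea only; they are not Cassandra's internals.

```python
from collections import defaultdict
from typing import NamedTuple


class SSTable(NamedTuple):
    name: str
    data_dir: str
    repaired: bool


def group_by_strategy_instance(sstables: list[SSTable]) -> dict[tuple[str, bool], list[str]]:
    """One group per (data directory, repaired?) pair; compaction never crosses groups."""
    groups: dict[tuple[str, bool], list[str]] = defaultdict(list)
    for table in sstables:
        groups[(table.data_dir, table.repaired)].append(table.name)
    return dict(groups)


def instance_count(num_data_dirs: int) -> int:
    # A repaired and an unrepaired instance for every data directory.
    return num_data_dirs * 2
```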
Time Window isn't a good fit, because your updates make it less ideal. Size Tiered performs best, at the cost of more disk space usage. Check the compaction strategy selection table here: https://www.scylladb.com/webinar/on-demand-webinar-best-practices-for-data-modeling/
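To see why updates and mixed TTLs work against TWCS, consider this toy Python model. It assumes writes are `(key, written_at_hours, ttl_hours)` tuples, with one SSTable per time window of `window_hours` (comparable to a 1-day `compaction_window_size`). An updated key ends up spread across several windows, so reads must consult several SSTables. A window can only be dropped whole once every cell in it has expired, so a few 7-day cells keep a window full of 1-day data on disk. The model ignores the overlap checks Cassandra performs before dropping an expired SSTable.

```python
from collections import defaultdict


def windows_per_key(writes, window_hours=24):
    """How many time windows (SSTables) hold some version of each key."""
    spread = defaultdict(set)
    for key, written_at_h, _ttl_h in writes:
        spread[key].add(written_at_h // window_hours)
    return {key: len(windows) for key, windows in spread.items()}


def droppable_windows(writes, now_h, window_hours=24):
    """Windows whose cells have all expired and could be dropped as whole SSTables."""
    last_expiry = defaultdict(int)
    for _key, written_at_h, ttl_h in writes:
        window = written_at_h // window_hours
        last_expiry[window] = max(last_expiry[window], written_at_h + ttl_h)
    return sorted(w for w, expiry in last_expiry.items() if expiry <= now_h)
```

With writes `("a", 1, 24)`, `("b", 2, 168)`, `("a", 30, 24)`, `("a", 60, 24)` and `("c", 50, 24)`, key `a` is spread over 3 windows. At hour 100, windows 1 and 2 can be dropped, but window 0 is held back by the single 7-day cell for `b`.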
Usually STCS is the best default.
With updates like that, LeveledCompactionStrategy is the best bet, especially combined with mixed reads like that.