
Which compaction strategy to use for a read- and write-intensive program using ScyllaDB

I have a program that reads and writes intensively (the same amount of reads and writes; of the writes, 4/5 are updates and 1/5 are inserts). Is SizeTiered compaction better than Leveled compaction?

Also, most of the data has a TTL of 7 days and the rest has a TTL of 1 day. In this case, is the Time Window strategy preferred?

SilentCanon asked Aug 09 '19 03:08




2 Answers

TimeWindowCompactionStrategy (TWCS) isn't a good fit, because your workload includes updates. SizeTieredCompactionStrategy (STCS) performs best, at the cost of more disk space usage. Check the table for compaction algorithm selection here: https://www.scylladb.com/webinar/on-demand-webinar-best-practices-for-data-modeling/

Usually STCS is the best default.
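
To try this suggestion, the strategy can be set per table with an `ALTER TABLE` statement. Below is a minimal sketch in Python that only builds the CQL string. The keyspace and table names (`my_keyspace`, `events`) are hypothetical, and the statement would still need to be run through cqlsh or a driver:

```python
def compaction_statement(keyspace: str, table: str, strategy: str) -> str:
    """Build a CQL statement that switches a table's compaction strategy."""
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{strategy}'}};"
    )


# Hypothetical keyspace and table names, for illustration only.
stcs = compaction_statement("my_keyspace", "events", "SizeTieredCompactionStrategy")
print(stcs)
```

Changing the strategy on an existing table causes its SSTables to be recompacted under the new strategy over time, so it may be worth testing on a non-production table first.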

dor laor answered Nov 15 '22 11:11


LeveledCompactionStrategy is your best bet with updates like that, especially with mixed reads.
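
A rough sketch of how this could be applied, written in Python that only builds the CQL string. It assumes a hypothetical table `my_keyspace.events` and, as a further assumption, uses the 7-day TTL from the question as the table-level default (`default_time_to_live` is given in seconds); rows needing the 1-day TTL would still have to set it per write:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def lcs_statement(keyspace: str, table: str, ttl_days: int) -> str:
    """Build a CQL statement that selects LCS and sets a default TTL."""
    ttl_seconds = ttl_days * SECONDS_PER_DAY
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': 'LeveledCompactionStrategy'}} "
        f"AND default_time_to_live = {ttl_seconds};"
    )


# Hypothetical keyspace and table names; 7 days is the TTL from the question.
lcs = lcs_statement("my_keyspace", "events", 7)
print(lcs)
```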

Chris Lohfink answered Nov 15 '22 10:11