We are using Cassandra 2.0.17 and we have a table with a workload of 50% selects, 40% updates and 10% inserts (no deletes).
To get high read performance for such a table, we found that LeveledCompactionStrategy is recommended (it is supposed to guarantee that 99% of reads are satisfied from a single SSTable). But every day when I run nodetool cfhistograms
I see more and more SSTables per read. On the first day we had 1, then 1, 2, 3 ...
and this morning I am seeing this:
ubuntu@ip:~$ nodetool cfhistograms prodb groups | head -n 20
prodb/groups histograms
SSTables per Read
1 sstables: 27007
2 sstables: 97694
3 sstables: 95239
4 sstables: 3928
5 sstables: 14
6 sstables: 0
7 sstables: 19
Running describe groups returns this:
CREATE TABLE groups (
...
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=172800 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
Is this normal? If so, we lose the advantage of using LeveledCompaction, which, as described in the documentation, should guarantee that 99% of reads come from a single SSTable.
SSTables are the immutable data files that Cassandra uses for persisting data on disk. As SSTables are flushed to disk from memtables or are streamed from other nodes, Cassandra triggers compactions which combine multiple SSTables into one. Once the new SSTable has been written, the old SSTables can be removed.
Sorted Strings Table (SSTable) is a persistent file format used by ScyllaDB, Apache Cassandra, and other NoSQL databases to take the in-memory data stored in memtables, order it for fast access, and store it on disk in a persistent, ordered, immutable set of files.
Cassandra compaction is the process of reconciling the various copies of data spread across distinct SSTables. Cassandra performs compaction of SSTables as a background activity. Because compaction leaves Cassandra with fewer SSTables and fewer copies of each data row to consult, it improves read performance.
Leveled compaction creates sstables of a fixed, relatively small size (5MB by default in Cassandra's implementation), that are grouped into "levels." Within each level, sstables are guaranteed to be non-overlapping. Each level is ten times as large as the previous.
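For reference, the target SSTable size for LCS is configurable when you set the compaction strategy on a table. A minimal sketch against the table from the question (the sstable_size_in_mb value here is illustrative; later Cassandra releases raised the default from 5 MB to 160 MB, so check your version's default before copying it):
ALTER TABLE prodb.groups
WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': 160
};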
It does depend on the use case - but as a rule of thumb I normally look at LCS for a 90% read to 10% write ratio. From your description you're looking at 50/50 at best.
The additional compaction demands placed by LCS make it pretty I/O hungry. It's highly likely that compaction is backed up and your levels are not balanced. The easiest way to tell is to run nodetool cfstats for the table in question.
You're looking for the line:
SSTables in each level: [2042/4, 10, 119/100, 232, 0, 0, 0, 0, 0]
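For example, to pull out just that line on a node (a sketch - nodetool cfstats accepts a keyspace.table argument in 2.0, so substitute your own names):
nodetool cfstats prodb.groups | grep 'SSTables in each level'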
The numbers in the square brackets show how many SSTables are in each level [L0, L1, L2 ...]. The number after the slash is the ideal count for that level. As a rule of thumb L1 should be 10, L2 100, L3 1000 and so on.
New SSTables go in at L0 and then gradually move up. You can see the above example is in a really bad state: there are still over 2,000 SSTables waiting to be processed in L0 - more than exist in all the other levels combined. Read performance here will be massively worse than if I'd just used STCS.
nodetool cfstats makes it pretty easy to measure whether LCS is keeping up with your use case. Just dump out the above line every 15 minutes throughout the day; any time your levels are unbalanced, read performance will suffer. If it's constantly behind, you probably want to switch to STCS (see the sketch below). If it only spikes for, say, 10 minutes while you load data but the rest of the day is fine, you may decide to live with it. If it never goes out of balance, stick with LCS - it's working for you.
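A minimal way to do that sampling (a sketch, assuming the prodb.groups table from the question and that nodetool is on the PATH; a cron entry would do equally well):
#!/bin/sh
# Log the LCS level distribution for prodb.groups every 15 minutes
while true; do
    echo "$(date -u +%FT%TZ) $(nodetool cfstats prodb.groups | grep 'SSTables in each level')"
    sleep 900
done
And if the levels turn out to be permanently behind, switching the table back to size-tiered compaction is a single statement (again a sketch, using the table from the question):
ALTER TABLE prodb.groups
WITH compaction = {'class': 'SizeTieredCompactionStrategy'};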
As a side note - Cassandra 2.1 allows L0 to carry out STCS-style merging, which helps in the situation where you have a temporary spike. If you're in the ten-minute scenario above, it's almost certainly worth an upgrade.