We are using Cassandra 2.0.17 and we have a table with a workload of 50% selects, 40% updates and 10% inserts (no deletes).
To get high read performance for such a table, we found that LeveledCompactionStrategy is recommended (it is supposed to guarantee that 99% of reads are satisfied from a single SSTable). But every day when I run nodetool cfhistograms
I see more and more SSTables per read. On the first day we had 1, then 1, 2, 3 ...
and this morning I am seeing this:
ubuntu@ip:~$ nodetool cfhistograms prodb groups | head -n 20
prodb/groups histograms
SSTables per Read
1 sstables: 27007
2 sstables: 97694
3 sstables: 95239
4 sstables: 3928
5 sstables: 14
6 sstables: 0
7 sstables: 19
Running describe groups returns this:
CREATE TABLE groups (
...
) WITH
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.100000 AND
gc_grace_seconds=172800 AND
index_interval=128 AND
read_repair_chance=0.000000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
default_time_to_live=0 AND
speculative_retry='99.0PERCENTILE' AND
memtable_flush_period_in_ms=0 AND
compaction={'class': 'LeveledCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
Is this normal? If so, we lose the advantage of using LeveledCompaction, which, as described in the documentation, should guarantee that 99% of reads come from a single SSTable.
SSTables are the immutable data files that Cassandra uses for persisting data on disk. As SSTables are flushed to disk from memtables or are streamed from other nodes, Cassandra triggers compactions which combine multiple SSTables into one. Once the new SSTable has been written, the old SSTables can be removed.
Sorted Strings Table (SSTable) is a persistent file format used by ScyllaDB, Apache Cassandra, and other NoSQL databases to take the in-memory data stored in memtables, order it for fast access, and store it on disk in a persistent, ordered, immutable set of files.
Cassandra compaction is the process of reconciling the various copies of data spread across distinct SSTables. Cassandra performs compaction of SSTables as a background activity. Because compaction leaves Cassandra with fewer SSTables and fewer copies of each data row to consult, it improves read performance.
Leveled compaction creates sstables of a fixed, relatively small size (5MB by default in Cassandra's implementation), that are grouped into "levels." Within each level, sstables are guaranteed to be non-overlapping. Each level is ten times as large as the previous.
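For reference, the target SSTable size for LCS is configurable when you set the compaction strategy on a table. A minimal sketch against the table from the question (the sstable_size_in_mb value here is illustrative; later Cassandra releases raised the default from 5 MB to 160 MB, so check your version's default before copying it):
ALTER TABLE prodb.groups
WITH compaction = {
    'class': 'LeveledCompactionStrategy',
    'sstable_size_in_mb': 160
};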
It does depend on the use case - but as a rule of thumb I normally look at LCS for a 90% read to 10% write ratio. From your description you're looking at 50/50 at best.
The additional compaction demands placed by LCS make it pretty I/O hungry. It's highly likely that compaction is backed up and your levels are not balanced. The easiest way to tell is to run nodetool cfstats for the table in question.
You're looking for the line:
SSTables in each level: [2042/4, 10, 119/100, 232, 0, 0, 0, 0, 0]
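For example, to pull out just that line on a node (a sketch - nodetool cfstats accepts a keyspace.table argument in 2.0, so substitute your own names):
nodetool cfstats prodb.groups | grep 'SSTables in each level'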
The numbers in the square brackets show how many SSTables are in each level [L0, L1, L2 ...]. The number after the slash is the ideal count for that level. As a rule of thumb L1 should be 10, L2 100, L3 1000 and so on.
New SSTables go in at L0 and then gradually move up. You can see the above example is in a really bad state: there are still over 2,000 SSTables waiting to be processed in L0 - more than exist in all the other levels combined. Read performance here will be massively worse than if I'd just used STCS.
nodetool cfstats makes it pretty easy to measure whether LCS is keeping up with your use case. Just dump out the above line every 15 minutes throughout the day; any time your levels are unbalanced, read performance will suffer. If it's constantly behind, you probably want to switch to STCS (see the sketch below). If it only spikes for, say, 10 minutes while you load data but the rest of the day is fine, you may decide to live with it. If it never goes out of balance, stick with LCS - it's working for you.
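A minimal way to do that sampling (a sketch, assuming the prodb.groups table from the question and that nodetool is on the PATH; a cron entry would do equally well):
#!/bin/sh
# Log the LCS level distribution for prodb.groups every 15 minutes
while true; do
    echo "$(date -u +%FT%TZ) $(nodetool cfstats prodb.groups | grep 'SSTables in each level')"
    sleep 900
done
And if the levels turn out to be permanently behind, switching the table back to size-tiered compaction is a single statement (again a sketch, using the table from the question):
ALTER TABLE prodb.groups
WITH compaction = {'class': 'SizeTieredCompactionStrategy'};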
As a side note - Cassandra 2.1 allows L0 to carry out STCS-style merging, which helps in the situation where you have a temporary spike. If you're in the ten-minute scenario above, it's almost certainly worth an upgrade.