I am trying to understand how the Leveled Compaction Strategy in Cassandra works that guarantees 90% of all reads will be satisfied from a single sstable.
From DataStax Doc:
new sstables are added to the first level, L0, and immediately compacted with the sstables in L1. When L1 fills up, extra sstables are promoted to L2. Subsequent sstables generated in L1 will be compacted with the sstables in L2 with which they overlap.
LeveledCompactionStrategy (LCS) in Cassandra implements the internals of LevelDB. You can check the exact implementation details in LevelDB implementation doc.
In order to give you a simple explanation take into account the following points:
In bold are the relevant details that justify the 90% reads from the same file (SSTable). Let's do the math together and everything will become clearer.
Imagine you have keys A,B,C,D,E in L0, and each keys takes 1MB of data.
Next we insert key F. Because level 0 is filled a compaction will create a file with [A,B,C,D,E] in level 1, and F will remain in level 0.
That's ~83% of data in 1 file in L1.
Next we insert G,H,I,J and K. So L0 fills up again, L1 gets a new sstable with [I,G,H,I,J].
By now we have K in L0, [A,B,C,D,E] and [F,G,H,I,J] in L1
And that's ~90% of data in L1.
If we continue inserting keys we will get around the same behavior so, that's why you get 90% of reads served from roughly the same file/SSTable.
A more in depth and detailed (what happens with updates and tombstones) info is given in this paragraph on the link I mentioned (the sizes for compaction election are different because they are LevelDB defaults, not C*s):
When the size of level L exceeds its limit, we compact it in a background thread. The compaction picks a file from level L and all overlapping files from the next level L+1. Note that if a level-L file overlaps only part of a level-(L+1) file, the entire file at level-(L+1) is used as an input to the compaction and will be discarded after the compaction. Aside: because level-0 is special (files in it may overlap each other), we treat compactions from level-0 to level-1 specially: a level-0 compaction may pick more than one level-0 file in case some of these files overlap each other.
A compaction merges the contents of the picked files to produce a sequence of level-(L+1) files. We switch to producing a new level-(L+1) file after the current output file has reached the target file size (2MB). We also switch to a new output file when the key range of the current output file has grown enough to overlap more then ten level-(L+2) files. This last rule ensures that a later compaction of a level-(L+1) file will not pick up too much data from level-(L+2).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With