Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does LevelDB needs more than two levels?

Tags:

leveldb

okvs

I think only two levels(level-0 and level-1) is ok, why does LevelDB need level-2, level-3, and more?

like image 562
ideawu Avatar asked Jan 13 '13 15:01

ideawu


2 Answers

I think it is mostly to do with easy and quick merging of levels.

In Leveldb, level-(i+1) has approx. 10 times the data compared to level-i. This is more analogous to a multi-level cache structure where in if the database has 1000 records between keys x1 to x2, then 10 of the most frequently accessed ones in that range would be in level-1 and 100 in the same range would be in level-2 and rest in level-3 (this is not exact but just to give an intuitive idea of levels). In this set up, to merge a file in level-i we need to look at at most 10 files in level-(i+1) and it can all be brought into memory, a quick merge done and written back. These results in reading relatively small chunks of data for each compaction/merging operation.

On the other hand if you had just 2 levels, the key range in one level-0 file could potentially match 1000's of files in level-1 and all of them need to be opened up for merging which is going to be pretty slow. Note that an important assumption here is we have fixed size files (say 2MB). With variable length files in level-1, your idea could still work and I think a variant of that is used in systems like HBase and Cassandra.

Now if you are concern is about look up delay with many levels, again this is like a multi-level cache structure, most recently written data would be in higher levels to help with typical locality of reference.

like image 189
jayadev Avatar answered Oct 14 '22 17:10

jayadev


I'll point you in the direction of some articles on LevelDB and it's underlying storage structure.

So in the documentation for LevelDB it discusses merges among levels.

These merges have the effect of gradually migrating new updates from the young level to the largest level using only bulk reads and writes (i.e., minimizing expensive seeks).

LevelDB is similar in structure to Log Structured Merge Trees. The paper discusses the different levels if you're interested in the analysis of it. If you can get through the mathematics it seems to be your best bet to understanding the data structure.

A much easier to read analysis of levelDB talks about the datastore's relation to LSM Trees but in terms of your questions about the levels all it says is:

Finally, having hundreds of on-disk SSTables is also not a great idea, hence periodically we will run a process to merge the on-disk SSTables.

Probably the LevelDB documentation provides the best answer: (maximizing the size of the writes and reads, since LevelDB is on-disk(slow seek) data storage).

Good Luck!

like image 38
Jesse Avatar answered Oct 14 '22 18:10

Jesse