I'm learning about fractal tree indices such as that found in TokuDB. I am fascinated with the strategy it uses to make writes fast by writing to CPU cache most of the time and only rarely writing out to slower RAM memory. However, a fractal tree index does eventually have to do big writes out to RAM and then giant writes out to disk and then utterly huge writes completely on disk. It is here where I get confused. Can the fractal tree index do this efficiently? More efficiently, say, than a B-tree can update the disk in a worst-case-scenario update? Also, what effect does a giant, on-disk rewrite have upon lookup-time of that data? And, vise versa, what effect does doing several look-ups on that data have on the process of the giant rewrite?
As context for answering this, you should know:
2^largeNumber
) and you write them to a single array (of size 2^(largeNumber+1)
) which is sorted.I suggest you watch my video at http://www.youtube.com/watch?v=88NaRUdoWZM which may give you a better understanding of how Fractal Tree Indexes work. When the indexes do not fit in main memory, a fractal tree index is able to buffer large groups of messages which slowly push down the tree as the buffers overflow. When they eventually make it to a leaf node there is a single IO to retrieve the leaf and apply all the messages. Fractal Tree Indexes do significantly less write IO as they aggregate many operations across a single IO and writes are highly compressed. Read IO is also significantly lessened as it is reading highly compressed data.
I'm not sure if this fully answers your questions, but hopefully it helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With