How does a Fractal Tree Index perform when moving data on disk?

Question

I'm learning about fractal tree indices such as that found in TokuDB. I am fascinated with the strategy it uses to make writes fast by writing to CPU cache most of the time and only rarely writing out to slower RAM memory. However, a fractal tree index does eventually have to do big writes out to RAM and then giant writes out to disk and then utterly huge writes completely on disk. It is here where I get confused. Can the fractal tree index do this efficiently? More efficiently, say, than a B-tree can update the disk in a worst-case-scenario update? Also, what effect does a giant, on-disk rewrite have upon lookup-time of that data? And, vise versa, what effect does doing several look-ups on that data have on the process of the giant rewrite?

As context for answering this, you should know:

Everything I learned about fractal tree indices I learned in this slide presentation
I don't have a good mental model for how a spinning medium hard drive works.
When I say "giant rewrite", basically what happens is that you have two sorted arrays of the same length (of size 2^largeNumber) and you write them to a single array (of size 2^(largeNumber+1)) which is sorted.

As context for answering this, you should know:

Everything I learned about fractal tree indices I learned in this slide presentation
I don't have a good mental model for how a spinning medium hard drive works.
When I say "giant rewrite", basically what happens is that you have two sorted arrays of the same length (of size 2^largeNumber) and you write them to a single array (of size 2^(largeNumber+1)) which is sorted.

tmcallaghan · Accepted Answer

I suggest you watch my video at http://www.youtube.com/watch?v=88NaRUdoWZM which may give you a better understanding of how Fractal Tree Indexes work. When the indexes do not fit in main memory, a fractal tree index is able to buffer large groups of messages which slowly push down the tree as the buffers overflow. When they eventually make it to a leaf node there is a single IO to retrieve the leaf and apply all the messages. Fractal Tree Indexes do significantly less write IO as they aggregate many operations across a single IO and writes are highly compressed. Read IO is also significantly lessened as it is reading highly compressed data.

I'm not sure if this fully answers your questions, but hopefully it helps.

How does a Fractal Tree Index perform when moving data on disk?

Tags:

indexing

memory

data-structures

JnBrymn

1 Answers

tmcallaghan

Recent Activity

Donate For Us

How does a Fractal Tree Index perform when moving data on disk?

Tags:

indexing

memory

data-structures

JnBrymn

1 Answers

tmcallaghan

Related questions

Recent Activity

Donate For Us