According to several sources, including Wikipedia, the two most commonly used ways of implementing a binary tree are:

1. Individually allocated nodes that link to their children (and possibly their parent) through pointers.
2. A single array, where the children of the node stored at index i are found at fixed offsets (typically indices 2i+1 and 2i+2).
The second one is obviously superior in terms of memory usage and locality of reference. However, it can lead to problems if you want to allow insertions and removals from the tree in such a manner that may leave the tree unbalanced. This is because the memory usage of this design is an exponential function of the tree depth.
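For concreteness, this is roughly what the two layouts look like (a sketch; the type names are only illustrative):

    #include <cstddef>
    #include <vector>

    // Layout 1: individually allocated nodes linked by pointers.
    struct PtrNode {
        int      value;
        PtrNode* left  = nullptr;
        PtrNode* right = nullptr;
    };

    // Layout 2: an implicit tree in one array. The node stored at index i
    // has its children at indices 2*i + 1 and 2*i + 2, so an unbalanced
    // tree of depth d needs on the order of 2^(d+1) slots.
    struct ArrayTree {
        std::vector<int> slots;
        static std::size_t left (std::size_t i) { return 2 * i + 1; }
        static std::size_t right(std::size_t i) { return 2 * i + 2; }
    };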
Suppose that you want to support such insertions and removals. How can you implement the tree so that traversal makes good use of CPU caches?
I was thinking about making an object pool for the nodes and allocating them in an array. That way the nodes would be close together in memory and should therefore have good locality of reference.
But if the size of the node is the same as the size of the cache line, does this make any sense?
If you have an L1 line size of 64 bytes and you access the first element of a std::vector<std::uint8_t>(64), you will probably have the entire contents of the vector in your L1 cache, which means you can access any element very quickly. But what if the size of an element is the same as the cache line size? Since the line size is unlikely to differ much between the L1, L2, and L3 caches, there seems to be no way in which locality of reference can help here. Am I wrong? What else can be done?
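To make the pool idea concrete, here is a minimal sketch of what I have in mind (the names and layout are only illustrative). Using 32-bit indices instead of pointers also keeps each node well under a 64-byte line, so several nodes share a cache line:

    #include <cstdint>
    #include <vector>

    // A pooled node: 32-bit indices instead of 64-bit pointers keep the node
    // small (12 bytes), so about five nodes fit into one 64-byte cache line.
    struct PoolNode {
        std::uint32_t key;
        std::uint32_t left  = 0;   // index into the pool; 0 means "no child"
        std::uint32_t right = 0;   // (slot 0 is reserved as a sentinel)
    };

    struct NodePool {
        std::vector<PoolNode> nodes = std::vector<PoolNode>(1);  // slot 0 is the "null" sentinel

        std::uint32_t allocate(std::uint32_t key) {
            nodes.push_back(PoolNode{key});   // nodes stay contiguous in memory
            return static_cast<std::uint32_t>(nodes.size() - 1);
        }
    };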
All in all, the process is more cache-friendly than searching in a regular std::set (where elements are totally scattered throughout memory according to the allocator's whims) and resulting lookup times are consequently much better.
Some more general algorithms, such as Cooley–Tukey FFT, are optimally cache-oblivious under certain choices of parameters. As these algorithms are only optimal in an asymptotic sense (ignoring constant factors), further machine-specific tuning may be required to obtain nearly optimal performance in an absolute sense.
Dynamic binary trees are constructed in much the same way as linked lists, except with tree nodes in place of list links: each node has two links to "next" items (its children) and, optionally, one link back to the previous item (its parent).
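In code, such a node might look like this (just a sketch, with an optional parent link for the "previous item"):

    struct Node {
        int   value;
        Node* left   = nullptr;   // link to the next item on the left
        Node* right  = nullptr;   // link to the next item on the right
        Node* parent = nullptr;   // optional link back to the previous item
    };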
Cache-friendly data structures fit within a cache line and are aligned in memory so that they make optimal use of cache lines. A common example is a two-dimensional matrix whose row dimension is chosen so that a row fits within a cache-line-sized block, giving optimal performance when scanning along rows.
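As an illustrative sketch (assuming 64-byte cache lines; the constant and type names are made up), a node can be aligned so it never straddles two lines, and a matrix row can be sized so it occupies exactly one line:

    #include <array>
    #include <cstddef>
    #include <cstdint>

    constexpr std::size_t kCacheLine = 64;   // assumed L1 line size

    // A node aligned (and padded) to the assumed cache-line size, so touching
    // any of its fields loads exactly one line.
    struct alignas(kCacheLine) AlignedNode {
        std::uint64_t key;
        AlignedNode*  left;
        AlignedNode*  right;
    };

    // A row of 16 four-byte elements occupies exactly one 64-byte line, so
    // scanning along a row touches a single cache line per row.
    using Row    = std::array<std::int32_t, kCacheLine / sizeof(std::int32_t)>;
    using Matrix = std::array<Row, 16>;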
Unless you are doing research on how to improve binary trees for cache access patterns, this feels like an XY problem: what is the problem you are actually trying to solve? Why do you think a binary tree is the best data structure for it? What is the expected working-set size?
If you are looking for generic associative storage, there are several cache-friendly (other keywords: "cache-efficient", "cache-oblivious") data structures, such as Judy arrays, for which an extensive explanation is available as a PDF.
If your working set is small enough and you only need an ordered set of items, a simple sorted array may be enough, which can bring another performance benefit: better branch prediction.
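For instance, a sorted std::vector searched with std::lower_bound keeps everything contiguous (a sketch):

    #include <algorithm>
    #include <vector>

    // Ordered-array "set": contiguous storage, binary search over it.
    bool contains(const std::vector<int>& sorted, int key) {
        auto it = std::lower_bound(sorted.begin(), sorted.end(), key);
        return it != sorted.end() && *it == key;
    }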
In the end, the only way to find out what is best for your use case is to try the different approaches and measure them.
Use a block allocator.
You have one, or maybe a handful of, contiguous memory "pools" from which you dole out blocks of a fixed size. The free blocks are kept on a linked list threaded through the pool. Allocation is then simply:

    answer = head;
    head = head->next;
    return answer;
and freeing is simply:

    tofree->next = head;
    head = tofree;
If you allow more than one pool, you of course need code to determine which pool a block belongs to, which adds a bit of complexity, but not much. It's essentially a simple memory allocation system. Since all members of a pool are close together in memory, you get good cache locality for small trees. For large trees you'll have to be a bit cleverer.
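A minimal sketch of such an allocator (a single fixed-size pool; the class and member names are my own, and it assumes the block size is at least pointer-sized and suitably aligned):

    #include <cstddef>
    #include <vector>

    // Fixed-size block allocator: one contiguous pool, with free blocks
    // threaded onto an intrusive singly linked free list.
    class BlockPool {
    public:
        BlockPool(std::size_t block_size, std::size_t block_count)
            : storage_(block_size * block_count) {
            // Thread every block onto the free list.
            for (std::size_t i = 0; i < block_count; ++i)
                release(storage_.data() + i * block_size);
        }

        void* acquire() {                 // allocation: pop the head of the free list
            if (!head_) return nullptr;   // pool exhausted; a real allocator might grow here
            FreeBlock* answer = head_;
            head_ = head_->next;
            return answer;
        }

        void release(void* block) {       // freeing: push the block back onto the free list
            auto* tofree = static_cast<FreeBlock*>(block);
            tofree->next = head_;
            head_ = tofree;
        }

    private:
        struct FreeBlock { FreeBlock* next; };
        std::vector<unsigned char> storage_;  // the contiguous pool
        FreeBlock* head_ = nullptr;           // head of the free list
    };

Tree nodes acquired from the pool all live inside one contiguous buffer, which is essentially the object pool described in the question.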