 

How to implement a cache friendly dynamic binary tree?

According to several sources, including Wikipedia, the two most used ways of implementing a binary tree are:

  1. Nodes and pointers (or references) where each node explicitly holds its children.
  2. Array where the position of child nodes is given implicitly by the index of its parent.

The second one is superior in per-node memory overhead (no child pointers) and in locality of reference. However, it can lead to problems if you want to allow insertions and removals that may leave the tree unbalanced, because the array size required by this design grows exponentially with tree depth: a tree of depth d needs 2^d − 1 slots, no matter how few of them are actually occupied.

Suppose that you want to support such insertions and removals. How can you implement the tree so that tree traversal makes good use of CPU caches?

I was thinking about making an object pool for the nodes and allocating them in an array. This way the nodes would be close together in memory, and hence have good locality of reference.

But if the size of the node is the same as the size of the cache line, does this make any sense?

If you have an L1 line size of 64 bytes and you access the first element of a std::vector<std::uint8_t>(64), you will possibly have the entire contents of the vector in your L1 cache, which means you can then access any element very quickly. But what if each element is as large as a cache line? Since the cache line size is typically the same for the L1, L2, and L3 caches, there seems to be no way locality of reference can help here. Am I wrong? What else can be done?
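A minimal sketch of the object-pool idea described above, under one common refinement: keep the nodes in a single contiguous std::vector and link them by 32-bit indices instead of pointers. The names (Node, Tree, NIL) are illustrative, not from the question; the point is that all node memory lives in one block and the links are half the size of 64-bit pointers, so more nodes fit per cache line.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint32_t NIL = UINT32_MAX;  // sentinel index playing the role of nullptr

struct Node {
    int           key;
    std::uint32_t left;
    std::uint32_t right;
};

struct Tree {
    std::vector<Node> pool;        // the object pool: all nodes live here, contiguously
    std::uint32_t     root = NIL;

    std::uint32_t alloc(int key) {
        pool.push_back(Node{key, NIL, NIL});
        return static_cast<std::uint32_t>(pool.size() - 1);
    }

    void insert(int key) {         // plain unbalanced BST insert, by index
        std::uint32_t n = alloc(key);
        if (root == NIL) { root = n; return; }
        std::uint32_t cur = root;
        for (;;) {
            std::uint32_t& next = key < pool[cur].key ? pool[cur].left
                                                      : pool[cur].right;
            if (next == NIL) { next = n; return; }
            cur = next;
        }
    }

    bool contains(int key) const {
        std::uint32_t cur = root;
        while (cur != NIL) {
            if (key == pool[cur].key) return true;
            cur = key < pool[cur].key ? pool[cur].left : pool[cur].right;
        }
        return false;
    }
};
```

Note that this sketch never reuses slots freed by removals; a real version would keep a free list of indices, much like the block allocator suggested in one of the answers below.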

Martin Drozdik asked Jan 27 '17



2 Answers

Unless you are doing research on how to improve binary trees for cache access patterns, I feel this is an XY problem: what is the problem you are actually trying to solve? Why do you think binary trees are the best algorithm for it? What is the expected working set size?

If you are looking for generic associative storage, there are multiple cache-friendly (other keywords: "cache-efficient", "cache-oblivious") data structures, such as Judy arrays, for which an extensive explanation is available as a PDF.

If your working set size is small enough and you only need an ordered set of items, a simple sorted array may be enough, which can also bring another performance benefit: better branch prediction.

In the end, the way to find out what is best for your use case is to try and measure the different approaches.

Michal Kottman answered Sep 22 '22


Use a block allocator.

You have one or maybe a handful of contiguous memory "pools" from which you dole out blocks of a fixed size, with the free blocks threaded onto a linked list. So allocation is simply

answer = head;
head = head->next;
return answer;

freeing is simply

tofree->next = head;
head = tofree;

If you allow more than one pool, of course, you need to write code to determine which pool a block belongs to, which adds a bit of complexity, but not much. It's essentially a simple memory allocation system. Since all pool members are close together in memory, you get good cache locality on small trees. For large trees you'll have to be a bit cleverer.
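The allocate and free snippets above can be put together into a minimal single-pool sketch. The names (Pool, Block) and the 64-byte block size are illustrative assumptions, not from the answer; the free list is intrusive, reusing each free block's own storage for its next pointer.

```cpp
#include <cstddef>
#include <vector>

struct Pool {
    union Block {                    // a block is either a free-list link or payload
        Block*        next;
        unsigned char payload[64];   // assumed fixed block size: one cache line
    };

    std::vector<Block> storage;      // one contiguous pool
    Block*             head = nullptr;

    explicit Pool(std::size_t n) : storage(n) {
        for (std::size_t i = 0; i + 1 < n; ++i)  // thread every block onto the free list
            storage[i].next = &storage[i + 1];
        if (n > 0) {
            storage[n - 1].next = nullptr;
            head = &storage[0];
        }
    }

    void* alloc() {                  // answer = head; head = head->next; return answer;
        if (head == nullptr) return nullptr;  // pool exhausted
        Block* answer = head;
        head = head->next;
        return answer;
    }

    void free(void* p) {             // tofree->next = head; head = tofree;
        Block* tofree = static_cast<Block*>(p);
        tofree->next = head;
        head = tofree;
    }
};
```

Allocation and freeing are both O(1), and because freed blocks go to the front of the list, recently freed memory (likely still cached) is reused first.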

like image 44
Malcolm McLean Avatar answered Sep 26 '22 11:09

Malcolm McLean