Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Are there real-world reasons to employ a Binary Search Tree over a Binary Search of semi-contiguous list?

I'm watching university lectures on algorithms and it seems so many of them rely almost entirely binary search trees of some particular sort for querying/database/search tasks.

I don't understand this obsession with Binary Search Trees. It seems like in the vast majority of scenarios, a BSP could be replaced with a sorted array in the case of a static data, or a sorted bucketed list if insertions occur dynamically, and then a Binary Search could be employed over them.

With this approach, you get the same algorithmic complexity (for querying at least) as a BST, way better cache coherency, way less memory fragmentation (and less gc allocs depending on what language you're in), and are likely much simpler to write.

The fundamental issue is that BSP are completely memory naïve -- their focus is entirely on O(n) complexity and they ignore the very real performance considerations of memory fragmentation and cache coherency... Am I missing something?

like image 877
Charly Avatar asked Jun 19 '21 01:06

Charly


People also ask

Why do we go for binary search tree instead of binary tree?

When to Use Binary Search Trees. Implementing a binary search tree is useful in any situation where the elements can be compared in a less than / greater than manner. For our example, we'll use alphabetical order as our criteria for whether an element is greater than or less than another element (eg.

Which of the following is an advantage of representing a binary search tree with a contiguous array rather than notes link via pointers?

Explanation: The answer is b because the elements in an array are stored in a contiguous block of memory, so it is easier to access the elements of an array through indexing.

Is binary search better than searching in BST?

Usually, the the binary search on arrays is a little faster because it traverses less memory overall (no pointers need to be stored) and it is more cache coherent in the final stages of the algorithm.

What is a benefit of using a binary search tree data structure?

Advantages of Binary Search Tree: BST is fast in insertion and deletion when balanced. BST is efficient. We can also do range queries – find keys between N and M (N <= M). BST code is simple as compared to other data structures.


Video Answer


1 Answers

Binary search trees (BST) are not totally equivalent to the proposed data structure. Their asymptotic complexity is better when it comes to both insert and remove sorted values dynamically (assuming they are balanced correctly). For example, when you when to build an index of the top-k values dynamically:

while end_of_stream(stream):
    value <- stream.pop_value()
    tree.insert(value)
    tree.remove_max()

Sorted arrays are not efficient in this case because of the linear-time insertion. The complexity of bucketed lists is not better than plain list asymptotically and also suffer from a linear-time search. One can note that a heap can be used in this case, and in fact it is probably better to use a heap here, although they are not always interchangeable.

That being said, your are right : BST are slow, cause a lot of cache miss and fragmentation, etc. Thus, they are often replaced by more compact variants like B-trees. B-tree uses a sorted array index to reduce the amount of node jumps and make the data-structure much more compact. They can be mixed with some 4-byte pointer optimizations to make them even more compact. B-trees are to BST what bucketed linked-lists are to plain linked-lists. B-trees are very good for building dynamic database index of huge datasets stored on a slow storage device (because of the size): they enable applications to fetch values associated to a key using very few storage-device lookups (which as very slow on HDD for example). Another example of real-world use-case is interval-trees.

Note that memory fragmentation can be reduced using compaction methods. For BSTs/B-trees, one can reorder the root nodes like in a heap. However, compaction is not always easy to apply, especially on native languages with pointers like in C/C++ although some very clever methods exists to do so.

Keep in mind that B-trees shine only on big datasets (especially the ones that do not fit in cache). On relatively small ones, using just plain arrays or even sorted array is often a very good solution.

like image 107
Jérôme Richard Avatar answered Sep 28 '22 21:09

Jérôme Richard