Are there real-world reasons to employ a Binary Search Tree over a Binary Search of a semi-contiguous list?

Tags: performance, memory-management, binary-search, binary-search-tree

I'm watching university lectures on algorithms, and it seems so many of them rely almost entirely on binary search trees of some particular sort for querying/database/search tasks.

I don't understand this obsession with Binary Search Trees. It seems like in the vast majority of scenarios, a BST could be replaced with a sorted array in the case of static data, or a sorted bucketed list if insertions occur dynamically, and then a Binary Search could be employed over them.

With this approach, you get the same algorithmic complexity (for querying at least) as a BST, way better cache locality, way less memory fragmentation (and fewer GC allocations, depending on what language you're in), and it is likely much simpler to write.
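The sorted bucketed list proposed above could look roughly like the following Python sketch. It is a hypothetical illustration, not code from the question: the class name `BucketedSortedList` and the bucket size of 64 are assumptions. It keeps a parallel array of bucket maxima so the right bucket is found by binary search, then binary-searches inside that bucket, and splits a bucket in two once it grows too large. (Python lists hold references, so the cache benefits discussed here would only really show up in a language with contiguous value arrays, such as C, C++ or Rust.)

```python
from bisect import bisect_left, insort


class BucketedSortedList:
    """Sorted values kept in a list of small sorted buckets (hypothetical sketch)."""

    BUCKET_SIZE = 64  # assumed bucket size; a bucket is split above 2 * BUCKET_SIZE

    def __init__(self):
        self._buckets = []  # each bucket is a sorted list; buckets are in order
        self._maxes = []    # _maxes[i] == _buckets[i][-1], used for binary search

    def insert(self, value):
        if not self._buckets:
            self._buckets.append([value])
            self._maxes.append(value)
            return
        # Binary search over the bucket maxima: first bucket whose max >= value.
        i = bisect_left(self._maxes, value)
        if i == len(self._buckets):
            i -= 1  # larger than everything stored: goes at the end of the last bucket
        bucket = self._buckets[i]
        insort(bucket, value)  # binary search + shift inside one small bucket
        self._maxes[i] = bucket[-1]
        if len(bucket) > 2 * self.BUCKET_SIZE:
            # Split an oversized bucket in two to keep in-bucket shifts cheap.
            half = len(bucket) // 2
            left, right = bucket[:half], bucket[half:]
            self._buckets[i:i + 1] = [left, right]
            self._maxes[i:i + 1] = [left[-1], right[-1]]

    def __contains__(self, value):
        i = bisect_left(self._maxes, value)
        if i == len(self._maxes):
            return False  # larger than every stored value
        bucket = self._buckets[i]
        j = bisect_left(bucket, value)
        return j < len(bucket) and bucket[j] == value

    def __iter__(self):
        for bucket in self._buckets:
            yield from bucket

    def __len__(self):
        return sum(len(bucket) for bucket in self._buckets)
```

Note that splitting a bucket still inserts into the array of buckets, which is a linear-time shift, just with a much smaller constant than shifting every stored element.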

The fundamental issue is that BSTs are completely memory-naïve: their focus is entirely on asymptotic (big-O) complexity, and they ignore the very real performance costs of memory fragmentation and poor cache locality... Am I missing something?


asked Jun 19 '21 01:06

Charly


Binary search trees (BSTs) are not totally equivalent to the proposed data structure. Their asymptotic complexity is better when you need to both insert and remove values dynamically while keeping them sorted (assuming the tree is correctly balanced). For example, consider building an index of the top-k values of a stream dynamically:

while not end_of_stream(stream):
    value <- stream.pop_value()
    tree.insert(value)
    if tree.size() > k:
        tree.remove_max()   # evict the largest, keeping the k smallest values

Sorted arrays are not efficient in this case because each insertion takes linear time. Bucketed (linked) lists are not asymptotically better than plain lists and also suffer from linear-time search. Note that a heap could also be used here, and it is probably the better choice in this case, although heaps and BSTs are not always interchangeable.
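As a point of comparison, here is a Python sketch of the heap alternative for the same stream, using the standard `heapq` module. The function name `k_smallest` is hypothetical, and the sketch assumes numeric values: it negates them to turn `heapq`'s min-heap into a max-heap, so the largest kept value can be evicted in O(log k), like `remove_max` in the pseudocode.

```python
import heapq
from typing import Iterable, List


def k_smallest(stream: Iterable[int], k: int) -> List[int]:
    """Keep the k smallest values of a stream using a bounded max-heap."""
    heap: List[int] = []  # negated values, so -heap[0] is the largest kept value
    for value in stream:
        if len(heap) < k:
            heapq.heappush(heap, -value)
        elif k > 0 and value < -heap[0]:
            # Pop the current maximum and push the new value in one O(log k) step.
            heapq.heapreplace(heap, -value)
    return sorted(-v for v in heap)
```

For example, `k_smallest([5, 1, 9, 3, 7, 2], 3)` returns `[1, 2, 3]`. For a one-off query, the standard library's `heapq.nsmallest(k, iterable)` does the same job. Unlike a balanced BST, though, the heap cannot cheaply iterate its contents in order or look up an arbitrary value, which is one reason the two are not always interchangeable.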

That being said, you are right: BSTs are slow and cause a lot of cache misses, fragmentation, etc. This is why they are often replaced by more compact variants like B-trees. A B-tree stores a small sorted array of keys in each node, which reduces the number of node jumps and makes the data structure much more compact. B-trees can be combined with 4-byte (32-bit) pointer optimizations to make them even more compact. B-trees are to BSTs what bucketed linked lists are to plain linked lists. B-trees are very good for building dynamic database indexes over huge datasets that, because of their size, are stored on a slow storage device: they enable applications to fetch the values associated with a key using very few storage-device lookups (which are very slow on HDDs, for example). Interval trees, which are augmented BSTs, are another real-world use case.
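To make the B-tree idea concrete, here is a minimal in-memory Python sketch of the classic textbook B-tree insertion algorithm with preemptive node splitting. It is an illustration under stated assumptions, not a production index: the minimum degree `t = 16` is an arbitrary choice, nodes hold only integer keys (no associated values), deletion is not implemented, and the 4-byte pointer optimizations mentioned above are not shown. Each node keeps up to `2t - 1` keys in a sorted list that is binary-searched with `bisect`, so a lookup jumps between far fewer nodes than in a BST.

```python
from bisect import bisect_left, bisect_right, insort
from typing import Iterator, List


class BTreeNode:
    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.keys: List[int] = []              # sorted keys, stored contiguously
        self.children: List["BTreeNode"] = []  # len(keys) + 1 children if internal


class BTree:
    """Minimal B-tree sketch: insert and search only, minimum degree t >= 2."""

    def __init__(self, t: int = 16):  # t = 16 is an arbitrary assumption
        self.t = t
        self.root = BTreeNode(leaf=True)

    def search(self, key: int) -> bool:
        node = self.root
        while True:
            i = bisect_left(node.keys, key)  # binary search inside the node
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]  # one node jump per level

    def insert(self, key: int) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: add a new root above it, then split the old root.
            new_root = BTreeNode(leaf=False)
            new_root.children.append(self.root)
            self.root = new_root
            self._split_child(new_root, 0)
        node = self.root
        # Walk down, splitting full children before entering them, so the
        # leaf reached at the end always has room for one more key.
        while not node.leaf:
            i = bisect_right(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        insort(node.keys, key)

    def _split_child(self, parent: BTreeNode, i: int) -> None:
        # Split the full child parent.children[i] (2t - 1 keys) around its median.
        t = self.t
        full = parent.children[i]
        right = BTreeNode(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]
        full.keys = full.keys[:t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        parent.keys.insert(i, median)  # the median moves up into the parent
        parent.children.insert(i + 1, right)

    def __iter__(self) -> Iterator[int]:
        return self._walk(self.root)

    def _walk(self, node: BTreeNode) -> Iterator[int]:
        # In-order traversal: yields the keys in sorted order.
        if node.leaf:
            yield from node.keys
            return
        for child, key in zip(node.children, node.keys):
            yield from self._walk(child)
            yield key
        yield from self._walk(node.children[-1])
```

Real database indexes usually size nodes to match a disk page and often use the B+ tree variant, which keeps all values in the leaves; both details are left out of this sketch.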

Note that memory fragmentation can be reduced using compaction methods. For BSTs/B-trees, one can reorder the nodes in memory like in a heap (in breadth-first order, so that the nodes near the root are stored contiguously). However, compaction is not always easy to apply, especially in native languages with raw pointers like C/C++, although some very clever methods exist to do so.
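One concrete way to store a search tree "like in a heap" is the so-called Eytzinger layout: a balanced BST is stored implicitly in an array in breadth-first order, so node `k` has its children at `2k` and `2k + 1` and no pointers are needed at all. The Python sketch below (the function names are hypothetical) builds that layout from a sorted list and searches it. It illustrates the idea under that interpretation only; the memory-locality payoff would be measurable in a language with contiguous value arrays such as C or C++, not in Python.

```python
from typing import List, Sequence


def eytzinger(sorted_values: Sequence[int]) -> List[int]:
    """Store a sorted sequence as an implicit BST in breadth-first (heap) order.

    Index 0 is unused padding, so node k has its children at 2k and 2k + 1.
    """
    n = len(sorted_values)
    out = [0] * (n + 1)
    it = iter(sorted_values)

    def fill(k: int) -> None:
        # An in-order walk of the implicit tree hands out values in sorted order,
        # which is exactly what makes the array a valid BST.
        if k <= n:
            fill(2 * k)
            out[k] = next(it)
            fill(2 * k + 1)

    fill(1)
    return out


def eytzinger_contains(layout: List[int], key: int) -> bool:
    """Search the implicit BST: go to child 2k if key is smaller, else 2k + 1."""
    n = len(layout) - 1
    k = 1
    while k <= n:
        if layout[k] == key:
            return True
        k = 2 * k + (layout[k] < key)  # True counts as 1: right child
    return False
```

For example, `eytzinger([1, 2, 3, 4, 5, 6, 7])` returns `[0, 4, 2, 6, 1, 3, 5, 7]` (index 0 is padding): the root `4` comes first and the top levels of the tree are packed together at the start of the array.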

Keep in mind that B-trees shine only on big datasets (especially ones that do not fit in cache). On relatively small ones, plain arrays or sorted arrays are often a very good solution.
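For such small datasets, an ordered map can be as simple as two parallel sorted lists searched with `bisect`. The Python sketch below is hypothetical (the class and method names are only illustrative); inserting a new key shifts elements in linear time, which is usually cheap while the arrays stay small.

```python
from bisect import bisect_left


class SortedArrayMap:
    """Tiny ordered map: keys and values in two parallel sorted lists."""

    def __init__(self):
        self._keys = []    # kept sorted
        self._values = []  # _values[i] belongs to _keys[i]

    def get(self, key, default=None):
        i = bisect_left(self._keys, key)  # O(log n) binary search
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def put(self, key, value):
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # existing key: overwrite in place
        else:
            # New key: O(n) shift of both lists, cheap while n stays small.
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def items_between(self, lo, hi):
        """Return (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect_left(self._keys, lo)
        j = bisect_left(self._keys, hi)
        return list(zip(self._keys[i:j], self._values[i:j]))
```

Range queries such as `items_between` are a case where a sorted array is as convenient as a BST: both endpoints are found by binary search and the result is a contiguous slice.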


answered Sep 28 '22 21:09

Jérôme Richard
