How do I determine which kind of tree data structure to choose?

Tags: data-structures, tree

Ok, so this is something that's always bothered me. The tree data structures I know of are:

Unbalanced binary trees
AVL trees
Red-black trees
2-3 trees
B-trees
B*-trees
Tries
Heaps

How do I determine what kind of tree is the best tool for the job? Obviously heaps are canonically used to form priority queues. But the rest of them just seem to be different ways of doing the same thing. Is there any way to choose the best one for the job?

asked Nov 22 '09 14:11

Jason Baker

1 Answer

Let’s pick them off one by one, shall we?

Unbalanced binary trees

For search tasks, never. Their performance depends entirely on insertion order: keys that arrive already sorted degenerate the tree into a linked list with O(n) lookups. The overhead of balancing a tree is small enough that an unbalanced tree is not a viable alternative.

Apart from that, unbalanced binary trees of course have other uses, but not as search trees.
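
To make the insertion-order problem concrete, here is a minimal sketch (the Node, insert, height and build names are mine, purely for illustration). It builds a plain, never-rebalanced binary search tree twice from the same 1000 keys, once in ascending order and once shuffled, and compares the resulting heights.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain BST insert with no rebalancing; returns the (possibly new) root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def height(root):
    """Number of nodes on the longest root-to-leaf path (iterative, no recursion)."""
    best, stack = 0, [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if node is None:
            continue
        best = max(best, depth)
        stack.append((node.left, depth + 1))
        stack.append((node.right, depth + 1))
    return best


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


n = 1000
sorted_height = height(build(range(n)))    # ascending keys: one long right spine

shuffled = list(range(n))
random.Random(0).shuffle(shuffled)         # fixed seed so the run is repeatable
shuffled_height = height(build(shuffled))  # same keys, random order

print(sorted_height, shuffled_height)
```

With sorted input every key becomes the right child of the previous one, so the height equals the number of keys. The shuffled build ends up far shallower, but nothing guarantees that for real-world input.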

AVL trees

They are easy to implement, but their performance is generally surpassed by other balancing strategies because rebalancing them is comparatively time-intensive. Wikipedia claims that they perform better in lookup-intensive scenarios because their worst-case height is slightly smaller.
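
The height claim can be checked with a little arithmetic. This is a rough sketch with function names of my own choosing. It uses the standard recurrence for the sparsest possible AVL tree (a tree of height h needs at least min_nodes(h-1) + min_nodes(h-2) + 1 nodes) and compares the result with the usual 2·log2(n+1) upper bound quoted for red-black trees.

```python
import math


def max_avl_height(n):
    """Largest height (counted in nodes) that an AVL tree holding n keys can reach."""
    if n < 1:
        return 0
    prev, curr, h = 0, 1, 1          # minimum node counts for heights 0 and 1
    while True:
        nxt = curr + prev + 1        # minimum node count for height h + 1
        if nxt > n:
            return h
        prev, curr, h = curr, nxt, h + 1


def red_black_height_bound(n):
    """Textbook upper bound on the height of a red-black tree with n keys."""
    return 2 * math.log2(n + 1)


n = 1_000_000
print(max_avl_height(n), red_black_height_bound(n), math.ceil(math.log2(n + 1)))
```

The last printed value is the height of a perfectly balanced tree, for comparison. Keep in mind that these are worst-case bounds, not typical heights.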

Red-black trees

These are used inside most implementations of C++’s std::map and probably in a few other standard libraries as well. However, there’s good evidence that they are actually worse than B(+) trees in every scenario due to the caching behaviour of modern CPUs. Historically, when caching wasn’t as important (or as good), they surpassed B-trees when used in main memory.

2-3 trees

B-trees

B*-trees

These require the most careful consideration of all the trees, since the constants involved are basically “magic” constants that relate in weird and sometimes unpredictable ways to the underlying hardware architecture. For example, the optimal number of children per node can depend on the size of a memory page or cache line.

I know of no good, general rule to distinguish between them.
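
As a back-of-the-envelope illustration of how the node size ties into the hardware, here is a sketch under simplifying assumptions: 8-byte keys, 8-byte child pointers, no per-node header, and every node completely full. The block sizes (a 64-byte cache line and a 4096-byte page) and the function names are assumptions of mine, not recommendations.

```python
def max_children(block_bytes, key_bytes=8, pointer_bytes=8):
    """Largest fanout m such that m pointers plus m - 1 keys fit in one block."""
    # m * pointer_bytes + (m - 1) * key_bytes <= block_bytes
    return (block_bytes + key_bytes) // (pointer_bytes + key_bytes)


def levels_needed(n_keys, fanout):
    """Levels of a completely full tree with this fanout that can hold n_keys."""
    levels = 1
    while fanout ** levels - 1 < n_keys:   # a full tree of L levels holds m**L - 1 keys
        levels += 1
    return levels


n = 1_000_000
for name, block in [("cache line", 64), ("memory page", 4096)]:
    m = max_children(block)
    print(name, "fanout:", m, "levels:", levels_needed(n, m))
print("binary tree levels:", levels_needed(n, 2))
```

Each level is roughly one cache miss or one page read, which is why a wider node can pay off. Real implementations also have to budget for node headers, variable-length keys and partially filled nodes, so treat these numbers as an optimistic bound.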

Tries

Completely different. Tries are also search trees, but for text retrieval: they look up words, or prefixes of words, in a corpus. A trie is an uncompressed prefix tree (i.e. a tree in which every path from the root spells out a prefix of one of the stored strings).

Tries should be compared to, and weighed against, suffix trees, suffix arrays and q-gram indices, not so much against other search trees, because the data that they search is different. Where a trie indexes discrete words in a corpus, the latter index structures allow a factor (substring) search.
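
A minimal trie sketch shows the difference between word or prefix lookup and factor search. The Trie class and its method names are mine; this is an illustration, not a tuned implementation.

```python
class Trie:
    """Uncompressed prefix tree: one node per character along each stored word."""

    def __init__(self):
        self.children = {}    # maps a character to the child Trie node
        self.is_word = False  # True if a stored word ends at this node

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def _walk(self, text):
        """Follow text from the root; return the node reached, or None."""
        node = self
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._walk(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._walk(prefix) is not None


words = Trie()
for w in ["tree", "trie", "heap"]:
    words.insert(w)

print(words.contains("trie"))    # a stored word
print(words.has_prefix("tr"))    # a prefix of stored words
print(words.contains("tr"))      # a prefix only, not a stored word
print(words.has_prefix("ree"))   # occurs inside "tree", but not as a prefix
```

The last lookup fails even though "ree" occurs inside "tree", because a trie only follows paths that start at the root. Finding arbitrary factors is the job of suffix trees, suffix arrays or q-gram indices.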

Heaps

As you’ve already said, they are not search trees at all.
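
For completeness, a small sketch of the priority-queue use case with Python’s standard heapq module. The task names and priorities are made up for the example.

```python
import heapq

tasks = []  # heapq keeps the heap invariant inside an ordinary list
for priority, name in [(3, "write report"), (1, "fix outage"), (2, "review patch")]:
    heapq.heappush(tasks, (priority, name))

most_urgent = tasks[0]  # the smallest entry is always at index 0

# Looking up an arbitrary entry gets no help from the heap order;
# "in" falls back to a linear scan of the underlying list.
found = (2, "review patch") in tasks

# Popping repeatedly yields the entries in ascending priority order.
order = [heapq.heappop(tasks)[1] for _ in range(3)]
print(most_urgent, found, order)
```

The heap only promises cheap access to the minimum. It is the wrong structure for general lookups, which is why it does not compete with the search trees above.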

answered Oct 19 '22 19:10

Konrad Rudolph
