Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In practice, whats the better choice: hash table, radix tree, red black tree or ...? [closed]

Tags:

algorithm

Did you make some experiences with these structures mentioned above? If the insertion and lookup matters, then what seems to be best in practice?

If the hash table has lots of collisions, you'd end up with a bucket list you need to traverse, so basically you could end up in O(n) if performance matters. I've made no experiences with the radix tree so far and I've head that the red black tree beats the AVL tree regarding lookups.

What is your experience with that?

thanks, Dan

like image 407
algstranger Avatar asked Dec 03 '22 11:12

algstranger


2 Answers

I don't think I've ever had a situation where I've written code with a hashtable, and the structure has proved to have inherently unacceptable performance.

I don't think I've ever had a situation where I've written code with a balanced tree, and the structure has proved to have inherently unacceptable performance.

So I don't worry about it too much, and take the easy route. Python provides hash-based sets/dictionaries, so I use those. C++03 provides set and map but not unordered_set and unordered_map, so I use those. In languages/libraries where both are available, I use a tree if I need order, and a hashtable if (a) I don't need order, and (b) I have or can write a decent hash function. If I have a natural order already, but not a natural hash function, then of course the easy route is a tree.

In practice, the vast majority of associative arrays are keyed either on strings, integers, or tuples of those. So you always have an order, and a decent hash function isn't far off. If the one I'm using seems at all dodgy, it's therefore easy enough to try them both.

Radix trees are basically for cases where you expect long common prefixes. For example, if your keys are URLs of web pages, then somewhere between "almost all" and "all" of them are going to start with either http:// or https://, with hostnames providing further common prefixes for sections of your tree. An obvious optimisation, therefore, is to avoid comparing that common prefix multiple times during a lookup, which indicates against a "plain" balanced tree. But as always, the "obvious" optimisation isn't necessarily worth the effort, unless you happen to have a library for it already to hand.

like image 88
Steve Jessop Avatar answered Jan 01 '23 20:01

Steve Jessop


I think you are missing the point with this question. There is no one-size-fits-all data structure that is best for all possible scenarios. Each structure has its own strengths and weaknesses.

For example, as you already know, hash tables have an expected O(1) lookup and insert/remove performance. The downside is that this does not take into account the hidden constant of having to actually computing the hash of the values which can be quite expensive (e.g. when using large strings). Collisions should be exceedingly rare if you use a good hash function, but even small amounts of collisions do add up and affect performance.

On the other hand, Balanced Binary Search trees (Red-black and AVL) support O(log n) operations, which is indeed slower, but have the added advantage of keeping the elements sorted - thus enabling a lot more operations (e.g. min, max, k-th element, upper-bound, lower-bound, range queries etc.) that are simply not possible in an unordered structure such as a hash table. In practice, the difference in speed between the two is often negligible, even for operations involving millions of elements. The speed difference between Red-black and AVL-trees is even less noticeable, both being O(log n) and only differing by a small constant factor.

Then there are lots of other data structures like segment trees, Binary Indexed Trees (AKA Fenwick trees), and Radix Trees which you mentioned. Please read a good book on data structures (CLRS is recommended) or at least read the wikipedia article on each, understand how they work and the cost of operations on them, and decide which one suits your job.

The bottom line is what data structure you chose depends on what you need it for. There is no one structure that is the absolute best and would satisfy any scenario you could possibly come up with.

like image 41
MAK Avatar answered Jan 01 '23 21:01

MAK