Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Yet another ordered map vs. unordered (hash) map question

This is a very naive question, yet I can't find an explicit discussion of it. Everybody agrees that using a hash for a map container with only 10 elements, is overkill.. an ordered map will be much faster. With a hundred; a thousand, etc. a map should scale by logN where N= # of pairs in the map. So for a thousand, it takes three times as long; a million, six times as long; 10 billion, nine times as long.

Of course, we are led to believe that a well designed hashed container can be searched in O(1) (constant) time vs O(logN) for a sorted container. But what are the implied constants? At what point does the hash map lose the map in the dust? Especially, if the keys are integers, there is little overhead in the key search, so the constant in map would be small.

Nevertheless, just about EVERYBODY thinks hashed containers are faster. Lots of real time tests have been done.

What's going on?

like image 268
Mike Razar Avatar asked Apr 22 '26 21:04

Mike Razar


1 Answers

As you've said, a hash based map does have the potential to be asymptotically faster than a binary search tree, with queries in O(1) vs O(log(N)) time - but this is entirely dependent on the performance of the hash function used over the allowable distribution of input data.

There are two important worst-case situations to think about with a hash table:

  1. All data items generate the same hash index, therefore all items end up in the same hash bucket - querying the hash map will take O(N) in this case.
  2. The distribution of hash indices generated by the data is extremely sparse, therefore most hash buckets are empty. You can still end up with O(1) query time in this case but the space complexity can essentially become unbounded in the limit.

A binary search tree on the other hand (at least the red-black tree used in most standard library implementations) enjoys worst-case O(log(N)) time and O(N) space complexity.

The up-shot of all of this (in my opinion) is that if you know enough about your input data to design a "good" hash function (doesn't have too many collisions, doesn't generate too sparse a distribution of hash buckets) using the hash map will generally be a better choice.

If you can't ensure the performance of your hash function over your expected inputs use a BST.

The exact point at which one becomes better than the other is entirely problem/machine dependent.

Hope this helps.

like image 120
Darren Engwirda Avatar answered Apr 25 '26 14:04

Darren Engwirda



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!