I am attempting to write a function that uses hashes (for an implementation of A*). After a bit of research, I have found that the de facto standard is Data.Map.
However, when reading the API documentation, I found this for lookup: "O(log n). Find the value at a key."
https://downloads.haskell.org/~ghc/6.12.2/docs/html/libraries/containers-0.3.0.0/Data-Map.html
In fact, the documentation generally lists big-O times significantly worse than the O(1) of a standard hash table.
So then I found Data.HashTable:
https://hackage.haskell.org/package/base-4.2.0.2/docs/Data-HashTable.html
This documentation does not mention big O directly, leading me to believe that it probably fulfills my expectations.
I have several questions:
1) Is that correct? Is lookup in Data.HashTable O(1)?
2) Why would I ever want to use Data.Map, given its inefficiency?
3) Is there a better library for my data structure needs?
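For context, here is a minimal sketch of the kind of lookup table an A* implementation needs, written with Data.Map. The node labels and costs are hypothetical, chosen only to illustrate the API; lookup and insert here are O(log n), which is what prompted the question.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical g-score table for A*: node label -> cheapest cost found so far.
gScores :: Map.Map String Int
gScores = Map.fromList [("start", 0), ("a", 3), ("b", 7)]

-- O(log n) lookup; returns Nothing for nodes not yet reached.
costOf :: String -> Maybe Int
costOf node = Map.lookup node gScores
```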
The hash key is computed in O(1) time, and the corresponding slot is accessed in O(1). For insertion, in the best case the key hashes to a vacant slot and the element is inserted directly into the table, so the overall complexity is O(1).
Hash tables tend to be faster when searching for items: in an array you may have to scan every item before finding the one you are looking for, while in a hash table you go directly to the item's location. Inserting an item is also fast, since you just hash the key and insert.
A perfect hash gives O(1) lookup, but to build one you have to know the full set of keys when you design the table. For a general hash table, O(n) is the worst case and O(1) the average case.
Hash tables seem to be O(1) because they have a small constant factor, and the 'n' in their O(log n) bound has to grow so large before it matters that, for many practical applications, lookup is effectively independent of the number of items you are storing.
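The bucket mechanism described above can be sketched in a few lines of pure Haskell. This is a toy illustration, not a real hash table: the bucket count, the `hash` function, and the list-of-buckets representation are all simplifications chosen to show how hashing a key jumps straight to a small bucket instead of scanning every entry.

```haskell
import Data.Char (ord)

-- Each bucket holds the key/value pairs whose keys hash to its index.
type Bucket = [(String, Int)]

numBuckets :: Int
numBuckets = 4

-- Toy hash function: sum of character codes, reduced modulo the bucket count.
hash :: String -> Int
hash k = sum (map ord k) `mod` numBuckets

emptyTable :: [Bucket]
emptyTable = replicate numBuckets []

-- Insert by prepending to the bucket the key hashes to.
insertKV :: String -> Int -> [Bucket] -> [Bucket]
insertKV k v t =
  [ if j == hash k then (k, v) : b else b | (j, b) <- zip [0 ..] t ]

-- Lookup only scans the one bucket selected by the hash.
lookupKV :: String -> [Bucket] -> Maybe Int
lookupKV k t = lookup k (t !! hash k)
```

With a good hash function and enough buckets, each bucket stays short, which is why the average-case lookup is O(1).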
Data.HashTable has been deprecated, and you won't find it in current base. It was deprecated because it performed poorly in comparison to hashtables.
However, hashtables and Data.HashTable are both mutable implementations, while Data.Map and Data.HashMap are immutable.
Mutable hashmaps in Haskell are similar to the array-of-buckets or open addressing solutions in other languages. Immutable maps are based on trees or tries. In general, immutable associative containers can't be implemented with O(1) modification.
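The persistence of tree-based immutable maps can be seen directly: inserting into a Data.Map returns a new map while the old one remains intact, with the unchanged parts of the tree shared between the two versions. The keys and values below are arbitrary examples.

```haskell
import qualified Data.Map.Strict as Map

original :: Map.Map Char Int
original = Map.fromList [('x', 1), ('y', 2)]

-- insert returns a NEW map; 'original' is untouched.
updated :: Map.Map Char Int
updated = Map.insert 'x' 99 original
```

Both versions coexist cheaply because the insert only rebuilds the O(log n) path from the root to the affected node; everything else is shared.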
So why use immutable maps?
First, the API is much more convenient in Haskell. We can't use mutable maps in pure functions, only in IO or ST actions.
Second, immutable maps can be safely shared between threads, which is often a crucial feature.
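A small sketch of that sharing, assuming a hypothetical map named `shared`: because the map is immutable, a forked thread can read it without any locking or copying, and no thread can observe it mid-update.

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import qualified Data.Map.Strict as Map

-- An immutable map can be handed to any number of threads as-is.
shared :: Map.Map String Int
shared = Map.fromList [("answer", 42)]

main :: IO ()
main = do
  done <- newEmptyMVar
  -- The forked thread reads the shared map directly; no lock is needed
  -- because nothing can mutate it.
  _ <- forkIO $ putMVar done (Map.lookup "answer" shared)
  result <- takeMVar done
  print result
```

With mutable hash tables, the same pattern would require a lock (or a per-thread copy) to be safe.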
Third, in practice, the performance difference between mutable and immutable maps can be insignificant, i.e. it doesn't noticeably impact overall program performance. Since the number of entries is bounded by available memory, O(log n) doesn't produce spectacular asymptotic differences compared to O(1). In particular, Data.HashMap uses a 16-way branching trie, so the trie depth can't realistically be more than 6 or 7.
Fourth, immutable maps can be just plain faster, for reasons I don't fully understand (more optimized libraries? better optimization from GHC?); I have tried a couple of times to replace Data.HashMap with mutable maps from hashtables, but the performance was always a bit worse afterwards.