 

Algorithmic complexity of Data.Hashtable

I am attempting to write a function that utilizes hashes (for an implementation of A*).

After a bit of research, I found that the de facto standard is Data.Map.

However, the API documentation says of lookup: "O(log n). Find the value at a key."

https://downloads.haskell.org/~ghc/6.12.2/docs/html/libraries/containers-0.3.0.0/Data-Map.html

In fact, the documentation generally lists big-O times significantly worse than the O(1) of a standard hash table.

So then I found Data.HashTable: https://hackage.haskell.org/package/base-4.2.0.2/docs/Data-HashTable.html Its documentation does not mention big-O complexity at all, which leads me to believe it probably meets my expectations.

I have several questions:

1. Is that correct? Is O(lookupInDataHashTable) = O(1)?
2. Why would I ever want to use Data.Map, given its inefficiency?
3. Is there a better library for my data structure needs?
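For context, here is a minimal sketch of the Data.Map API in question (from the containers package; the node names and costs are made up for illustration, as an A* open set might use them):

```haskell
import qualified Data.Map as Map

main :: IO ()
main = do
  -- Build a map of node -> tentative cost.
  let costs = Map.fromList [("start", 0), ("a", 4), ("b", 7)]
  -- O(log n) lookup, per the Data.Map documentation.
  print (Map.lookup "a" costs)    -- Just 4
  -- O(log n) insert returns a new map; the original is unchanged.
  let costs' = Map.insert "c" 2 costs
  print (Map.size costs')         -- 4
```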

Abraham P asked Jan 08 '16



1 Answer

Data.HashTable has been deprecated and you won't find it in current base.

It was deprecated because it performed poorly compared to the hashtables package.

However, hashtables and Data.HashTable are both mutable implementations, while Data.Map and Data.HashMap are immutable.

Mutable hashmaps in Haskell are similar to the array-of-buckets or open addressing solutions in other languages. Immutable maps are based on trees or tries. In general, immutable associative containers can't be implemented with O(1) modification.
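The persistence trade-off can be seen directly: updating an immutable map yields a new version while the old one remains valid. A small sketch using Data.Map from containers:

```haskell
import qualified Data.Map as Map

main :: IO ()
main = do
  let v1 = Map.fromList [(1 :: Int, "one"), (2, "two")]
      -- insert copies only the O(log n) path to the changed node;
      -- the rest of the structure is shared between v1 and v2.
      v2 = Map.insert 3 "three" v1
  print (Map.member 3 v1)  -- False: the old version is unchanged
  print (Map.member 3 v2)  -- True
```

This structural sharing is exactly why O(1) in-place modification isn't available: the old version must survive the update.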

So why use immutable maps?

First, the API is much more convenient in Haskell: we can't use mutable maps in pure functions, only in IO or ST actions.
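To illustrate the API difference (using an IORef-wrapped Data.Map as a stand-in for a mutable table, since Data.HashTable is no longer in base): the immutable version is an ordinary pure function, while the mutable one is forced into IO:

```haskell
import Data.IORef
import qualified Data.Map as Map

-- Pure: usable anywhere, trivially testable.
bump :: Map.Map String Int -> Map.Map String Int
bump = Map.insertWith (+) "hits" 1

-- Mutable stand-in: every operation is an IO action.
bumpIO :: IORef (Map.Map String Int) -> IO ()
bumpIO ref = modifyIORef' ref (Map.insertWith (+) "hits" 1)

main :: IO ()
main = do
  print (Map.lookup "hits" (bump (bump Map.empty)))  -- Just 2
  ref <- newIORef Map.empty
  bumpIO ref
  bumpIO ref
  m <- readIORef ref
  print (Map.lookup "hits" m)                        -- Just 2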

Second, immutable maps can be safely shared between threads, which is often a crucial feature.

Third, in practice the performance difference between mutable and immutable maps can be insignificant, i.e. it doesn't noticeably impact overall program performance. Since n is bounded by the available memory, the O(log n) factor is effectively a small constant, so we don't see spectacular asymptotic differences compared to O(1). In particular, Data.HashMap uses a 16-way branching trie, so the trie depth can't realistically exceed 6 or 7.
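A back-of-the-envelope check of that depth claim (assuming an ideally balanced 16-way branching trie; the exact bound in a real HAMT implementation may differ slightly):

```haskell
-- Depth needed to distinguish n keys in a 16-way trie: ceil(log16 n).
depthFor :: Int -> Int
depthFor n = ceiling (logBase 16 (fromIntegral n) :: Double)

main :: IO ()
main = mapM_ (print . depthFor) [10 ^ 6, 10 ^ 8]  -- 5, then 7
```

So even a hundred million keys need a trie only about 7 levels deep.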

Fourth, immutable maps can be just plain faster, for reasons I don't fully understand (more optimized libraries? better optimization by GHC?). I have tried a couple of times to replace Data.HashMap with mutable maps from hashtables, and performance was always slightly worse afterwards.

András Kovács answered Oct 01 '22