I am attempting to write a function that uses hashes (for an implementation of A*). After a bit of research, I have found that the de facto standard is Data.Map.
However, when reading the API documentation, I found this for lookup: "O(log n). Find the value at a key."
https://downloads.haskell.org/~ghc/6.12.2/docs/html/libraries/containers-0.3.0.0/Data-Map.html
In fact, the documentation generally lists big-O times significantly worse than the O(1) of a standard hash table.
So then I found Data.HashTable:
https://hackage.haskell.org/package/base-4.2.0.2/docs/Data-HashTable.html
This documentation does not mention big O directly, leading me to believe that it probably fulfills my expectations.
I have several questions:
1) Is that correct? Is lookup in Data.HashTable O(1)?
2) Why would I ever want to use Data.Map, given its inefficiency?
3) Is there a better library for my data structure needs?
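For context, here is a minimal sketch of the kind of lookup table an A* implementation needs, written with Data.Map. The node labels and costs are hypothetical, chosen only to illustrate the API; lookup and insert here are O(log n), which is what prompted the question.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical g-score table for A*: node label -> cheapest cost found so far.
gScores :: Map.Map String Int
gScores = Map.fromList [("start", 0), ("a", 3), ("b", 7)]

-- O(log n) lookup; returns Nothing for nodes not yet reached.
costOf :: String -> Maybe Int
costOf node = Map.lookup node gScores
```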
The hash key is computed in O(1) time, and the corresponding slot is accessed in O(1). For insertion, in the best case the key hashes to a vacant slot and the element is inserted directly into the table, so the overall complexity is O(1).
Hash tables tend to be faster when searching for items: in an array you may have to scan every item before finding the one you are looking for, while in a hash table you go directly to the item's location. Inserting an item is also fast, since you just hash the key and insert.
A perfect hash gives O(1) lookup, but to build one you have to know the full set of keys when you design the table. For a general hash table, O(n) is the worst case and O(1) the average case.
Hash tables seem to be O(1) because they have a small constant factor, and the 'n' in their O(log n) bound has to grow so large before it matters that, for many practical applications, lookup is effectively independent of the number of items you are storing.
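The bucket mechanism described above can be sketched in a few lines of pure Haskell. This is a toy illustration, not a real hash table: the bucket count, the `hash` function, and the list-of-buckets representation are all simplifications chosen to show how hashing a key jumps straight to a small bucket instead of scanning every entry.

```haskell
import Data.Char (ord)

-- Each bucket holds the key/value pairs whose keys hash to its index.
type Bucket = [(String, Int)]

numBuckets :: Int
numBuckets = 4

-- Toy hash function: sum of character codes, reduced modulo the bucket count.
hash :: String -> Int
hash k = sum (map ord k) `mod` numBuckets

emptyTable :: [Bucket]
emptyTable = replicate numBuckets []

-- Insert by prepending to the bucket the key hashes to.
insertKV :: String -> Int -> [Bucket] -> [Bucket]
insertKV k v t =
  [ if j == hash k then (k, v) : b else b | (j, b) <- zip [0 ..] t ]

-- Lookup only scans the one bucket selected by the hash.
lookupKV :: String -> [Bucket] -> Maybe Int
lookupKV k t = lookup k (t !! hash k)
```

With a good hash function and enough buckets, each bucket stays short, which is why the average-case lookup is O(1).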
Data.HashTable has been deprecated, and you won't find it in current base. It was deprecated because it performed poorly in comparison to hashtables.
However, hashtables and Data.HashTable are both mutable implementations, while Data.Map and Data.HashMap are immutable.
Mutable hashmaps in Haskell are similar to the array-of-buckets or open addressing solutions in other languages. Immutable maps are based on trees or tries. In general, immutable associative containers can't be implemented with O(1) modification.
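The persistence of tree-based immutable maps can be seen directly: inserting into a Data.Map returns a new map while the old one remains intact, with the unchanged parts of the tree shared between the two versions. The keys and values below are arbitrary examples.

```haskell
import qualified Data.Map.Strict as Map

original :: Map.Map Char Int
original = Map.fromList [('x', 1), ('y', 2)]

-- insert returns a NEW map; 'original' is untouched.
updated :: Map.Map Char Int
updated = Map.insert 'x' 99 original
```

Both versions coexist cheaply because the insert only rebuilds the O(log n) path from the root to the affected node; everything else is shared.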
So why use immutable maps?
First, the API is much more convenient in Haskell. We can't use mutable maps in pure functions, only in IO or ST actions.
Second, immutable maps can be safely shared between threads, which is often a crucial feature.
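A small sketch of that sharing, assuming a hypothetical map named `shared`: because the map is immutable, a forked thread can read it without any locking or copying, and no thread can observe it mid-update.

```haskell
import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import qualified Data.Map.Strict as Map

-- An immutable map can be handed to any number of threads as-is.
shared :: Map.Map String Int
shared = Map.fromList [("answer", 42)]

main :: IO ()
main = do
  done <- newEmptyMVar
  -- The forked thread reads the shared map directly; no lock is needed
  -- because nothing can mutate it.
  _ <- forkIO $ putMVar done (Map.lookup "answer" shared)
  result <- takeMVar done
  print result
```

With mutable hash tables, the same pattern would require a lock (or a per-thread copy) to be safe.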
Third, in practice, the performance difference between mutable and immutable maps can be insignificant, i.e. it doesn't noticeably impact overall program performance. Since the number of entries is bounded by available memory, O(log n) doesn't produce spectacular asymptotic differences compared to O(1). In particular, Data.HashMap uses a 16-way branching trie, so the trie depth can't realistically be more than 6 or 7.
Fourth, immutable maps can be just plain faster, for reasons I don't fully understand (more optimized libraries? better optimization from GHC?); I have tried a couple of times to replace Data.HashMap with mutable maps from hashtables, but the performance was always a bit worse afterwards.