Why are Haskell Maps implemented as balanced binary trees instead of traditional hashtables?

From my limited knowledge of Haskell, it seems that Maps (from Data.Map) are supposed to be used much like a dictionary or hashtable in other languages, and yet are implemented as self-balancing binary search trees.

Why is this? Using a binary tree reduces lookup time to O(log(n)) as opposed to O(1) and requires that the elements be in Ord. Certainly there is a good reason, so what are the advantages of using a binary tree?

Also:

In what applications would a binary tree be much worse than a hashtable? What about the other way around? Are there many cases in which one would be vastly preferable to the other? Is there a traditional hashtable in Haskell?

asked Sep 20 '13 04:09

reem

3 Answers

Hash tables can't be implemented efficiently without mutable state, because they're based on array lookup: the key is hashed, and the hash determines the index into an array of buckets. Without mutable state, inserting an element into the hash table becomes O(n), because the entire array must be copied (alternative non-copying implementations, like DiffArray, introduce a significant performance penalty). Binary-tree implementations can share most of their structure between versions, so an insert only needs to copy the nodes on the path from the root to the changed element.
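To make the copying cost concrete, here is a minimal sketch of a toy hash table built on an immutable Data.Array. The names Table, emptyTable, insertT and lookupT are made up for this illustration, keys are plain Ints, and the "hash" is just `mod`; none of this is a real library's API.

```haskell
import Data.Array (Array, bounds, listArray, (!), (//))

-- A toy, purely functional hash table: an immutable array of buckets.
-- Int keys and a `mod` "hash" are simplifying assumptions.
type Table v = Array Int [(Int, v)]

-- Create a table with n empty buckets (n is assumed to be positive).
emptyTable :: Int -> Table v
emptyTable n = listArray (0, n - 1) (replicate n [])

-- Index of the bucket that a key falls into.
bucketOf :: Table v -> Int -> Int
bucketOf t k = k `mod` (hi - lo + 1)
  where (lo, hi) = bounds t

-- (//) returns a NEW array, so every insert copies all bucket slots:
-- the cost is proportional to the number of buckets, not O(1).
insertT :: Int -> v -> Table v -> Table v
insertT k v t = t // [(i, (k, v) : filter ((/= k) . fst) (t ! i))]
  where i = bucketOf t k

-- Lookup is still cheap: index the array, then scan one bucket.
lookupT :: Int -> Table v -> Maybe v
lookupT k t = lookup k (t ! bucketOf t k)
```

Lookups stay fast here; it is the copy made by every insertT that makes this design a poor fit for pure code.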

Haskell certainly can support traditional hash tables, provided that the updates are in a suitable monad. The hashtables package is probably the most widely used implementation.
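As a rough sketch of what "updates in a suitable monad" can look like, the following uses a mutable STArray of buckets inside the ST monad. It is not the hashtables package's API; newTable, insertST and lookupAfterInserts are hypothetical names for this example, and keys are Ints hashed with `mod`.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STArray, newArray, readArray, writeArray)

-- Allocate n empty buckets in a mutable array (n assumed positive).
newTable :: Int -> ST s (STArray s Int [(Int, v)])
newTable n = newArray (0, n - 1) []

-- Overwrite a single bucket in place; no other slot is copied.
insertST :: Int -> STArray s Int [(Int, v)] -> (Int, v) -> ST s ()
insertST n buckets (k, v) = do
  let i = k `mod` n
  old <- readArray buckets i
  writeArray buckets i ((k, v) : filter ((/= k) . fst) old)

-- Insert all pairs with mutation, then look up one key.
-- runST keeps the mutation local, so the function as a whole is pure.
lookupAfterInserts :: Int -> [(Int, v)] -> Int -> Maybe v
lookupAfterInserts n kvs key = runST $ do
  buckets <- newTable n
  mapM_ (insertST n buckets) kvs
  lookup key <$> readArray buckets (key `mod` n)
```

The mutable table cannot escape runST, which is the trade-off: fast in-place updates, but the structure itself is not a value you can pass around and keep old versions of.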

One advantage of binary trees and other non-mutating structures is that they're persistent: it's possible to keep older copies of data around with no extra book-keeping. This might be useful in some sort of transaction algorithm for example. They're also automatically thread-safe (although updates won't be visible in other threads).
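A small illustration of persistence with Data.Map (the names v1 and v2 and the sample data are invented for the example): inserting into a map yields a new map, and the old one remains valid and unchanged.

```haskell
import qualified Data.Map as Map

-- The original version of the map.
v1 :: Map.Map String Int
v1 = Map.fromList [("alice", 1), ("bob", 2)]

-- A new version with one more key; v1 is not modified.
v2 :: Map.Map String Int
v2 = Map.insert "carol" 3 v1

-- Both versions can be queried independently afterwards.
carolInOld, carolInNew :: Bool
carolInOld = Map.member "carol" v1
carolInNew = Map.member "carol" v2
```

Here carolInOld is False and carolInNew is True: the earlier snapshot v1 is still available without any explicit copying or versioning code.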

answered Sep 17 '22 17:09

John L

Traditional hashtables rely on memory mutation in their implementation. Mutable memory and referential transparency are at odds, so that relegates hashtable implementations to either the IO or ST monads. Trees can be implemented persistently and efficiently by leaving the old nodes in memory and returning a new root node which points to the updated tree. This lets us have pure Maps.
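The "return a new root" idea can be sketched with a deliberately simplified binary search tree. This is not how Data.Map is written (in particular, this tree does no rebalancing); Tree, insertTree and lookupTree are names made up for the illustration.

```haskell
-- An unbalanced binary search tree; real maps also rebalance.
data Tree k v = Leaf | Node (Tree k v) k v (Tree k v)

-- Insert rebuilds only the nodes on the path from the root to the key.
-- Every subtree off that path is reused (shared) by the new tree.
insertTree :: Ord k => k -> v -> Tree k v -> Tree k v
insertTree k v Leaf = Node Leaf k v Leaf
insertTree k v (Node l k' v' r) = case compare k k' of
  LT -> Node (insertTree k v l) k' v' r  -- r is shared, not copied
  GT -> Node l k' v' (insertTree k v r)  -- l is shared, not copied
  EQ -> Node l k v r                     -- replace the value only

-- Standard search by comparing keys on the way down.
lookupTree :: Ord k => k -> Tree k v -> Maybe v
lookupTree _ Leaf = Nothing
lookupTree k (Node l k' v r) = case compare k k' of
  LT -> lookupTree k l
  GT -> lookupTree k r
  EQ -> Just v
```

The old root still points at the old nodes, so the previous version stays intact, while the new root shares everything that did not change.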

The quintessential reference is Chris Okasaki's Purely Functional Data Structures.

answered Sep 20 '22 17:09

J. Abrahamson

Why is this? Using a binary tree reduces lookup time to O(log(n)) as opposed to O(1)

Lookup is only one of the operations; insertion/modification may be more important in many cases; there are also memory considerations. The main reason the tree representation was chosen is probably that it is more suited for a pure functional language. As "Real World Haskell" puts it:

Maps give us the same capabilities as hash tables do in other languages. Internally, a map is implemented as a balanced binary tree. Compared to a hash table, this is a much more efficient representation in a language with immutable data. This is the most visible example of how deeply pure functional programming affects how we write code: we choose data structures and algorithms that we can express cleanly and that perform efficiently, but our choices for specific tasks are often different from their counterparts in imperative languages.

This:

and requires that the elements be in Ord.

does not seem like a big disadvantage. After all, with a hash map you need keys to be Hashable, which seems to be more restrictive.
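For what it's worth, meeting the Ord requirement is often a one-line affair, since the compiler can derive the instance. A small sketch, where the Suit and Card types and the scores map are invented for the example:

```haskell
import qualified Data.Map as Map

data Suit = Clubs | Diamonds | Hearts | Spades
  deriving (Eq, Ord, Show)

-- The derived Ord compares fields left to right: rank first, then suit.
data Card = Card { rank :: Int, suit :: Suit }
  deriving (Eq, Ord, Show)

-- Any type with an Ord instance can be used as a Data.Map key.
scores :: Map.Map Card Int
scores = Map.fromList [(Card 12 Spades, 13), (Card 1 Hearts, 1)]

-- Look up a score by card.
heartsAce :: Maybe Int
heartsAce = Map.lookup (Card 1 Hearts) scores
```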

In what applications would a binary tree be much worse than a hashtable? What about the other way around? Are there many cases in which one would be vastly preferable to the other? Is there a traditional hashtable in Haskell?

Unfortunately, I cannot provide an extensive comparative analysis, but there is a hash map package, and you can check out its implementation details and performance figures in this blog post and decide for yourself.

answered Sep 21 '22 17:09

fjarri
