From my limited knowledge of Haskell, it seems that Maps (from Data.Map) are supposed to be used much like a dictionary or hashtable in other languages, and yet are implemented as self-balancing binary search trees.
Why is this? Using a binary tree reduces lookup time to O(log(n)) as opposed to O(1) and requires that the elements be in Ord. Certainly there is a good reason, so what are the advantages of using a binary tree?
Also:
In what applications would a binary tree be much worse than a hashtable? What about the other way around? Are there many cases in which one would be vastly preferable to the other? Is there a traditional hashtable in Haskell?
Binary Search Trees are generally memory-efficient since they do not reserve more memory than they need to. On the other hand, Hash tables can be a bit more demanding if we don't know the exact number of elements we want to store.
Show activity on this post. I have read that std map is implemented using binary search tree data structure. BST is a sequential data structure (like elements in an array) which stores elements in a BST node and maintains elements in their order.
The worst-case insertion complexity for a hash table can be left at O(1)/O(log K) (with K the number of elements with the same hash) if it's acceptable to increase the worst-case lookup complexity to O(K) or O(log K) if the elements can be sorted.
Hash set and tree set both belong to the collection framework. HashSet is the implementation of the Set interface whereas Tree set implements sorted set. Tree set is backed by TreeMap while HashSet is backed by a hashmap. Sr.
Hash tables can't be implemented efficiently without mutable state, because they're based on array lookup. The key is hashed and the hash determines the index into an array of buckets. Without mutable state, inserting elements into the hashtable becomes O(n) because the entire array must be copied (alternative non-copying implementations, like DiffArray, introduce a significant performance penalty). Binary-tree implementations can share most of their structure so only a couple pointers need to be copied on inserts.
Haskell certainly can support traditional hash tables, provided that the updates are in a suitable monad. The hashtables package is probably the most widely used implementation.
One advantage of binary trees and other non-mutating structures is that they're persistent: it's possible to keep older copies of data around with no extra book-keeping. This might be useful in some sort of transaction algorithm for example. They're also automatically thread-safe (although updates won't be visible in other threads).
Traditional hashtables rely on memory mutation in their implementation. Mutable memory and referential transparency are at ends, so that relegates hashtable implementations to either the IO
or ST
monads. Trees can be implemented persistently and efficiently by leaving old leaves in memory and returning new root nodes which point to the updated trees. This lets us have pure Map
s.
The quintessential reference is Chris Okasaki's Purely Functional Data Structures.
Why is this? Using a binary tree reduces lookup time to O(log(n)) as opposed to O(1)
Lookup is only one of the operations; insertion/modification may be more important in many cases; there are also memory considerations. The main reason the tree representation was chosen is probably that it is more suited for a pure functional language. As "Real World Haskell" puts it:
Maps give us the same capabilities as hash tables do in other languages. Internally, a map is implemented as a balanced binary tree. Compared to a hash table, this is a much more efficient representation in a language with immutable data. This is the most visible example of how deeply pure functional programming affects how we write code: we choose data structures and algorithms that we can express cleanly and that perform efficiently, but our choices for specific tasks are often different their counterparts in imperative languages.
This:
and requires that the elements be in Ord.
does not seem like a big disadvantage. After all, with a hash map you need keys to be Hashable
, which seems to be more restrictive.
In what applications would a binary tree be much worse than a hashtable? What about the other way around? Are there many cases in which one would be vastly preferable to the other? Is there a traditional hashtable in Haskell?
Unfortunately, I cannot provide an extensive comparative analysis, but there is a hash map package, and you can check out its implementation details and performance figures in this blog post and decide for yourself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With