I know how normal mutable maps work (using hashtables), and I know how immutable lists work (recursive linked lists) and their advantage over mutable lists (constant time appending without messing up the original) but how do immutable maps (e.g. Scala's) work?
I know the advantage of not messing with the original map when generating new maps, but how does the underlying data structure work, and what kind of performance characteristics do they have, for example compared to mutable hash tables? Is there any standard data structure which people use to implement these, that I could go look up in CLRS/wikipedia?
Immutable data structure are data structures, like lists, arrays, sets etc., which cannot be changed, meaning that the values inside them can't be added, removed, moved or swapped.
The map data type is referred to as an associative array because, like an array, it is a collection of values rather than a single value, as an Int or a String is. Furthermore, each distinct key is associated with a value, resulting in an associative array.
Immutable data structures provides referential transparency which makes it easier to reason about our program locally. Another way to think about it is that every time we execute a pure (referentially transparent) function with the same input, we get the same output.
Persistent Hash maps are implemented using a structure called a Hash trie. It was originally proposed by Phil Bagwell (who is a member of the Scala group at EPFL) but actually implemented by Clojure first. It hit scala when 2.8 came out in 2010.
There is a great talk on functional data structures by Dan Spiewak where the mechanics of the hash trie are explained extremely lucidly (along with other things such as banker's queues)! He also explains asymptotic big-O performance very well in the talk.
Last October saw Phil give another talk at the London scala Lift Off, this time on parallel persistent data structures.
Persistent sorted maps are implemented via a Red-Black tree
It could be a tree (red-black) or a hash map. Their access characteristics depend on the underlying implementation. A tree is O(log n) for read access; a hash map is O(1).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With