Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is the whole Map copied when a new binding is inserted?

Tags:

haskell

I would like to better understand the interns of e.g. Data.Map. When I insert a new binding in a Map, then, because of immutability of data I get back a new data structure that is identical with the old data structure plus the new binding.

I would like to understand how this is achieved. Does the compiler eventually implement this by copying the whole data structure with e.g. millions of bindings? Can it generally be said that mutable data structures/arrays (e.g. Data.Judy) or imperative programming languages perform better in such cases? Does immutable data have any advantage when it comes to dictionaries/key-value stores?

like image 348
J Fritsch Avatar asked Apr 03 '12 23:04

J Fritsch


3 Answers

Map is built on a tree data structure. Basically, a new Map value is constructed, but it'll be filled almost entirely with pointers to the old structure. Since values never change in Haskell, this is a safe, and very important optimisation, known as sharing.

This means that you can have many similar versions of the same data structure hanging around, but only the branches of the tree that differ will be stored anew; the rest will simply be pointers to the original copy of the branch. And, of course, if you throw away the old Map, the branches you did change will be reclaimed by the garbage collector.

Sharing is key to the performance of immutable data structures. You might find this Wikipedia article helpful; it has some enlightening graphs showing how modified data gets represented with sharing.

like image 200
ehird Avatar answered Oct 26 '22 00:10

ehird


No. The documentation for Data.Map.insert states that insertion takes O(log n) time. It would be impossible to satisfy that bound if it had to copy the entire structure.

like image 37
hammar Avatar answered Oct 25 '22 23:10

hammar


Data.Map doesn't copy the old map; it (lazily) allocates O(log N) new nodes, which point to (and thereby share) most of the old map.

Because "updating" the map doesn't disrupt old versions, this kind of datastructure gives you greater freedom in building concurrent algorithms.

like image 40
comingstorm Avatar answered Oct 26 '22 01:10

comingstorm