Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Immutable Dictionary overhead?

When using immutable dictionaries in F# , how much overhead is there when adding / removing entries?

Will it treat entire buckets as immutable and clone those and only recreate the bucket whos item has changed?

Even if that is the case, it seems like there is alot of copying that needs to be done in order to create the new dictionary(?)

like image 775
Roger Johansson Avatar asked May 08 '10 05:05

Roger Johansson


People also ask

Can we create immutable dictionary in Python?

No, it can't. The object can be made mutable (or not) irrespective of what its __hash__ method does. The relationship between immutable objects and __hash__ is that, since an immutable object cannot be changed, the value returned by __hash__ remains constant post-construction.

What is ImmutableDictionary?

An ImmutableDictionary has methods to modify it like Add or Remove , but they will create a new dictionary and return that, the original one remains unchanged and the copy of the new immutable dictionary is returned.

Is ImmutableDictionary thread safe?

These two classes ConcurrentDictionary and ImmutableDictionary were compared just because of the simple reason, both are thread safe. However, it is not a good idea to use ImmutableDictionary for multi-threading. It is designed to represent data which should be loaded once, and shouldn't be changed / modified later on.


2 Answers

I looked at the implementation of the F# Map<K,V> type and I think it is implemented as a functional AVL tree. It stores the values in the inner nodes of the tree as well as in the leafs and for each node, it makes sure that |height(left) - height(right)| <= 1.

            A 
         /     \
        B       C
      /   \
     D     E

I think that the both average and worst-case complexities are O(log(n)):

  • Insert we need to clone all nodes on the path from the root to the newly inserted element and the height of the tree is at most O(log(n)). On the "way back", the tree may need to rebalance each node, but that's also only O(log(n))

  • Remove is similar - we find the element and then clone all nodes from the root to that element (rebalancing nodes on the way back to the root)

Note that other data-structures that don't need to rebalance all nodes from the root to the current one on insertion/deletion won't be really useful in the immutable scenario, because you need to create new nodes for the entire path anyway.

like image 68
Tomas Petricek Avatar answered Sep 28 '22 00:09

Tomas Petricek


A lot of the tree structure can be reused. I don't know the algorithmic complexity offhand, I would guess on average there's only like amortized logN 'waste'...

Why not try to write a program to measure? (We'll see if I can get motivated tonight to try it myself.)

EDIT

Ok, here is something I hacked. I haven't decided if there's any useful data here or not.

open System

let rng = new Random()
let shuffle (array : _[]) =
    let n = array.Length
    for x in 1..n do
        let i = n-x
        let j = rng.Next(i+1)
        let tmp = array.[i]
        array.[i] <- array.[j]
        array.[j] <- tmp

let TryTwoToThe k =
    let N = pown 2 k

    GC.Collect()

    let a = Array.init N id

    let makeRandomTreeAndDiscard() =
        shuffle a
        let mutable m = Map.empty
        for i in 0..N-1 do
            m <- m.Add(i,i)

    for i in 1..20 do
        makeRandomTreeAndDiscard()
    for i in 1..20 do
        makeRandomTreeAndDiscard()
    for i in 1..20 do
        makeRandomTreeAndDiscard()

#time
// run these as separate interactions
printfn "16"
TryTwoToThe 16

printfn "17"
TryTwoToThe 17

printfn "18"
TryTwoToThe 18

When I run this in FSI on my box, I get

--> Timing now on

> 
16
Real: 00:00:08.079, CPU: 00:00:08.062, GC gen0: 677, gen1: 30, gen2: 1
> 
17
Real: 00:00:17.144, CPU: 00:00:17.218, GC gen0: 1482, gen1: 47, gen2: 4
> 
18
Real: 00:00:37.790, CPU: 00:00:38.421, GC gen0: 3400, gen1: 1059, gen2: 17

which suggests the memory may be scaling super-linearly but not too badly. I am presuming that the gen0 collections are roughly a good proxy for the 'waste' of rebalancing the tree. But it is late so I am not sure if I have thought this through well enough. :)

like image 40
Brian Avatar answered Sep 28 '22 02:09

Brian