When using immutable dictionaries in F#, how much overhead is there when adding/removing entries?
Will it treat entire buckets as immutable, clone those, and only recreate the bucket whose item has changed?
Even if that is the case, it seems like there is a lot of copying that needs to be done in order to create the new dictionary?
An ImmutableDictionary has methods such as Add and Remove that look like they modify it, but they actually create a new dictionary and return that; the original dictionary remains unchanged.
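For example, a minimal sketch of this behavior (assuming a reference to the System.Collections.Immutable package):

open System.Collections.Immutable

// Add returns a NEW dictionary; the original is untouched.
let d1 = ImmutableDictionary.Create<string, int>()
let d2 = d1.Add("answer", 42)
printfn "%d %d" d1.Count d2.Count   // prints "0 1"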
ConcurrentDictionary and ImmutableDictionary are often compared for the simple reason that both are thread safe. However, ImmutableDictionary is not a good fit for heavily multi-threaded mutation: it is designed to represent data that is loaded once and not modified afterwards.
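For contrast, a minimal sketch of the standard ConcurrentDictionary API, which mutates in place and is meant to be updated from many threads:

open System.Collections.Concurrent

// ConcurrentDictionary is mutated in place, safely across threads.
let cd = ConcurrentDictionary<string, int>()
cd.TryAdd("hits", 1) |> ignore
cd.AddOrUpdate("hits", 1, fun _ old -> old + 1) |> ignore
printfn "%d" cd.["hits"]   // prints "2"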
I looked at the implementation of the F# Map<K,V> type, and I think it is implemented as a functional AVL tree. It stores values in the inner nodes of the tree as well as in the leaves, and for each node it ensures that |height(left) - height(right)| <= 1. For example:
        A
       / \
      B   C
     / \
    D   E
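In F#, the shape of such a tree can be written down roughly as follows (an illustrative sketch only; the actual F# Map node representation is more optimized, and the names here are made up):

type Tree<'K, 'V when 'K : comparison> =
    | Empty
    // left subtree, key, value, right subtree, cached height
    | Node of Tree<'K, 'V> * 'K * 'V * Tree<'K, 'V> * int

// The cached height makes checking the AVL balance invariant cheap.
let height tree =
    match tree with
    | Empty -> 0
    | Node (_, _, _, _, h) -> h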
I think that both the average and the worst-case complexity are O(log n):

Insert - we need to clone all nodes on the path from the root to the newly inserted element, and the height of the tree is at most O(log n). On the "way back", the tree may need to rebalance each node, but that is also only O(log n).

Remove is similar - we find the element and then clone all nodes on the path from the root to that element (rebalancing nodes on the way back to the root).
Note that other data structures that do not need to rebalance all nodes from the root to the current one on insertion/deletion would not really help in the immutable scenario, because you need to create new nodes for the entire path anyway.
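To make the path copying concrete, here is a hedged sketch of persistent insertion, reusing the illustrative Tree type and height function from the sketch above (rebalancing is omitted for brevity; the real Map additionally rotates nodes to restore the AVL invariant):

let rec insert k v tree =
    match tree with
    | Empty ->
        // fresh leaf node
        Node (Empty, k, v, Empty, 1)
    | Node (l, k', v', r, _) when k < k' ->
        let l' = insert k v l              // copy only the node on the search path
        Node (l', k', v', r, 1 + max (height l') (height r))
    | Node (l, k', v', r, _) when k > k' ->
        let r' = insert k v r
        Node (l, k', v', r', 1 + max (height l) (height r'))
    | Node (l, _, _, r, h) ->
        // key already present: replace the value, share both subtrees
        Node (l, k, v, r, h)

Only the O(log n) nodes on the search path are copied; every subtree hanging off that path is shared between the old and the new tree.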
A lot of the tree structure can be reused. I don't know the algorithmic complexity offhand; I would guess there is on average only something like an amortized log N of 'waste' per operation...
Why not write a program to measure it? (We'll see if I can get motivated tonight to try it myself.)
EDIT
OK, here is something I hacked up. I haven't decided whether there is any useful data here or not.
open System

let rng = new Random()

// In-place Fisher-Yates shuffle
let shuffle (array : _[]) =
    let n = array.Length
    for x in 1..n do
        let i = n - x
        let j = rng.Next(i + 1)
        let tmp = array.[i]
        array.[i] <- array.[j]
        array.[j] <- tmp

let TryTwoToThe k =
    let N = pown 2 k
    GC.Collect()
    let a = Array.init N id
    let makeRandomTreeAndDiscard() =
        shuffle a
        let mutable m = Map.empty
        for i in 0 .. N-1 do
            m <- m.Add(a.[i], a.[i])   // insert the keys in random order
    // build and throw away 60 maps of 2^k entries each
    for i in 1..20 do
        makeRandomTreeAndDiscard()
    for i in 1..20 do
        makeRandomTreeAndDiscard()
    for i in 1..20 do
        makeRandomTreeAndDiscard()

#time
// run these as separate interactions
printfn "16"
TryTwoToThe 16

printfn "17"
TryTwoToThe 17

printfn "18"
TryTwoToThe 18
When I run this in FSI on my box, I get
--> Timing now on
>
16
Real: 00:00:08.079, CPU: 00:00:08.062, GC gen0: 677, gen1: 30, gen2: 1
>
17
Real: 00:00:17.144, CPU: 00:00:17.218, GC gen0: 1482, gen1: 47, gen2: 4
>
18
Real: 00:00:37.790, CPU: 00:00:38.421, GC gen0: 3400, gen1: 1059, gen2: 17
which suggests the memory may be scaling super-linearly but not too badly. I am presuming that the gen0 collections are roughly a good proxy for the 'waste' of rebalancing the tree. But it is late so I am not sure if I have thought this through well enough. :)
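A rough complementary measurement one could try (an illustrative sketch, not from the original post): force a full collection before and after building a single map, so that the difference approximates the retained size of the final tree rather than the transient rebalancing garbage:

open System

let measureRetained n =
    let before = GC.GetTotalMemory(true)   // force a full GC, then read heap size
    let mutable m = Map.empty
    for i in 0 .. n - 1 do
        m <- m.Add(i, i)
    let after = GC.GetTotalMemory(true)
    printfn "map of %d entries retains ~%d bytes" n (after - before)
    ignore m.Count                         // keep m alive past the second reading

measureRetained 100000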