Hash-consing consists in keeping in memory only one copy of a given object ; that is, if two objects are semantically equal (same content) then they should be physically equal (same location in memory). The technique is usually implemented by keeping a global hash set and creating new objects only if they are not equal to an object in the hash set.
An additional requirement is that objects in the hash table should be collectable if they are not referenced by anything except the hash table; otherwise said, the hash table should contain weak references.
The issue is furthermore complicated by the need to have constant time, thus shallow, hashing and equality tests ; thus objects have a unique identifier that is incremented when a new object is added to the table.
I have a working implementation that uses System.Collections.Generic.Dictionary<key, node>
where key
is a tuple giving a shallow summary of the node (suitable for default hashing and equality test) and node
is the object. The only problem is that the Dictionary
keeps strong references to the nodes !
I could use a Dictionary
to WeakReference
's but this would not free the keys pointing to dangling references.
Some advocate using System.Runtime.CompilerServices.ConditionalWeakTable
but this class seems to do the opposite : it frees the value when the key is collected, whereas I need to free the key when the value is collected.
One could try using System.Runtime.CompilerServices.ConditionalWeakTable<node, node>
but I would need custom hashing and equality tests... and ConditionalWeakTable
is documented not to use the GetHashCode()
virtual method, instead using the default hashing function.
Thus my question : is there some equivalent of Dictionary
that would keep weak references to values and free the keys when the references become dangling ?
You are right that CWT does not solve the hash-consing problem because it begs the question - its keys assume reference equality. However, it might be worth pointing out that CWT does not hold on to keys or values. Here is a little test:
open System.Collections.Generic
open System.Runtime.CompilerServices
let big () =
ref (Array.zeroCreate (1024 * 1024) : byte [])
let test1 () =
let d = Dictionary(HashIdentity.Reference)
for i in 1 .. 10000 do
stdout.WriteLine(i)
let big = big ()
d.Add(big, big)
d
let test2 () =
let d = ConditionalWeakTable()
for i in 1 .. 10000 do
stdout.WriteLine(i)
let big = big ()
d.Add(big, big)
d
On my machine, test1
runs out of memory and test2
succeeds. It seems that this would only happen if CWT did not hold on to keys as well as values.
For hash-consing, your best bet might be what Artem is suggesting in the comments. If this sounds too complicated, it also makes a lot of sense to just give the user control, say:
let f = MyFactory() // a dictionary with weak reference values hidden inside
f.Create(..) : MyObject // MyObject has no constructors of its own
f.Cleanup() // explicitly cleans up entries for collected keys
Then you do not need to introduce threading, study how GC internals work, or do any magic. The user of the library may decide where it is appropriate to clean up or simply "forget" the factory object - which would collect the whole table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With