Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash-consing in F# and weak hash tables in .net

Hash-consing consists in keeping in memory only one copy of a given object ; that is, if two objects are semantically equal (same content) then they should be physically equal (same location in memory). The technique is usually implemented by keeping a global hash set and creating new objects only if they are not equal to an object in the hash set.

An additional requirement is that objects in the hash table should be collectable if they are not referenced by anything except the hash table; otherwise said, the hash table should contain weak references.

The issue is furthermore complicated by the need to have constant time, thus shallow, hashing and equality tests ; thus objects have a unique identifier that is incremented when a new object is added to the table.

I have a working implementation that uses System.Collections.Generic.Dictionary<key, node> where key is a tuple giving a shallow summary of the node (suitable for default hashing and equality test) and node is the object. The only problem is that the Dictionary keeps strong references to the nodes !

I could use a Dictionary to WeakReference's but this would not free the keys pointing to dangling references.

Some advocate using System.Runtime.CompilerServices.ConditionalWeakTable but this class seems to do the opposite : it frees the value when the key is collected, whereas I need to free the key when the value is collected.

One could try using System.Runtime.CompilerServices.ConditionalWeakTable<node, node> but I would need custom hashing and equality tests... and ConditionalWeakTable is documented not to use the GetHashCode() virtual method, instead using the default hashing function.

Thus my question : is there some equivalent of Dictionary that would keep weak references to values and free the keys when the references become dangling ?

like image 523
David Monniaux Avatar asked Mar 25 '13 14:03

David Monniaux


1 Answers

You are right that CWT does not solve the hash-consing problem because it begs the question - its keys assume reference equality. However, it might be worth pointing out that CWT does not hold on to keys or values. Here is a little test:

open System.Collections.Generic
open System.Runtime.CompilerServices

let big () =
    ref (Array.zeroCreate (1024 * 1024) : byte [])

let test1 () =
    let d = Dictionary(HashIdentity.Reference)
    for i in 1 .. 10000 do
        stdout.WriteLine(i)
        let big = big ()
        d.Add(big, big)
    d

let test2 () =
    let d = ConditionalWeakTable()
    for i in 1 .. 10000 do
        stdout.WriteLine(i)
        let big = big ()
        d.Add(big, big)
    d

On my machine, test1 runs out of memory and test2 succeeds. It seems that this would only happen if CWT did not hold on to keys as well as values.

For hash-consing, your best bet might be what Artem is suggesting in the comments. If this sounds too complicated, it also makes a lot of sense to just give the user control, say:

let f = MyFactory() // a dictionary with weak reference values hidden inside
f.Create(..) : MyObject // MyObject has no constructors of its own
f.Cleanup() // explicitly cleans up entries for collected keys 

Then you do not need to introduce threading, study how GC internals work, or do any magic. The user of the library may decide where it is appropriate to clean up or simply "forget" the factory object - which would collect the whole table.

like image 77
t0yv0 Avatar answered Nov 01 '22 01:11

t0yv0