Compacting a WeakReference Dictionary

I've got a class Foo with a property Id. My goal is to ensure that no two instances of Foo with the same Id exist at the same time.

So I created a factory method CreateFoo which uses a cache in order to return the same instance for the same Id.

static readonly WeakDictionary<int, Foo> cache = new WeakDictionary<int, Foo>();

static Foo CreateFoo(int id) {
    Foo foo;
    if (!cache.TryGetValue(id, out foo)) {
        foo = new Foo(id);
        foo.Initialize(...);
        cache.Put(id, foo);
    }
    return foo;
}

The cache is implemented as a Dictionary<TKey,WeakReference>, based on @JaredPar's Building a WeakReference Hashtable:

class WeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items;
    public WeakDictionary() {
        this.items = new Dictionary<TKey, WeakReference>();
    }
    public void Put(TKey key, TValue value) {
        this.items[key] = new WeakReference(value);
    }
    public bool TryGetValue(TKey key, out TValue value) {
        WeakReference weakRef;
        if (!this.items.TryGetValue(key, out weakRef)) {
            value = null;
            return false;
        } else {
            // The entry may still exist while its target has been collected,
            // in which case Target is null and we report a miss.
            value = (TValue)weakRef.Target;
            return (value != null);
        }
    }
}

The problem is that the WeakReferences remain in the dictionary after their targets have been garbage collected. This implies the need for a strategy to manually "garbage collect" dead WeakReferences, as @Pascal Cuoq explains in What happens to a WeakReference after GC of WeakReference.Target.
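For concreteness, the naive form of such a compaction is a full sweep over the dictionary. The following is only a sketch; the RemoveDeadEntries name is mine, and the method is assumed to live inside the WeakDictionary above:

// Sketch: remove all entries whose targets have been collected.
// A Dictionary cannot be modified while enumerating it, so collect
// the dead keys first and remove them afterwards.
public void RemoveDeadEntries() {
    var deadKeys = new List<TKey>();
    foreach (var pair in this.items) {
        if (!pair.Value.IsAlive) {
            deadKeys.Add(pair.Key);
        }
    }
    foreach (var key in deadKeys) {
        this.items.Remove(key);
    }
}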


My question is: What's the best strategy to compact a WeakReference Dictionary?

The options that I see are:

  1. Don't remove WeakReferences from the Dictionary. IMO this is bad, because the cache is used for the full lifetime of my application, and a lot of dead WeakReferences will accumulate over time.

  2. Walk the entire dictionary on each Put and TryGetValue, and remove dead WeakReferences. This somewhat defeats the purpose of a dictionary, because both operations become O(n).

  3. Walk the entire dictionary periodically in a background thread. What would be a good interval, given that I don't know the usage pattern of CreateFoo?

  4. Append each inserted KeyValuePair to a double-ended linked list. Each call to Put and TryGetValue examines the head of the list. If the WeakReference is alive, move the pair to the end of the list. If it is dead, remove the pair from the list and the WeakReference from the Dictionary. (A sketch of this approach follows the list.)

  5. Implement a custom hash table with the minor difference that, when a bucket is full, dead WeakReferences are first removed from the bucket before proceeding as usual.
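Here is a sketch of option 4, with a LinkedList of keys standing in for the double-ended list of pairs (all names are illustrative, not a tested implementation). Since each Put and TryGetValue inspects exactly one entry at the head of the list, the cleanup cost is amortized O(1) per operation, which also bears on the amortized-complexity question below:

// Sketch of option 4: each operation inspects one entry at the head
// of an insertion-ordered list of keys.
class CompactingWeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items =
        new Dictionary<TKey, WeakReference>();
    private readonly LinkedList<TKey> insertionOrder = new LinkedList<TKey>();

    public void Put(TKey key, TValue value) {
        CompactOneStep();
        this.items[key] = new WeakReference(value);
        // Re-Putting an existing key leaves a stale duplicate node behind;
        // a later pass of CompactOneStep discards it harmlessly.
        this.insertionOrder.AddLast(key);
    }

    public bool TryGetValue(TKey key, out TValue value) {
        CompactOneStep();
        WeakReference weakRef;
        if (!this.items.TryGetValue(key, out weakRef)) {
            value = null;
            return false;
        }
        value = (TValue)weakRef.Target;
        return (value != null);
    }

    // Examine the oldest entry: if its target is alive, move it to the
    // back of the list; if it is dead, drop it from both the list and
    // the dictionary.
    private void CompactOneStep() {
        LinkedListNode<TKey> head = this.insertionOrder.First;
        if (head == null) return;
        this.insertionOrder.RemoveFirst();
        WeakReference weakRef;
        if (this.items.TryGetValue(head.Value, out weakRef) && weakRef.IsAlive) {
            this.insertionOrder.AddLast(head);
        } else {
            this.items.Remove(head.Value);
        }
    }
}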

Are there other strategies?

The best strategy is probably an algorithm with amortized time complexity. Does such a strategy exist?

asked Jan 12 '10 by dtb

2 Answers

If you can switch the managed object to be the key of the dictionary, then you can use .NET 4.0's ConditionalWeakTable (namespace System.Runtime.CompilerServices).

According to Jeffrey Richter, ConditionalWeakTable is notified of object collection by the garbage collector itself, rather than relying on a polling thread.
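As a minimal sketch of the API before the full WPF example below (the Session and Metadata types are placeholders of my own):

using System.Runtime.CompilerServices;

class Session { }
class Metadata { }

static class CwtSketch {
    // Keys are held weakly; entries disappear automatically once a key
    // is collected, so the table never needs manual compaction.
    static readonly ConditionalWeakTable<Session, Metadata> table =
        new ConditionalWeakTable<Session, Metadata>();

    static void Demo() {
        var session = new Session();
        table.Add(session, new Metadata());

        Metadata meta;
        if (table.TryGetValue(session, out meta)) {
            // use meta ...
        }
        // Once 'session' becomes unreachable, the GC removes the entry;
        // the Metadata value is then also eligible for collection.
    }
}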

    static ConditionalWeakTable<TabItem, TIDExec> tidByTab = new ConditionalWeakTable<TabItem, TIDExec>();

    void Window_Loaded(object sender, RoutedEventArgs e)
    {
        ...
        dataGrid.SelectionChanged += (_sender, _e) =>
        {
            var cs = dataGrid.SelectedItem as ClientSession;

            this.tabControl.Items.Clear();

            foreach (var tid in cs.GetThreadIDs())
            {
                tid.tabItem = new TabItem() { Header = ... };
                tid.tabItem.AddHandler(UIElement.MouseDownEvent,
                    new MouseButtonEventHandler((__sender, __e) =>
                    {
                        tabControl_SelectionChanged(tid.tabItem);
                    }), true);
                tidByTab.Add(tid.tabItem, tid);
                this.tabControl.Items.Add(tid.tabItem);
            }
        };
    }

    void tabControl_SelectionChanged(TabItem tabItem)
    {
        this.tabControl.SelectedItem = tabItem;
        TIDExec tidExec;  // declared here; the original snippet omitted the declaration
        if (tidByTab.TryGetValue(tabControl.SelectedItem as TabItem, out tidExec))
        {
            tidExec.EnsureBlocksLoaded();
            ShowStmt(tidExec.CurrentStmt);
        }
        else
            throw new Exception("huh?");
    }

What's important here is that the only things referencing the TabItem objects are the tabControl.Items collection and the keys of the ConditionalWeakTable, and the key of a ConditionalWeakTable does not count as a reference. So when we clear all the items from the tabControl, those TabItems can be garbage-collected, because nothing references them any longer. When they are garbage-collected, the ConditionalWeakTable is notified and the entries with those keys are removed. My bulky TIDExec objects are then also garbage-collected, since nothing references them except the values of the ConditionalWeakTable.

answered Oct 30 '22 by Gabe Halsmer


Your option 3 (a background thread) has the big disadvantage of making synchronization necessary on all Put/TryGetValue actions. If you do use it, the interval should be expressed not in milliseconds but as every N TryGet actions.

Option 2, scanning the Dictionary, would incur serious overhead. You can improve it by scanning only on 1 in 1000 actions and/or by watching how often the GC has run, as sketched below.
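A sketch of that heuristic, with illustrative field names, assuming the fields and method live inside the WeakDictionary from the question (RemoveDeadEntries is the full sweep sketched there):

private int operationCount;
private int lastSweepGcCount;

// Called at the start of Put and TryGetValue.
private void MaybeSweep() {
    if (++this.operationCount % 1000 != 0) return;  // sweep on at most 1 in 1000 actions
    int gcCount = GC.CollectionCount(0);            // increments on every collection
    if (gcCount == this.lastSweepGcCount) return;   // no GC since last sweep, so no new dead refs
    this.lastSweepGcCount = gcCount;
    RemoveDeadEntries();
}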

But I would seriously consider option 1: do nothing. You may have "a lot" of dead entries, but on the other hand they are pretty small (and the memory gets recycled). This is probably not an option for a server app, but for a client application I would try to measure how many entries (how many kBytes) per hour we are actually talking about.

After some discussion:

Does such a[n amortized] strategy exist?

I would guess no. Your problem is a miniature version of the GC. You will have to scan the whole thing once in a while. So only options 2 and 3 provide a real solution. They are both expensive, but they can be (heavily) optimized with some heuristics. Option 2 would still give you the occasional worst case, though.

answered Oct 30 '22 by Henk Holterman