Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently update bindings in a .NET dictionary

I'm using a dictionary to accumulate the number of occurrences of keys and, consequently, the core operation is writing a key-value pair where the value is the previous value plus one or just one if there was no previous value. However, this requires two separate dictionary operations (read and write) when I could just be doing one (AddOrUpdate).

I notice that the concurrent dictionary supports AddOrUpdate but the ordinary generic Dictionary does not appear to.

Consequently, a dictionary of references to mutable ints is faster. However, this introduces unnecessary references which means heap allocations and write barriers. So I'm guessing it is possible to do significantly better but I cannot see how without rewriting Dictionary from scratch. Am I right?

like image 645
J D Avatar asked Aug 27 '13 16:08

J D


3 Answers

You can do something like this:

private class Counter
{
  public string Key       { get ; set ; }
  public int    Frequency { get ; set ; }
}

...

Dictionary<string,Counter> frequencyTable = new Dictionary<string,Counter>() ;

...

string someKey = GetKeyToLookup() ;
Counter item = null ;
bool hit = frequencyTable.TryGetValue( someKey,out item ) ;
if ( !hit )
{
  item = new Counter{ Key=someKey,Frequency=0 } ;
}
++ item.Frequency ;

If that's not good enough, why write your own? Use the the high performance C5 Collections Library. It's free (originally funded by Microsoft, in fact), builds on Microsoft's System.Collections.Generic interfaces and whose dictionaries, sets and bags support FindOrAdd() semantics.

  • Nuget: http://www.nuget.org/packages/C5/
  • Project Home Page: http://www.itu.dk/research/c5/
  • The documentation is ITU-TR-2006-76 — The C5 Generic Collection Library for C# and CLI: Version 1.1.0 of 2008-02-10. It's a bit out of date, since it reflects v1.1.1 rather than the current version's (v2.2 as of 27 August 2013). The basics haven't changed, though.
like image 70
Nicholas Carey Avatar answered Oct 21 '22 03:10

Nicholas Carey


As Jim Mischel mentioned - it's impossible to do single lookup for changing dictionary's item value. ConcurrentDictionary.AddOrUpdate method do more than one lookup operation (reflected sources):

public TValue AddOrUpdate(TKey key, TValue addValue, Func<TKey, TValue, TValue> updateValueFactory)
{
    TValue local2;
    if (key == null)
    {
        throw new ArgumentNullException("key");
    }
    if (updateValueFactory == null)
    {
        throw new ArgumentNullException("updateValueFactory");
    }
    do
    {
        TValue local3;
        while (this.TryGetValue(key, out local3))
        {
            TValue newValue = updateValueFactory(key, local3);
            if (this.TryUpdate(key, newValue, local3))
            {
                return newValue;
            }
        }
    }
    while (!this.TryAddInternal(key, addValue, false, true, out local2));
    return local2;
}

I've made performance test with concurrent dictionary and simple ditcionary:

AddOrUpdate extension for IDictionary:

public static class DictionaryExtensions
{
    public static void AddOrUpdate<TKey, TValue>(this IDictionary<TKey, TValue> dict, TKey key, TValue initValue, Func<TKey, TValue, TValue> updateFunc)
    {
        TValue value;
        value = dict.TryGetValue(key, out value) ? updateFunc(key, value) : initValue;

        dict[key] = value;
    }
}

Test:

static void Main(string[] args)
{
    const int dictLength = 100000;
    const int testCount = 1000000;

    var cdict = new ConcurrentDictionary<string, int>(GetRandomData(dictLength));
    var dict = GetRandomData(dictLength).ToDictionary(x => x.Key, x => x.Value);

    var stopwatch = new Stopwatch();
    stopwatch.Start();
    foreach (var pair in GetRandomData(testCount))
        cdict.AddOrUpdate(pair.Key, 1, (x, y) => y+1);          

    stopwatch.Stop();
    Console.WriteLine("Concurrent dictionary: {0}", stopwatch.ElapsedMilliseconds);

    stopwatch.Reset();
    stopwatch.Start();

    foreach (var pair in GetRandomData(testCount))
        dict.AddOrUpdate(pair.Key, 1, (x, y) => y+1);   

    stopwatch.Stop();
    Console.WriteLine("Dictionary: {0}", stopwatch.ElapsedMilliseconds);
    Console.ReadLine();
}

static IEnumerable<KeyValuePair<string, int>> GetRandomData(int count)
{
    const int constSeed = 100;
    var randGenerator = new Random(constSeed);
    return Enumerable.Range(0, count).Select((x, ind) => new KeyValuePair<string, int>(randGenerator.Next().ToString() + "_" + ind, randGenerator.Next()));
}

Test results on my environment (ms):

ConcurrentDictionary: 2504
Dictionary: 1351
like image 20
Ivan Bianko Avatar answered Oct 21 '22 04:10

Ivan Bianko


A dictionary update does not require multiple lookups if you're using reference types:

Say you have a Dictionary<string, Foo>, where Foo is a reference type and includes a Count property:

void UpdateCount(string key)
{
    Foo f;
    if (dict.TryGetValue(key, out f))
    {
        // do the update
        ++f.Count;
    }
    else
    {
        dict[key] = 1;
    }
}

If your values are value types ... well, then you have to deal with value type semantics. And that includes having to do two lookups.

That said, dictionary lookup is pretty dang fast. If this is causing you an issue, you must have a whole lot of occurrences to count.

like image 41
Jim Mischel Avatar answered Oct 21 '22 04:10

Jim Mischel