I'm using a dictionary to accumulate the number of occurrences of keys and, consequently, the core operation is writing a key-value pair where the value is the previous value plus one or just one if there was no previous value. However, this requires two separate dictionary operations (read and write) when I could just be doing one (AddOrUpdate
).
I notice that the concurrent dictionary supports AddOrUpdate
but the ordinary generic Dictionary
does not appear to.
Consequently, a dictionary of references to mutable ints is faster. However, this introduces unnecessary references which means heap allocations and write barriers. So I'm guessing it is possible to do significantly better but I cannot see how without rewriting Dictionary
from scratch. Am I right?
You can do something like this:
private class Counter
{
public string Key { get ; set ; }
public int Frequency { get ; set ; }
}
...
Dictionary<string,Counter> frequencyTable = new Dictionary<string,Counter>() ;
...
string someKey = GetKeyToLookup() ;
Counter item = null ;
bool hit = frequencyTable.TryGetValue( someKey,out item ) ;
if ( !hit )
{
item = new Counter{ Key=someKey,Frequency=0 } ;
}
++ item.Frequency ;
If that's not good enough, why write your own? Use the the high performance C5 Collections Library. It's free (originally funded by Microsoft, in fact), builds on Microsoft's System.Collections.Generic
interfaces and whose dictionaries, sets and bags support FindOrAdd()
semantics.
As Jim Mischel mentioned - it's impossible to do single lookup for changing dictionary's item value.
ConcurrentDictionary.AddOrUpdate
method do more than one lookup operation (reflected sources):
public TValue AddOrUpdate(TKey key, TValue addValue, Func<TKey, TValue, TValue> updateValueFactory)
{
TValue local2;
if (key == null)
{
throw new ArgumentNullException("key");
}
if (updateValueFactory == null)
{
throw new ArgumentNullException("updateValueFactory");
}
do
{
TValue local3;
while (this.TryGetValue(key, out local3))
{
TValue newValue = updateValueFactory(key, local3);
if (this.TryUpdate(key, newValue, local3))
{
return newValue;
}
}
}
while (!this.TryAddInternal(key, addValue, false, true, out local2));
return local2;
}
I've made performance test with concurrent dictionary and simple ditcionary:
AddOrUpdate extension for IDictionary:
public static class DictionaryExtensions
{
public static void AddOrUpdate<TKey, TValue>(this IDictionary<TKey, TValue> dict, TKey key, TValue initValue, Func<TKey, TValue, TValue> updateFunc)
{
TValue value;
value = dict.TryGetValue(key, out value) ? updateFunc(key, value) : initValue;
dict[key] = value;
}
}
Test:
static void Main(string[] args)
{
const int dictLength = 100000;
const int testCount = 1000000;
var cdict = new ConcurrentDictionary<string, int>(GetRandomData(dictLength));
var dict = GetRandomData(dictLength).ToDictionary(x => x.Key, x => x.Value);
var stopwatch = new Stopwatch();
stopwatch.Start();
foreach (var pair in GetRandomData(testCount))
cdict.AddOrUpdate(pair.Key, 1, (x, y) => y+1);
stopwatch.Stop();
Console.WriteLine("Concurrent dictionary: {0}", stopwatch.ElapsedMilliseconds);
stopwatch.Reset();
stopwatch.Start();
foreach (var pair in GetRandomData(testCount))
dict.AddOrUpdate(pair.Key, 1, (x, y) => y+1);
stopwatch.Stop();
Console.WriteLine("Dictionary: {0}", stopwatch.ElapsedMilliseconds);
Console.ReadLine();
}
static IEnumerable<KeyValuePair<string, int>> GetRandomData(int count)
{
const int constSeed = 100;
var randGenerator = new Random(constSeed);
return Enumerable.Range(0, count).Select((x, ind) => new KeyValuePair<string, int>(randGenerator.Next().ToString() + "_" + ind, randGenerator.Next()));
}
Test results on my environment (ms):
ConcurrentDictionary: 2504
Dictionary: 1351
A dictionary update does not require multiple lookups if you're using reference types:
Say you have a Dictionary<string, Foo>
, where Foo
is a reference type and includes a Count
property:
void UpdateCount(string key)
{
Foo f;
if (dict.TryGetValue(key, out f))
{
// do the update
++f.Count;
}
else
{
dict[key] = 1;
}
}
If your values are value types ... well, then you have to deal with value type semantics. And that includes having to do two lookups.
That said, dictionary lookup is pretty dang fast. If this is causing you an issue, you must have a whole lot of occurrences to count.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With