I have a Parallel.ForEach loop running an intensive operation inside the body.
The operation produces a value that I store in a Hashtable so it can be reused by subsequent loop items: I add to the Hashtable after the intensive operation completes, and the next loop item can look the object up and reuse it instead of running the intensive operation again.
However, because I am using Parallel.ForEach there is a thread-safety issue: the Hashtable.Add and ContainsKey(key) calls can go out of sync, as they might run in parallel. Introducing locks may cause performance issues.
Here's the sample code:
Hashtable myTable = new Hashtable();
Parallel.ForEach(items, (item, loopState) =>
{
    object myObj;
    // If it exists in myTable use it, else add it to the hashtable
    if (myTable.ContainsKey(item.Key))
    {
        myObj = myTable[item.Key];
    }
    else
    {
        myObj = SomeIntensiveOperation();
        myTable.Add(item.Key, myObj); // Issue is here: breaks with an exception at runtime
    }
    // Do something with myObj
    // some code here
});
There must be some API or property setting inside the TPL library that could handle this scenario. Is there?
You're looking for System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>. The new concurrent collections use significantly improved locking mechanisms and should perform excellently in parallel algorithms.
Edit: The result might look like this:
ConcurrentDictionary<T,K> cache = ...;
Parallel.ForEach(items, (item, loopState) =>
{
K value;
if (!cache.TryGetValue(item.Key, out value))
{
value = SomeIntensiveOperation();
cache.TryAdd(item.Key, value);
}
// Do something with value
} );
Word of warning: if the elements in items do not all have unique item.Key values, then SomeIntensiveOperation could get called twice for the same key. In the example the key isn't passed to SomeIntensiveOperation, but it means that the "Do something with value" code could execute with key/valueA and key/valueB pairs, and only one result would get stored in the cache (not necessarily the first one computed by SomeIntensiveOperation, either). You'd need a parallel lazy factory to handle this if it's a problem; one is sketched below. Also, for obvious reasons, SomeIntensiveOperation should be thread safe.
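Here's a minimal sketch of such a parallel lazy factory, assuming string keys and the MyObject/SomeIntensiveOperation names from this thread purely for illustration (and, unlike the example above, passing the key into the operation). GetOrAdd may invoke its factory more than once under contention, but the factory only builds a cheap Lazy<T> wrapper; the expensive work runs at most once per key, when .Value is first read:
using System;
using System.Collections.Concurrent;

static class LazyCache
{
    static readonly ConcurrentDictionary<string, Lazy<MyObject>> cache =
        new ConcurrentDictionary<string, Lazy<MyObject>>();

    public static MyObject GetOrCompute(string key)
    {
        // Lazy<T> defaults to ExecutionAndPublication mode, so even if two
        // threads race on GetOrAdd, only one ever runs the intensive operation.
        return cache
            .GetOrAdd(key, k => new Lazy<MyObject>(() => SomeIntensiveOperation(k)))
            .Value;
    }
}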
Check the System.Collections.Concurrent namespace; I think you need ConcurrentDictionary.
Use a ReaderWriterLockSlim; it has good performance for workloads with many reads and few short writes. Your problem seems to fit this specification.
All read operations will run concurrently without blocking one another; the only time anyone is blocked is when a write is happening, and that write lasts only as long as it takes to shove something into a Hashtable.
ReaderWriterLockSlim on MSDN
I guess I'll throw down some code...
ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim();
Hashtable myTable = new Hashtable();
Parallel.ForEach(items, (item, loopState) =>
{
    // Hold the read lock only long enough to probe the cache
    cacheLock.EnterReadLock();
    MyObject myObj;
    try
    {
        myObj = (MyObject)myTable.TryGet(item.Key);
    }
    finally
    {
        cacheLock.ExitReadLock();
    }
    // If the object isn't cached, calculate it and cache it
    if (myObj == null)
    {
        myObj = SomeIntensiveOperation();
        cacheLock.EnterWriteLock();
        try
        {
            // Use the indexer instead of Add: another thread may have
            // cached the same key while we were computing
            myTable[item.Key] = myObj;
        }
        finally
        {
            cacheLock.ExitWriteLock();
        }
    }
    // Do something with myObj
    // some code here
});
static class HashtableExtensions
{
    // Returns null when the key is absent (extension methods must live in a static class)
    public static object TryGet(this Hashtable table, object key)
    {
        return table.Contains(key) ? table[key] : null;
    }
}
I see no other correct choice than to use (more or less explicit) locks (A synchronized Hashtable just overrides all methods with locks).
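For reference, that synchronized wrapper comes from Hashtable.Synchronized. Note that it only makes individual calls atomic; a ContainsKey-then-Add sequence is still a race, so it doesn't solve the original problem on its own:
using System.Collections;

// Each method call on the wrapper takes a lock internally, so individual
// operations are thread safe, but check-then-add across two calls is not.
Hashtable myTable = Hashtable.Synchronized(new Hashtable());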
Another option could be to allow the cache to go out of sync: the race condition will not corrupt the dictionary (assuming a thread-safe one such as ConcurrentDictionary), it will just make the code do some superfluous computations, as sketched below. Profile the code to check whether the lock or the missing memoization has the worse effect.
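A minimal sketch of that trade-off, reusing items and SomeIntensiveOperation from the question: two threads may both compute a value for the same key, but GetOrAdd guarantees every caller ends up observing the single value that won the race:
using System.Collections.Concurrent;
using System.Threading.Tasks;

var cache = new ConcurrentDictionary<object, object>();
Parallel.ForEach(items, item =>
{
    // The factory may run more than once for the same key under contention;
    // the only cost is the wasted computation, never a corrupted dictionary.
    object myObj = cache.GetOrAdd(item.Key, _ => SomeIntensiveOperation());
    // Do something with myObj
});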