Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using a hashtable inside a Parallel.ForEach?

I have a Parallel.ForEach loop running an intensive operation inside the body.

The operation can use a Hashtable to store the values, and can be reused for other consecutive loop items. I add to the Hashtable after the intensive operation is complete, the next loop item can look up in the Hashtable and reuse the object, instead of running the intensive operation again.

However, because I am using Parallel.ForEach there is a unsafe issue, causing the Hashtable.Add and the ContainsKey(key) calls go out of sync, as they might be running in parallel. Introducing locks may cause perf issues.

Here's the sample code:

Hashtable myTable = new Hashtable;
Parallel.ForEach(items, (item, loopState) =>
{
    // If exists in myTable use it, else add to hashtable
    if(myTable.ContainsKey(item.Key))
    {
       myObj = myTable[item.Key];
    }
    else
    {
       myObj = SomeIntensiveOperation();
       myTable.Add(item.Key, myObj); // Issue is here : breaks with exc during runtime
    }
    // Do something with myObj
    // some code here
}

There must be some API, Property setting inside TPL library, that could handle this scenario. Is there?

like image 627
Vin Avatar asked Nov 01 '09 18:11

Vin


4 Answers

You're looking for System.Collections.Concurrent.ConcurrentDictionary<TKey, TValue>. The new concurrent collections use significantly improved locking mechanisms and should perform excellectly in parallel algorithms.

Edit: The result might look like this:

ConcurrentDictionary<T,K> cache = ...;
Parallel.ForEach(items, (item, loopState) =>
{
    K value;
    if (!cache.TryGetValue(item.Key, out value))
    {
        value = SomeIntensiveOperation();
        cache.TryAdd(item.Key, value);
    }

    // Do something with value
} );

Word of warning: if the elements in items do not all have unique item.Key, then SomeIntensiveOperation could get called twice for that key. In the example, the key isn't passed to SomeIntensiveOperation, but it means that the "Do something with value" code could execute key/valueA and key/valueB pairs, and only one result would get stored in the cache (not necessarily the first one computed by SomeIntensiveOperation either). You'd need a parallel lazy factory to handle this if it's a problem. Also, for obvious reasons SomeIntensiveOperation should be thread safe.

like image 82
Sam Harwell Avatar answered Oct 15 '22 05:10

Sam Harwell


check the System.Collections.Concurrent namespace i think you need ConcurrentDictionary

like image 44
Hannoun Yassir Avatar answered Oct 15 '22 05:10

Hannoun Yassir


Use a ReaderWriterLock, this has good performance for work that has many reads and few writes that are of a short duration. Your problem seems to fit this specification.

All read operations will run quickly and lock free, the only time anyone will be blocked is when a write is happening, and that write is only as long as it takes to shove something in a Hashtable.

ReaderWriterLockSlim on MSDN

I guess I'll throw down some code...

ReaderWriterLockSlim cacheLock = new ReaderWriterLockSlim();
Hashtable myTable = new Hashtable();
Parallel.ForEach(items, (item, loopState) =>
{
    cacheLock.EnterReadLock();
    MyObject myObj = myTable.TryGet(item.Key);
    cacheLock.ExitReadLock();

    // If the object isn't cached, calculate it and cache it
    if(myObj == null)
    {
       myObj = SomeIntensiveOperation();
       cacheLock.EnterWriteLock();
       try
       {
           myTable.Add(item.Key, myObj);
       }
       finally
       {
           cacheLock.ExitWriteLock();
       }           
    }
    // Do something with myObj
    // some code here
}

static object TryGet(this Hashtable table, object key)
{
    if(table.Contains(key))
        return table[key]
    else
        return null;
}
like image 21
joshperry Avatar answered Oct 15 '22 06:10

joshperry


I see no other correct choice than to use (more or less explicit) locks (A synchronized Hashtable just overrides all methods with locks).

Another option could be to allow the dictionary to go out of sync. The race condition will not corrupt the dictionary, it will just require the code to do some superfluous computations. Profile the code to check whether the lock or missing memoization has worse effects.

like image 21
Dario Avatar answered Oct 15 '22 04:10

Dario