Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Thread-safe Dictionary.Add

Is Dictionary.Add() thread safe when you only insert?

I've got a code that insert keys from multiple-threads, do I still need locking around Dictionary.Add()

I got this exception while adding a new key:

Exception Source:    mscorlib
Exception Type: System.IndexOutOfRangeException
Exception Message:   Index was outside the bounds of the array.
Exception Target Site: Insert

Although it's quite rare. I know that Dictionary is not thread-safe although I thought that only calling .Add wouldn't cause any problems.

like image 963
dr. evil Avatar asked Mar 31 '11 21:03

dr. evil


2 Answers

Dictionary is not thread-safe at all, regardless of whether you only add to it or not - there are a few internal structures to it that need to be kept in sync (especially when the internal hashbuckets get resized).

You either have to implement your own locking around any operation on it, or if you're in .Net 4.0 you can use the new ConcurrentDictionary - which is absolutely fantastic - and which is totally thread-safe.

Another option (update)

That said - there is another technique you can use - but it'll require a bit of tweaking depending upon the kind of data you're inserting into your dictionary, and whether all your keys are guaranteed unique:

Give each thread it's own private dictionary that it inserts into.

When each thread finishes, collate all the dictionaries together and merge them into a bigger one; how you handle duplicate keys is up to you. For example, if you're caching lists of items by a key, then you can simply merge each same-keyed list into one and put it in the master dictionary.

Official answer re: performance (after you accepted)

So as your comments say, you need an idea of the best method (lock or merge) for performance etc. I can't tell you what this will be; ultimately it will need to be benchmarked. I'll see if I can offer some guidance, though :)

Firstly - if you have any idea how many items your Dictionar(y/ies) will ultimately need, use the (int) constructor to minimize resizing.

The merge operation is likely to be best; since none of the threads will be interfering with each other. Unless the process involved when two objects share the same key is particularly lengthy; in which case forcing it all to happen on a single thread at the end of the operation might end up zeroing all the performance gains by parallelizing the first stage!

Equally, there's potentially memory concerns there since you will effectively be cloning the dictionary, so if the final result is large enough you could end up consuming lots of resources; granted, though - they will be released.

If it's the case that a decision needs to be made the thread-level when a key is already present, then you will need a lock(){} construct.

Over a dictionary, this typically takes the following shape:

readonly object locker = new object();
Dictionary<string, IFoo> dictionary = new Dictionary<string, IFoo>();

void threadfunc()
{
  while(work_to_do)
  {
    //get the object outside the lock
    //be optimistic - expect to add; and handle the clash as a 
    //special case
    IFoo nextObj = GetNextObject(); //let's say that an IFoo has a .Name
    IFoo existing = null;
    lock(locker)
    {
      //TryGetValue is a god-send for this kind of stuff
      if(!dictionary.TryGetValue(nextObj.Name, out existing))
        dictionary[nextObject.Name] = nextObj;
      else
        MergeOperation(existing, nextObject);
    }
  }
}

Now if that MergeOperation is really slow; then you might consider releasing the lock, creating a cloned object that represents the merge of the existing and the new object, then re-acquiring the lock. However - you need a reliable way of checking that the state of the existing object hasn't changed between the first lock and the second (a version number is useful for this).

like image 175
Andras Zoltan Avatar answered Oct 16 '22 13:10

Andras Zoltan


Yup, this is the exception you can get when you insert an element just as the dictionary is busy increasing the number of buckets. Triggered by another thread adding an item and the load factor got too high. Dictionary is especially sensitive to that because the reorganization takes a while. Good thing, makes your code crash quickly instead of only once a week.

Review every line of code that's used in a thread and check where a shared object is used. You haven't yet found the once-a-week crashes. Or worse, the ones that don't crash but just generate bad data once in a while.

like image 20
Hans Passant Avatar answered Oct 16 '22 13:10

Hans Passant