Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Locking HashSet for concurrency [duplicate]

Tags:

c#

.net

When using a HashSet<string> to check, whether an item was handled before (i.e. only Add and Contains is used). Furthermore it is not relevant, when Contains returns false, even though it was added before ...

I encountered following exception without locking:

[IndexOutOfRangeException: Index was outside the bounds of the array.] System.Collections.Generic.HashSet`1.AddIfNotPresent(T value) +6108128

Is it sufficient, to lock only the Add call?

Following seems to work forever - but that is not a proof...

HashSet<string> hashSet = new HashSet<string>();
Parallel.ForEach(GetString(), h => 
{
    hashSet.Contains(h);
    lock(hashSetLock) 
    {
        hashSet.Add(h); 
    }
    hashSet.Contains(h);
});

To make it precise: I know that it is not thread-safe to call Contains without a lock. My question is (accepting false positives) if the above code could throw an exception or could destroy the internal state of the underlying data structure (=HashSet).

like image 816
Markus Avatar asked Dec 14 '22 11:12

Markus


2 Answers

No, it is not sufficient to lock only on Add.

The fact that it doesn't crash only tells you that it didn't crash during your test.

You cannot guarantee that:

  • It won't crash in the future
  • It will produce the correct results

A non-threadsafe data structure has no guarantees whatsoever if used in a multithreaded fashion.

You need to either:

  • Lock on every call to it
  • Use a threadsafe data structure, one that has been built to support this scenario

If you use a different data structure than a hashset, like a dictionary, you may even need to lock multi-statements, because this may still fail:

lock (dLock)
    if (d.ContainsKey("test"))
        return;

var value = ExpensiveCallToObtainValue();
lock (dLock)
    d.Add("test", value);

Between the call to ContainsKey and the call to Add another thread may have already inserted that key.

To handle this correctly, without using a threadsafe data structure, is contain both operations inside the same lock:

lock (dLock)
{
    if (!d.ContainsKey("test"))
        d.Add("test", ExpensiveCallToObtainValue());
}
like image 111
Lasse V. Karlsen Avatar answered Dec 17 '22 00:12

Lasse V. Karlsen


No, as others stated, it isn't thread-safe to do what you're doing. If the underlying collection isn't thread-safe, you will need to lock on every operation.

When using a HashSet<T>, there need not be a ContainsKey check, as Add will check if the internal collection already contains the value or not:

Return Value Type: System.Boolean

true if the element is added to the HashSet object; false if the element is already present.

So you can narrow your code to:

private readonly object syncRoot = new object();
lock (syncRoot)
    hashSet.Add(value);
like image 40
Yuval Itzchakov Avatar answered Dec 17 '22 00:12

Yuval Itzchakov