
Calling ToList() on ConcurrentDictionary<TKey, TValue> while adding items

I've run into an interesting issue. I know that ConcurrentDictionary<TKey, TValue> can be safely enumerated while it is being modified, with the (in my case) unwanted side effect of iterating over elements that may disappear or appear multiple times, so I decided to create a snapshot myself using ToList(). Since ConcurrentDictionary<TKey, TValue> also implements ICollection<KeyValuePair<TKey, TValue>>, ToList() ends up in the List<T>(IEnumerable<T> collection) constructor, which allocates an array sized to the dictionary's current Count and then copies the items over with ICollection<T>.CopyTo(T[] array, int arrayIndex). That call goes to the ConcurrentDictionary<TKey, TValue> implementation, which throws an ArgumentException if elements were added to the dictionary in the meantime.

Locking everywhere would defeat the purpose of using this collection, so my options seem to be either to keep catching the exception and retrying (which is definitely not the right answer to the problem) or to implement my own version of ToList() specialized for this case (but then again, growing a list and possibly trimming it to the right size for a few elements seems like overkill, and using a LinkedList would hurt indexing performance).

In addition, certain LINQ methods that build some sort of buffer in the background (such as OrderBy) do seem to mend the problem at the cost of performance, but a bare ToList() obviously does not, and it's not worth "augmenting" it with another method when no additional functionality is needed.

Could this be an issue with any concurrent collection?

What would be a reasonable workaround to keep performance hits to the minimum while creating such a snapshot? (Preferably at the end of some LINQ magic.)

Edit:

After looking into it, I can confirm that ToArray() (to think that I passed right by it yesterday) really does solve the snapshot problem, as long as it is just that: a simple snapshot. It does not help when additional work (such as filtering or sorting) is required before taking the snapshot and a list or array is still needed at the end. (In that case an additional call is required, creating the new collection all over again.)

I failed to point out that the snapshot may or may not need to go through these modifications, so it should preferably be taken at the end. I'm adding this to the question.

(Also, if anyone has a better idea for a title, do tell.)

Roland Szakacs asked Dec 08 '16 11:12


1 Answer

Let's answer the broad, overarching question here for all the concurrent types:

If you split an operation that deals with the internals into multiple steps, where all the steps must "be in sync", then yes, you will definitely get crashes and odd results, because nothing keeps those steps synchronized across threads.

So if .ToList() first asks for .Count, then sizes an array, and then uses foreach to grab the values and place them in the list, then yes, there is definitely a chance of the two parts seeing a different number of elements.
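The two-step pattern can be reproduced deterministically on a single thread. The sketch below is only an illustration, not the actual List<T> source: it assumes the constructor reads Count and then calls CopyTo, as the question describes, and it places a TryAdd between the two steps to stand in for a writer on another thread. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class ToListRaceSketch
{
    // Mimics the "read Count, then CopyTo" sequence, with a write
    // deliberately placed between the two steps.
    public static bool CopyFailsWhenDictionaryGrows()
    {
        var dict = new ConcurrentDictionary<int, string>();
        dict.TryAdd(1, "one");

        ICollection<KeyValuePair<int, string>> collection = dict;

        // Step 1: size the buffer from the current Count.
        var buffer = new KeyValuePair<int, string>[collection.Count];

        // Simulated concurrent writer: the dictionary grows after Count was read.
        dict.TryAdd(2, "two");

        try
        {
            // Step 2: copy. The buffer is now one slot too small.
            collection.CopyTo(buffer, 0);
            return false;
        }
        catch (ArgumentException)
        {
            // Same exception type the question reports from ToList().
            return true;
        }
    }
}
```

In a real program the extra TryAdd would come from another thread, which is why the failure shows up only occasionally.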

To be honest, I wish some of those concurrent types did not try to pretend they were normal collections by implementing a lot of those interfaces, but alas, that's how it is.

Can you fix your code, now that you know about the issue?

Yes, you can. You must look at the type's documentation and see whether it provides any form of snapshotting mechanism that isn't prone to the above-mentioned problems.

Turns out ConcurrentDictionary<TKey, TValue> implements .ToArray(), which is documented with:

A new array containing a snapshot of key and value pairs copied from the System.Collections.Concurrent.ConcurrentDictionary.

(my emphasis)

How is .ToArray() currently implemented?

Using locks, see line 697.

So if you feel that locking the entire dictionary to get a snapshot is too costly, I would question the act of grabbing a snapshot of its contents to begin with.
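For the case raised in the question's edit (filtering or sorting before the final list), one option is to take the snapshot first and run LINQ over the resulting array. This is a minimal sketch under assumptions: the key and value types, the even-key filter, and the class and method names are made up for illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotSketch
{
    public static List<KeyValuePair<int, string>> FilteredSortedSnapshot(
        ConcurrentDictionary<int, string> dict)
    {
        // Because the variable is typed as ConcurrentDictionary, this binds to the
        // instance method ConcurrentDictionary.ToArray(), not Enumerable.ToArray().
        KeyValuePair<int, string>[] snapshot = dict.ToArray();

        // Everything below works on the private array, so concurrent writers
        // to the dictionary can no longer affect the result.
        return snapshot
            .Where(pair => pair.Key % 2 == 0)   // hypothetical filter
            .OrderBy(pair => pair.Key)
            .ToList();
    }
}
```

The cost is the one the question mentions: the items are copied twice, once into the array and once into the final list. Note also that if the dictionary is held in a variable typed as IEnumerable<KeyValuePair<TKey, TValue>>, the call resolves to the LINQ extension method instead of the instance method, so the snapshot guarantee quoted above should not be assumed there.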

Additionally, the .GetEnumerator() method follows some of the same rules, from the documentation:

The enumerator returned from the dictionary is safe to use concurrently with reads and writes to the dictionary, however it does not represent a moment-in-time snapshot of the dictionary. The contents exposed through the enumerator may contain modifications made to the dictionary after GetEnumerator was called.

(again, my emphasis)

So while .GetEnumerator() won't crash, it may not produce the results you want.

Depending on timing, neither may .ToArray(), so it all depends.
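If a moment-in-time snapshot is not actually required, the list can also be built by plain enumeration, which avoids the Count and CopyTo steps entirely and allows filtering along the way. The sketch below is one possible shape, with hypothetical names and concrete key and value types chosen for brevity. As the documentation quoted above says, the result may include changes made after enumeration started.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class EnumerationSketch
{
    public static List<KeyValuePair<int, string>> ToListByEnumeration(
        ConcurrentDictionary<int, string> dict,
        Func<KeyValuePair<int, string>, bool> predicate)
    {
        // The list grows as needed, so no size is read from the dictionary up front.
        var result = new List<KeyValuePair<int, string>>();

        // Enumeration is safe alongside concurrent writes,
        // but it is not a moment-in-time snapshot.
        foreach (KeyValuePair<int, string> pair in dict)
        {
            if (predicate(pair))
            {
                result.Add(pair);
            }
        }

        return result;
    }
}
```

Whether this or .ToArray() is the better fit depends on whether consistency at a single point in time matters more than avoiding the dictionary-wide lock.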

Lasse V. Karlsen answered Sep 22 '22 18:09