Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is C# LINQ OrderBy threadsafe when used with ConcurrentDictionary<Tkey, TValue>?

My working assumption is that LINQ is thread-safe when used with the System.Collections.Concurrent collections (including ConcurrentDictionary).

(Other Overflow posts seem to agree: link)

However, an inspection of the implementation of the LINQ OrderBy extension method shows that it appears not to be threadsafe with the subset of concurrent collections which implement ICollection (e.g. ConcurrentDictionary).

The OrderedEnumerable GetEnumerator (source here) constructs an instance of a Buffer struct (source here) which tries to cast the collection to an ICollection (which ConcurrentDictionary implements) and then performs a collection.CopyTo with an array initialised to the size of the collection.

Therefore, if the ConcurrentDictionary (as the concrete ICollection in this case) grows in size during the OrderBy operation, between initialising the array and copying into it, this operation will throw.

The following test code shows this exception:

(Note: I appreciate that performing an OrderBy on a thread-safe collection which is changing underneath you is not that meaningful, but I do not believe it should throw)

using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace Program
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                int loop = 0;
                while (true) //Run many loops until exception thrown
                {
                    Console.WriteLine($"Loop: {++loop}");

                    _DoConcurrentDictionaryWork().Wait();
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex);
            }
        }

        private static async Task _DoConcurrentDictionaryWork()
        {
            var concurrentDictionary = new ConcurrentDictionary<int, object>();
            var keyGenerator = new Random();
            var tokenSource = new CancellationTokenSource();

            var orderByTaskLoop = Task.Run(() =>
            {
                var token = tokenSource.Token;
                while (token.IsCancellationRequested == false)
                {
                    //Keep ordering concurrent dictionary on a loop
                    var orderedPairs = concurrentDictionary.OrderBy(x => x.Key).ToArray(); //THROWS EXCEPTION HERE

                    //...do some more work with ordered snapshot...
                }
            });

            var updateDictTaskLoop = Task.Run(() =>
            {
                var token = tokenSource.Token;
                while (token.IsCancellationRequested == false)
                {
                    //keep mutating dictionary on a loop
                    var key = keyGenerator.Next(0, 1000);
                    concurrentDictionary[key] = new object();
                }
            });

            //Wait for 1 second
            await Task.Delay(TimeSpan.FromSeconds(1));

            //Cancel and dispose token
            tokenSource.Cancel();
            tokenSource.Dispose();

            //Wait for orderBy and update loops to finish (now token cancelled)
            await Task.WhenAll(orderByTaskLoop, updateDictTaskLoop);
        }
    }
}

That the OrderBy throws an exception leads to one of a few possible conclusions:

1) My assumption about LINQ being threadsafe with concurrent collections is incorrect, and it is only safe to perform LINQ on collections (be they concurrent or not) which are not mutating during the LINQ query

2) There is a bug with the implementation of LINQ OrderBy and it is incorrect for the implementation to try and cast the source collection to an ICollection and try and perform the collection copy (and It should just drop through to its default behaviour iterating the IEnumerable).

3) I have misunderstood what is going on here...

Thoughts much appreciated!

like image 821
Benjamin Osborne Avatar asked Dec 04 '17 10:12

Benjamin Osborne


1 Answers

It's not stated anywhere that OrderBy (or other LINQ methods) should always use GetEnumerator of source IEnumerable or that it should be thread safe on concurrent collections. All that is promised is this method

Sorts the elements of a sequence in ascending order according to a key.

ConcurrentDictionary is not thread-safe in some global sense either. It's thread-safe with respect to other operations performed on it. Even more, documentation says that

All public and protected members of ConcurrentDictionary are thread-safe and may be used concurrently from multiple threads. However, members accessed through one of the interfaces the ConcurrentDictionary implements, including extension methods, are not guaranteed to be thread safe and may need to be synchronized by the caller.

So, your understanding is correct (OrderBy will see IEnumerable you pass to it is really ICollection, will then get length of that collection, allocate buffer of that size, then will call ICollection.CopyTo, and this is of course not thread safe on any type of collection), but it's not a bug in OrderBy because neither OrderBy nor ConcurrentDictionary ever promised what you assume.

If you want to do OrderBy in a thread safe way on ConcurrentDictionary, you need to rely on methods that are promised to be thread safe. For example:

// note: this is NOT IEnumerable.ToArray()
// but public ToArray() method of ConcurrentDictionary itself
// it is guaranteed to be thread safe with respect to other operations
// on this dictionary
var snapshot = concurrentDictionary.ToArray();
// we are working on snapshot so no one other thread can modify it
// of course at this point real contents of dictionary might not be
// the same as our snapshot
var sorted = snapshot.OrderBy(c => c.Key);

If you don't want to allocate additional array (with ToArray), you can use Select(c => c) and it will work in this case, but then we are again in moot territory and relying on something to be safe to use in situation it was not promised to (Select will also not always enumerate your collection. If collection is array or list - it will shortcut and use indexers instead). So you can create extension method like this:

public static class Extensions {
    public static IEnumerable<T> ForceEnumerate<T>(this ICollection<T> collection) {
        foreach (var item in collection)
            yield return item;
    }
}

And use it like this if you want to be safe and don't want to allocate array:

concurrentDictionary.ForceEnumerate().OrderBy(c => c.Key).ToArray();

In this case we are forcing enumeration of ConcurrentDictionary (which we know is safe from documentation) and then pass that to OrderBy knowing that it cannot do any harm with that pure IEnumerable. Note that as correctly pointed out in comments by mjwills, this is not exactly the same as ToArray, because ToArray produces snapshot (locks collection preventing modifications while building array) and Select \ yield does not acquire any locks (so items might be added\removed right when enumeration is in progress). Though I doubt it matters when doing things like described in question - in both cases after OrderBy is completed - you have no idea whether your ordered results reflect current state of collection or not.

like image 173
Evk Avatar answered Sep 17 '22 10:09

Evk