Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance regarding cached thread-safe IEnumerable<T> implementation

I created the ThreadSafeCachedEnumerable<T> class intending to increase performance where long running queries where being reused. The idea was to get an enumerator from an IEnumerable<T> and add items to a cache on each call to MoveNext(). The following is my current implementation:

/// <summary>
/// Wraps an IEnumerable&lt;T&gt; and provides a thread-safe means of caching the values."/>
/// </summary>
/// <typeparam name="T"></typeparam>
class ThreadSafeCachedEnumerable<T> : IEnumerable<T>
{
    // An enumerator from the original IEnumerable<T>
    private IEnumerator<T> enumerator;

    // The items we have already cached (from this.enumerator)
    private IList<T> cachedItems = new List<T>();

    public ThreadSafeCachedEnumerable(IEnumerable<T> enumerable)
    {
        this.enumerator = enumerable.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        // The index into the sequence
        int currentIndex = 0;

        // We will break with yield break 
        while (true)
        {
            // The currentIndex will never be decremented,
            // so we can check without locking first
            if (currentIndex < this.cachedItems.Count)
            {
                var current = this.cachedItems[currentIndex];
                currentIndex += 1;
                yield return current;
            }
            else
            {
                // If !(currentIndex < this.cachedItems.Count),
                // we need to synchronize access to this.enumerator
                lock (enumerator)
                {
                    // See if we have more cached items ...
                    if (currentIndex < this.cachedItems.Count)
                    {
                        var current = this.cachedItems[currentIndex];
                        currentIndex += 1;
                        yield return current;
                    }
                    else
                    {
                        // ... otherwise, we'll need to get the next item from this.enumerator.MoveNext()
                        if (this.enumerator.MoveNext())
                        {
                            // capture the current item and cache it, then increment the currentIndex
                            var current = this.enumerator.Current;
                            this.cachedItems.Add(current);
                            currentIndex += 1;
                            yield return current;
                        }
                        else
                        {
                            // We reached the end of the enumerator - we're done
                            yield break;
                        }
                    }
                }
            }
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}

I simply lock (this.enumerator) when the no more items appear to be in the cache, just in case another thread is just about to add another item (I assume that calling MoveNext() on this.enumerator from two threads is a bad idea).

The performance is great when retrieving previously cached items, but it starts to suffer when getting many items for the first time (due to the constant locking). Any suggestions for increasing the performance?


Edit: The new Reactive Framework solves the problem outlined above, using the System.Linq.EnumerableEx.MemoizeAll() extension method.

Internally, MemoizeAll() uses a System.Linq.EnumerableEx.MemoizeAllEnumerable<T> (found in the System.Interactive assembly), which is similar to my ThreadSafeCachedEnumerable<T> (sorta).

Here's an awfully contrived example that prints the contents of an Enumerable (numbers 1-10) very slowly, then quickly prints the contents a second time (because it cached the values):

// Create an Enumerable<int> containing numbers 1-10, using Thread.Sleep() to simulate work
var slowEnum = EnumerableEx.Generate(1, currentNum => (currentNum <= 10), currentNum => currentNum, previousNum => { Thread.Sleep(250); return previousNum + 1; });

// This decorates the slow enumerable with one that will cache each value.
var cachedEnum = slowEnum.MemoizeAll();

// Print the numbers
foreach (var num in cachedEnum.Repeat(2))
{
    Console.WriteLine(num);
}
like image 755
Charles Avatar asked Dec 30 '22 20:12

Charles


2 Answers

A couple of recommendations:

  1. It is now generally accepted practice not to make container classes responsible for locking. Someone calling your cached enumerator, for instance, might also want to prevent new entries from being added to the container while enumerating, which means that locking would occur twice. Therefore, it's best to defer that responsibility to the caller.
  2. Your caching depends on the enumerator always returning items in-order, which is not guaranteed. It's better to use a Dictionary or HashSet. Similarly, items may be removed inbetween calls, invalidating the cache.
  3. It is generally not recommended to establish locks on publically accessible objects. That includes the wrapped enumerator. Exceptions are conceivable, for example when you're absolutely certain you're absolutely certain you're the only instance holding a reference to the container class you're enumerating over. This would also largely invalidate my objections under #2.
like image 182
Michiel Buddingh Avatar answered Jan 25 '23 12:01

Michiel Buddingh


Locking in .NET is normally very quick (if there is no contention). Has profiling identified locking as the source of the performance problem? How long does it take to call MoveNext on the underlying enumerator?

Additionally, the code as it stands is not thread-safe. You cannot safely call this.cachedItems[currentIndex] on one thread (in if (currentIndex < this.cachedItems.Count)) while invoking this.cachedItems.Add(current) on another. From the List(T) documentation: "A List(T) can support multiple readers concurrently, as long as the collection is not modified." To be thread-safe, you would need to protect all access to this.cachedItems with a lock (if there's any chance that one or more threads could modify it).

like image 32
Bradley Grainger Avatar answered Jan 25 '23 11:01

Bradley Grainger