I created the ThreadSafeCachedEnumerable<T>
class intending to increase performance where long running queries where being reused. The idea was to get an enumerator from an IEnumerable<T>
and add items to a cache on each call to MoveNext()
. The following is my current implementation:
/// <summary>
/// Wraps an IEnumerable<T> and provides a thread-safe means of caching the values."/>
/// </summary>
/// <typeparam name="T"></typeparam>
class ThreadSafeCachedEnumerable<T> : IEnumerable<T>
{
// An enumerator from the original IEnumerable<T>
private IEnumerator<T> enumerator;
// The items we have already cached (from this.enumerator)
private IList<T> cachedItems = new List<T>();
public ThreadSafeCachedEnumerable(IEnumerable<T> enumerable)
{
this.enumerator = enumerable.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
// The index into the sequence
int currentIndex = 0;
// We will break with yield break
while (true)
{
// The currentIndex will never be decremented,
// so we can check without locking first
if (currentIndex < this.cachedItems.Count)
{
var current = this.cachedItems[currentIndex];
currentIndex += 1;
yield return current;
}
else
{
// If !(currentIndex < this.cachedItems.Count),
// we need to synchronize access to this.enumerator
lock (enumerator)
{
// See if we have more cached items ...
if (currentIndex < this.cachedItems.Count)
{
var current = this.cachedItems[currentIndex];
currentIndex += 1;
yield return current;
}
else
{
// ... otherwise, we'll need to get the next item from this.enumerator.MoveNext()
if (this.enumerator.MoveNext())
{
// capture the current item and cache it, then increment the currentIndex
var current = this.enumerator.Current;
this.cachedItems.Add(current);
currentIndex += 1;
yield return current;
}
else
{
// We reached the end of the enumerator - we're done
yield break;
}
}
}
}
}
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return this.GetEnumerator();
}
}
I simply lock (this.enumerator)
when the no more items appear to be in the cache, just in case another thread is just about to add another item (I assume that calling MoveNext()
on this.enumerator from two threads is a bad idea).
The performance is great when retrieving previously cached items, but it starts to suffer when getting many items for the first time (due to the constant locking). Any suggestions for increasing the performance?
Edit: The new Reactive Framework solves the problem outlined above, using the System.Linq.EnumerableEx.MemoizeAll()
extension method.
Internally, MemoizeAll()
uses a System.Linq.EnumerableEx.MemoizeAllEnumerable<T>
(found in the System.Interactive assembly), which is similar to my ThreadSafeCachedEnumerable<T>
(sorta).
Here's an awfully contrived example that prints the contents of an Enumerable (numbers 1-10) very slowly, then quickly prints the contents a second time (because it cached the values):
// Create an Enumerable<int> containing numbers 1-10, using Thread.Sleep() to simulate work
var slowEnum = EnumerableEx.Generate(1, currentNum => (currentNum <= 10), currentNum => currentNum, previousNum => { Thread.Sleep(250); return previousNum + 1; });
// This decorates the slow enumerable with one that will cache each value.
var cachedEnum = slowEnum.MemoizeAll();
// Print the numbers
foreach (var num in cachedEnum.Repeat(2))
{
Console.WriteLine(num);
}
A couple of recommendations:
Dictionary
or HashSet
. Similarly, items may be removed inbetween calls, invalidating the cache.Locking in .NET is normally very quick (if there is no contention). Has profiling identified locking as the source of the performance problem? How long does it take to call MoveNext
on the underlying enumerator?
Additionally, the code as it stands is not thread-safe. You cannot safely call this.cachedItems[currentIndex]
on one thread (in if (currentIndex < this.cachedItems.Count)
) while invoking this.cachedItems.Add(current)
on another. From the List(T) documentation: "A List(T) can support multiple readers concurrently, as long as the collection is not modified." To be thread-safe, you would need to protect all access to this.cachedItems
with a lock (if there's any chance that one or more threads could modify it).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With