Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Caching IEnumerable

Tags:

public IEnumerable<ModuleData> ListModules()
{
    foreach (XElement m in Source.Descendants("Module"))
    {
        yield return new ModuleData(m.Element("ModuleID").Value);
    }
}

Initially the above code is great since there is no need to evaluate the entire collection if it is not needed.

However, once all the Modules have been enumerated once, it becomes more expensive to repeatedly query the XDocument when there is no change.

So, as a performance improvement:

public IEnumerable<ModuleData> ListModules()
{
    if (Modules == null)
    {
        Modules = new List<ModuleData>();
        foreach (XElement m in Source.Descendants("Module"))
        {
            Modules.Add(new ModuleData(m.Element("ModuleID").Value, 1, 1));
        }
    }
    return Modules;
}

Which is great if I am repeatedly using the entire list but not so great otherwise.

Is there a middle ground where I can yield return until the entire list has been iterated, then cache it and serve the cache to subsequent requests?

like image 667
djskinner Avatar asked Oct 08 '09 10:10

djskinner


3 Answers

You can look at Saving the State of Enumerators which describes how to create lazy list (which caches once iterated items).

like image 143
Dzmitry Huba Avatar answered Oct 01 '22 18:10

Dzmitry Huba


Check out MemoizeAll() in the Reactive Extensions for .NET library (Rx). As it is evaluated lazily you can safely set it up during construction and just return Modules from ListModules():

Modules = Source.
    Descendants("Module").
    Select(m => new ModuleData(m.Element("ModuleID").Value, 1, 1)).
    MemoizeAll();

There's a good explanation of MemoizeAll() (and some of the other less obvious Rx extensions) here.

like image 40
GraemeF Avatar answered Oct 01 '22 18:10

GraemeF


I like @tsemer's answer. But I would like to propose my solutions, which has nothing to do with FP. It's naive approach, but it generates a lot less of allocations. And it is not thread safe.

public class CachedEnumerable<T> : IEnumerable<T>, IDisposable
{
    IEnumerator<T> _enumerator;
    readonly List<T> _cache = new List<T>();

    public CachedEnumerable(IEnumerable<T> enumerable) 
        : this(enumerable.GetEnumerator())
    {
    }

    public CachedEnumerable(IEnumerator<T> enumerator)
    {
        _enumerator = enumerator;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // The index of the current item in the cache.
        int index = 0;

        // Enumerate the _cache first
        for (; index < _cache.Count; index++)
        {
            yield return _cache[index];
        }

        // Continue enumeration of the original _enumerator, 
        // until it is finished. 
        // This adds items to the cache and increment 
        for (; _enumerator != null && _enumerator.MoveNext(); index++)
        {
            var current = _enumerator.Current;
            _cache.Add(current);
            yield return current;
        }

        if (_enumerator != null)
        {
            _enumerator.Dispose();
            _enumerator = null;
        }

        // Some other users of the same instance of CachedEnumerable
        // can add more items to the cache, 
        // so we need to enumerate them as well
        for (; index < _cache.Count; index++)
        {
            yield return _cache[index];
        }
    }

    public void Dispose()
    {
        if (_enumerator != null)
        {
            _enumerator.Dispose();
            _enumerator = null;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

This is how the matrix test from @tsemer's answer will work:

var ints = new [] { 1, 2, 3, 4, 5 };
var cachedEnumerable = new CachedEnumerable<int>(ints); 
foreach (var x in cachedEnumerable)
{
    foreach (var y in cachedEnumerable)
    {
        //Do something
    }
}
  1. The outer loop (x) skips first for, because _cache is empty;
  2. x fetches one item from the _enumerator to the _cache;
  3. x pauses before second for loop;
  4. The inner loop (y) enumerates one element from the _cache;
  5. y fetches all elements from the _enumerator to the _cache;
  6. y skips the third for loop, because its index variable equals 5;
  7. x resumes, its index equals 1. It skips the second for loop because _enumerator is finished;
  8. x enumerates one element from the _cache using the third for loop;
  9. x pauses before the third for;
  10. y enumerates 5 elements from the _cache using first for loop;
  11. y skips the second for loop, because _enumerator is finished;
  12. y skips the third for loop, because index of y equals 5;
  13. x resumes, increments index. It fetches one element from the _cache using the third for loop.
  14. x pauses.
  15. if index variable of x is less than 5 then go to 10;
  16. end.
like image 36
hazzik Avatar answered Oct 01 '22 17:10

hazzik