Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there an IEnumerable implementation that only iterates over it's source (e.g. LINQ) once?

Tags:

Provided items is the result of a LINQ expression:

var items = from item in ItemsSource.RetrieveItems()             where ... 

Suppose generation of each item takes some non-negligeble time.

Two modes of operation are possible:

  1. Using foreach would allow to start working with items in the beginning of the collection much sooner than whose in the end become available. However if we wanted to later process the same collection again, we'll have to copy save it:

    var storedItems = new List<Item>(); foreach(var item in items) {     Process(item);     storedItems.Add(item); }  // Later foreach(var item in storedItems) {     ProcessMore(item); } 

    Because if we'd just made foreach(... in items) then ItemsSource.RetrieveItems() would get called again.

  2. We could use .ToList() right upfront, but that would force us wait for the last item to be retrieved before we could start processing the first one.

Question: Is there an IEnumerable implementation that would iterate first time like regular LINQ query result, but would materialize in process so that second foreach would iterate over stored values?

like image 613
zzandy Avatar asked Sep 14 '12 15:09

zzandy


People also ask

Can LINQ be used on IEnumerable?

All LINQ methods are extension methods to the IEnumerable<T> interface. That means that you can call any LINQ method on any object that implements IEnumerable<T> . You can even create your own classes that implement IEnumerable<T> , and those classes will instantly "inherit" all LINQ functionality!

Why does LINQ return IEnumerable?

LINQ queries return a lazily evaluated IEnumerable<T> . The query is performed upon enumeration. Even if your source IEnumerable<T> had millions of records the above query would be instantaneous. Edit: Think of LINQ queries as creating a pipeline for your result rather than imperatively creating the result.


1 Answers

A fun challenge so I have to provide my own solution. So fun in fact that my solution now is in version 3. Version 2 was a simplification I made based on feedback from Servy. I then realized that my solution had huge drawback. If the first enumeration of the cached enumerable didn't complete no caching would be done. Many LINQ extensions like First and Take will only enumerate enough of the enumerable to get the job done and I had to update to version 3 to make this work with caching.

The question is about subsequent enumerations of the enumerable which does not involve concurrent access. Nevertheless I have decided to make my solution thread safe. It adds some complexity and a bit of overhead but should allow the solution to be used in all scenarios.

public static class EnumerableExtensions {    public static IEnumerable<T> Cached<T>(this IEnumerable<T> source) {     if (source == null)       throw new ArgumentNullException("source");     return new CachedEnumerable<T>(source);   }  }  class CachedEnumerable<T> : IEnumerable<T> {    readonly Object gate = new Object();    readonly IEnumerable<T> source;    readonly List<T> cache = new List<T>();    IEnumerator<T> enumerator;    bool isCacheComplete;    public CachedEnumerable(IEnumerable<T> source) {     this.source = source;   }    public IEnumerator<T> GetEnumerator() {     lock (this.gate) {       if (this.isCacheComplete)         return this.cache.GetEnumerator();       if (this.enumerator == null)         this.enumerator = source.GetEnumerator();     }     return GetCacheBuildingEnumerator();   }    public IEnumerator<T> GetCacheBuildingEnumerator() {     var index = 0;     T item;     while (TryGetItem(index, out item)) {       yield return item;       index += 1;     }   }    bool TryGetItem(Int32 index, out T item) {     lock (this.gate) {       if (!IsItemInCache(index)) {         // The iteration may have completed while waiting for the lock.         if (this.isCacheComplete) {           item = default(T);           return false;         }         if (!this.enumerator.MoveNext()) {           item = default(T);           this.isCacheComplete = true;           this.enumerator.Dispose();           return false;         }         this.cache.Add(this.enumerator.Current);       }       item = this.cache[index];       return true;     }   }    bool IsItemInCache(Int32 index) {     return index < this.cache.Count;   }    IEnumerator IEnumerable.GetEnumerator() {     return GetEnumerator();   }  } 

The extension is used like this (sequence is an IEnumerable<T>):

var cachedSequence = sequence.Cached();  // Pulling 2 items from the sequence. foreach (var item in cachedSequence.Take(2))   // ...  // Pulling 2 items from the cache and the rest from the source. foreach (var item in cachedSequence)   // ...  // Pulling all items from the cache. foreach (var item in cachedSequence)   // ... 

There is slight leak if only part of the enumerable is enumerated (e.g. cachedSequence.Take(2).ToList(). The enumerator that is used by ToList will be disposed but the underlying source enumerator is not disposed. This is because the first 2 items are cached and the source enumerator is kept alive should requests for subsequent items be made. In that case the source enumerator is only cleaned up when eligigble for garbage Collection (which will be the same time as the possibly large cache).

like image 85
Martin Liversage Avatar answered Sep 23 '22 05:09

Martin Liversage