Expensive IEnumerable: Any way to prevent multiple enumerations without forcing an immediate enumeration? [duplicate]

Question

I have a very large enumeration and am preparing an expensive deferred operation on it (e.g. sorting it). I'm then passing this into a function which may or may not consume the IEnumerable, depending on some logic of its own.

Here's an illustration:

IEnumerable<Order> expensiveEnumerable = fullCatalog.OrderBy(c => Prioritize(c));
MaybeFullFillSomeOrders(expensiveEnumerable);

// Elsewhere... (example use-case for multiple enumerations, not real code)
void MaybeFullFillSomeOrders(IEnumerable<Order> nextUpOrders){
    if(notAGoodTime())
       return;
    foreach(var order in nextUpOrders)
       collectSomeInfo(order);
    processInfo();
    foreach(var order in nextUpOrders) {
       maybeFulfill(order);
       if(atCapacity())
          break;
    }
}

I'm would like to prepare my input to the other function such that:

If they do not consume the enumerable, the performance price of sorting is not paid.
- This already precludes calling e.g. ToList() or ToArray() on it
If they choose to enumerate multiple times (perhaps not realizing how expensive it would be in this case) I want some defence in place to prevent the multiple enumeration.
Ideally, the result is still an IEnumerable<T>

The best solution I've come up with is to use Lazy<>

var expensive = new Lazy<List<Order>>>(
    () => fullCatalog.OrderBy(c => Prioritize(c)).ToList());

This appears to satisfy criteria 1 and 2, but has a couple of drawbacks:

I have to change the interface to all downstream usages to expect a Lazy.
The full list (which in this case was built up from a SelectMany() on serveral smaller partitions) would need to be allocated as a new single contiguous list in memory. I'm not sure there's an easy way around this if I want to "cache" the sort result, but if you know of one I'm all ears.

One idea I had to solve the first problem was to wrap Lazy<> in some custom class that either implements or can implicitly be converted to an IEnumerable<T>, but I'm hoping someone knows of a more elegant approach.

Jon Skeet · Accepted Answer

You certainly could write your own IEnumerable<T> implementation that wraps another one, remembering all the elements it's already seen (and whether it's exhausted or not). If you need it to be thread-safe that becomes trickier, and you'd need to remember that at any time there may be multiple iterators working against the same IEnumerable<T>.

Fundamentally I think it would come down to working out what to do when asked for the next element (which is somewhat-annoyingly split into MoveNext() and Current, but that can probably be handled...):

If you've already read the next element within another iterator, you can yield it from your buffer
If you've already discovered that there is no next element, you can return that immediately
Otherwise, you need to ask the original iterator for the next element, and remember if for all the other wrapped iterators.

The other aspect that's tricky is knowing when to dispose of the underlying IEnumerator<T> - if you don't need to do that, it makes things simpler.

As a very sketchy attempt that I haven't even attempted to compile, and which is definitely not thread-safe, you could try something like this:

public class LazyEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerator<T> iterator;
    private List<T> buffer;
    private bool completed = false;

    public LazyEnumerable(IEnumerable<T> original)
    {
        // TODO: You could be even lazier, only calling
        // GetEnumerator when you first need an element
        iterator = original.GetEnumerator();
    }

    IEnumerator GetEnumerator() => GetEnumerator();

    public IEnumerator<T> GetEnumerator()
    {
        int index = 0;
        while (true)
        {
            // If we already have the element, yield it
            if (index < buffer.Count)
            {
                yield return buffer[index];
            }
            // If we've yielded everything in the buffer and some
            // other iterator has come to the end of the original,
            // we're done.
            else if (completed)
            {
                yield break;
            }
            // Otherwise, see if there's anything left in the original
            // iterator.
            else
            {
                bool hasNext = iterator.MoveNext();
                if (hasNext)
                {
                    var current = iterator.Current;
                    buffer.Add(current);
                    yield return current;
                }
                else
                {
                    completed = true;
                    yield break;
                }
            }
            index++;
        }
    }
}

Expensive IEnumerable: Any way to prevent multiple enumerations without forcing an immediate enumeration? [duplicate]

Tags:

performance

c#

linq

Alain

1 Answers

Jon Skeet

Recent Activity

Donate For Us

Expensive IEnumerable: Any way to prevent multiple enumerations without forcing an immediate enumeration? [duplicate]

Tags:

performance

c#

linq

Alain

1 Answers

Jon Skeet

Related questions

Recent Activity

Donate For Us