Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lazily partition sequence with LINQ

Tags:

c#

linq

I have the following extension method to find an element within a sequence, and then return two IEnumerable<T>s: one containing all the elements before that element, and one containing the element and everything that follows. I would prefer if the method were lazy, but I haven't figured out a way to do that. Can anyone come up with a solution?

public static PartitionTuple<T> Partition<T>(this IEnumerable<T> sequence, Func<T, bool> partition)
{
    var a = sequence.ToArray();
    return new PartitionTuple<T>
    {
        Before = a.TakeWhile(v => !partition(v)),
        After = a.SkipWhile(v => !partition(v))
    };
}

Doing sequence.ToArray() immediately defeats the laziness requirement. However, without that line, an expensive-to-iterate sequence may be iterated over twice. And, depending on what the calling code does, many more times.

like image 681
moswald Avatar asked Nov 14 '13 15:11

moswald


3 Answers

You can use the Lazy object to ensure that the source sequence isn't converted to an array until one of the two partitions is iterated:

public static PartitionTuple<T> Partition<T>(
    this IEnumerable<T> sequence, Func<T, bool> partition)
{
    var lazy = new Lazy<IEnumerable<T>>(() => sequence.ToArray());
    return new PartitionTuple<T>
    {
        Before = lazy.MapLazySequence(s => s.TakeWhile(v => !partition(v))),
        After = lazy.MapLazySequence(s => s.SkipWhile(v => !partition(v)))
    };
}

We'll use this method to defer evaluating the lazy until the sequence itself is iterated:

public static IEnumerable<TResult> MapLazySequence<TSource, TResult>(
    this Lazy<IEnumerable<TSource>> lazy, 
    Func<IEnumerable<TSource>, IEnumerable<TResult>> filter)
{
    foreach (var item in filter(lazy.Value))
        yield return item;
}
like image 138
Servy Avatar answered Oct 23 '22 04:10

Servy


This is an interesting problem and to get it right, you have to know what "right" is. For the semantics of the operation, I think that this definition makes sense:

  • The source sequence is only enumerated once even though the resulting sequences are enumerated several times.
  • The source sequence isn't enumerated until one of the results is enumerated.
  • Each of the results should be possible to enumerate independently.
  • If the source sequence changes, it is undefined what will happen.

I'm not entirely sure I got the handling of the matching object right, but I hope you get the idea. I'm deferring a lot of the work to the PartitionTuple<T> class to be able to be lazy.

public class PartitionTuple<T>
{
  IEnumerable<T> source;
  IList<T> before, after;
  Func<T, bool> partition;

  public PartitionTuple(IEnumerable<T> source, Func<T, bool> partition)
  {
    this.source = source;
    this.partition = partition;
  }

  private void EnsureMaterialized()
  {
    if(before == null)
    {
      before = new List<T>();
      after = new List<T>();

      using(var enumerator = source.GetEnumerator())
      {
        while(enumerator.MoveNext() && !partition(enumerator.Current))
        {
          before.Add(enumerator.Current);   
        }

        while(!partition(enumerator.Current) && enumerator.MoveNext());

        while(enumerator.MoveNext())
        {
          after.Add(enumerator.Current);
        }
      }
    }
  }

  public IEnumerable<T> Before 
  { 
    get
    {
      EnsureMaterialized();
      return before;
    }
  }

  public IEnumerable<T> After
  {
    get
    {
      EnsureMaterialized();
      return after;
    }
  }
}

public static class Extensions
{
  public static PartitionTuple<T> Partition<T>(this IEnumerable<T> sequence, Func<T, bool> partition)
  {
    return new PartitionTuple<T>(sequence, partition);
  }
}
like image 39
Anders Abel Avatar answered Oct 23 '22 03:10

Anders Abel


Here's a generic solution that will memoize any IEnumerable<T> to ensure it's only iterated once, without forcing the whole thing to iterate:

public class MemoizedEnumerable<T> : IEnumerable<T>, IDisposable
{
   private readonly IEnumerator<T> _childEnumerator;
   private readonly List<T> _itemCache = new List<T>();

   public MemoizedEnumerable(IEnumerable<T> enumerableToMemoize)
   {
       _childEnumerator = enumerableToMemoize.GetEnumerator();
   }

   public IEnumerator<T> GetEnumerator()
   {
       return _itemCache.Concat(EnumerateOnce()).GetEnumerator();
   }

   public void Dispose()
   {
       _childEnumerator.Dispose();
   }

   private IEnumerable<T> EnumerateOnce()
   {
       while (_childEnumerator.MoveNext())
       {
           _itemCache.Add(_childEnumerator.Current);
           yield return _childEnumerator.Current;
       }
   }

   IEnumerator IEnumerable.GetEnumerator()
   {
       return GetEnumerator();
   }
}

public static class EnumerableExtensions
{
    public static IEnumerable<T> Memoize<T>(this IEnumerable<T> enumerable)
    {
        return new MemoizedEnumerable<T>(enumerable);
    }
}

To use it for your partitioning problem, do this:

var memoized = sequence.Memoize();
return new PartitionTuple<T>
{
    Before = memoized.TakeWhile(v => !partition(v)),
    After = memoized.SkipWhile(v => !partition(v))
};

This will only iterate sequence a maximum of one time.

like image 44
Jacob Avatar answered Oct 23 '22 02:10

Jacob