 

Split an IEnumerable<T> into fixed-sized chunks (return an IEnumerable<IEnumerable<T>> where the inner sequences are of fixed length) [duplicate]

You could try to implement the Batch method mentioned above on your own, like this:

    static class MyLinqExtensions 
    { 
        public static IEnumerable<IEnumerable<T>> Batch<T>( 
            this IEnumerable<T> source, int batchSize) 
        { 
            using (var enumerator = source.GetEnumerator()) 
                while (enumerator.MoveNext()) 
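                    // batchSize - 1: YieldBatchElements yields the current item first, then up to batchSize - 1 more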
                    yield return YieldBatchElements(enumerator, batchSize - 1); 
        } 

        private static IEnumerable<T> YieldBatchElements<T>( 
            IEnumerator<T> source, int batchSize) 
        { 
            yield return source.Current; 
            for (int i = 0; i < batchSize && source.MoveNext(); i++) 
                yield return source.Current; 
        } 
    }

I've grabbed this code from http://blogs.msdn.com/b/pfxteam/archive/2012/11/16/plinq-and-int32-maxvalue.aspx.

UPDATE: Please note that this implementation lazily evaluates not only the batches but also the items inside each batch, which means it only produces correct results when each batch is enumerated after all previous batches have been fully enumerated. For example:

public static void Main(string[] args)
{
    var xs = Enumerable.Range(1, 20);
    Print(xs.Batch(5).Skip(1)); // expected to skip the whole first batch of 5 elements
}

public static void Print<T>(IEnumerable<IEnumerable<T>> batches)
{
    foreach (var batch in batches)
    {
        Console.WriteLine($"[{string.Join(", ", batch)}]");
    }
}

will output:

[2, 3, 4, 5, 6] // only the first element is skipped
[7, 8, 9, 10, 11]
[12, 13, 14, 15, 16]
[17, 18, 19, 20]

So, if your use case assumes that batches are evaluated strictly in sequence, the lazy solution above will work. If you can't guarantee strictly sequential batch processing (e.g. when you want to process batches in parallel), you will probably need a solution that eagerly enumerates each batch's content, similar to the one mentioned in the question above or to MoreLINQ's Batch.
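A minimal sketch of such an eager variant (the name EagerBatch and the IReadOnlyList<T> return type are just illustrative choices, not part of the original answer): each batch is materialized into a list before it is yielded, so batches can safely be skipped or processed out of order.

public static IEnumerable<IReadOnlyList<T>> EagerBatch<T>(
    this IEnumerable<T> source, int batchSize)
{
    var batch = new List<T>(batchSize);
    foreach (var item in source)
    {
        batch.Add(item);
        if (batch.Count == batchSize)
        {
            yield return batch;
            batch = new List<T>(batchSize); // start a fresh list rather than reusing the old one
        }
    }
    if (batch.Count > 0)
        yield return batch; // trailing partial batch
}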


Maybe?

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    return items.Select((item, inx) => new { item, inx })
                .GroupBy(x => x.inx / partitionSize)
                .Select(g => g.Select(x => x.item));
}
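
A quick usage sketch, assuming the extension above is in scope (note that Enumerable.GroupBy consumes the whole source before the first group is produced, so this variant is not lazy per batch):

var numbers = Enumerable.Range(1, 7);
foreach (var partition in numbers.Partition(3))
    Console.WriteLine(string.Join(", ", partition));
// 1, 2, 3
// 4, 5, 6
// 7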

There is also an already-implemented one: morelinq's Batch.


It feels like you want two iterator blocks ("yield return methods"). I wrote this extension method:

static class Extensions
{
  public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
  {
    return new PartitionHelper<T>(items, partitionSize);
  }

  private sealed class PartitionHelper<T> : IEnumerable<IEnumerable<T>>
  {
    readonly IEnumerable<T> items;
    readonly int partitionSize;
    bool hasMoreItems;

    internal PartitionHelper(IEnumerable<T> i, int ps)
    {
      items = i;
      partitionSize = ps;
    }

    public IEnumerator<IEnumerable<T>> GetEnumerator()
    {
      using (var enumerator = items.GetEnumerator())
      {
        hasMoreItems = enumerator.MoveNext();
        while (hasMoreItems)
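          // ToList() materializes the batch right away, which also advances hasMoreItems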
          yield return GetNextBatch(enumerator).ToList();
      }
    }

    IEnumerable<T> GetNextBatch(IEnumerator<T> enumerator)
    {
      for (int i = 0; i < partitionSize; ++i)
      {
        yield return enumerator.Current;
        hasMoreItems = enumerator.MoveNext();
        if (!hasMoreItems)
          yield break;
      }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
      return GetEnumerator();      
    }
  }
}
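
Because each batch is materialized with ToList() before it is yielded, the batches are independent snapshots. A quick usage sketch, assuming the Extensions class above is in scope:

var letters = new[] { "a", "b", "c", "d", "e" };
foreach (var partition in letters.Partition(2))
    Console.WriteLine(string.Join(", ", partition));
// a, b
// c, d
// e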

Craziest solution (with Reactive Extensions):

public static IEnumerable<IList<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    return items
            .ToObservable()        // Convert the sequence to an observable sequence
            .Buffer(partitionSize) // Split it into buffers of the specified size
            .ToEnumerable();       // Convert it back to an ordinary sequence
}

I know that I changed the signature, but we all know that each chunk will be some fixed-size collection anyway.

BTW, if you use an iterator block, don't forget to split your implementation into two methods so that arguments are validated eagerly (see the sketch below)!
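
A minimal sketch of that split, assuming the same Partition signature as above (the body of the private iterator is just a placeholder for any of the implementations shown earlier):

public static IEnumerable<IEnumerable<T>> Partition<T>(
    this IEnumerable<T> items, int partitionSize)
{
    // Validation runs immediately, at the call site...
    if (items == null)
        throw new ArgumentNullException(nameof(items));
    if (partitionSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(partitionSize));

    return PartitionIterator(items, partitionSize);
}

private static IEnumerable<IEnumerable<T>> PartitionIterator<T>(
    IEnumerable<T> items, int partitionSize)
{
    // ...while the deferred work lives in the iterator block.
    // Any of the partitioning bodies shown above goes here.
    yield break;
}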


For an elegant solution, you can also have a look at MoreLinq.Batch.

It batches the source sequence into sized buckets.

Example:

int[] ints = new int[] {1,2,3,4,5,6};
var batches = ints.Batch(2); // batches -> [0]: 1,2 ; [1]: 3,4 ; [2]: 5,6