Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Zip N IEnumerable<T>s together? Iterate over them simultaneously?

Tags:

c#

.net

linq

I have:-

IEnumerable<IEnumerable<T>> items;

and I'd like to create:-

IEnumerable<IEnumerable<T>> results;

where the first item in "results" is an IEnumerable of the first item of each of the IEnumerables of "items", the second item in "results" is an IEnumerable of the second item of each of "items", etc.

The IEnumerables aren't necessarily the same lengths. If some of the IEnumerables in items don't have an element at a particular index, then I'd expect the matching IEnumerable in results to have fewer items in it.

For example:-

items = { "1", "2", "3", "4" } , { "a", "b", "c" };
results = { "1", "a" } , { "2", "b" }, { "3", "c" }, { "4" };

Edit: Another example (requested in comments):-

items = { "1", "2", "3", "4" } , { "a", "b", "c" }, { "p", "q", "r", "s", "t" };
results = { "1", "a", "p" } , { "2", "b", "q" }, { "3", "c", "r" }, { "4", "s" }, { "t" };

I don't know in advance how many sequences there are, nor how many elements are in each sequence. I might have 1,000 sequences with 1,000,000 elements in each, and I might only need the first ~10, so I'd like to use the (lazy) enumeration of the source sequences if I can. In particular I don't want to create a new data structure if I can help it.

Is there a built-in method (similar to IEnumerable.Zip) that can do this?

Is there another way?

like image 418
Iain Galloway Avatar asked Oct 21 '10 15:10

Iain Galloway


3 Answers

Now lightly tested and with working disposal.

public static class Extensions
{
  public static IEnumerable<IEnumerable<T>> JaggedPivot<T>(
    this IEnumerable<IEnumerable<T>> source)
  {
    List<IEnumerator<T>> originalEnumerators = source
      .Select(x => x.GetEnumerator())
      .ToList();

    try
    {
      List<IEnumerator<T>> enumerators = originalEnumerators
        .Where(x => x.MoveNext()).ToList();

      while (enumerators.Any())
      {
        List<T> result = enumerators.Select(x => x.Current).ToList();
        yield return result;
        enumerators = enumerators.Where(x => x.MoveNext()).ToList();
      }
    }
    finally
    {
      originalEnumerators.ForEach(x => x.Dispose());
    }
  } 
}

public class TestExtensions
{
  public void Test1()
  {
    IEnumerable<IEnumerable<int>> myInts = new List<IEnumerable<int>>()
    {
      Enumerable.Range(1, 20).ToList(),
      Enumerable.Range(21, 5).ToList(),
      Enumerable.Range(26, 15).ToList()
    };

    foreach(IEnumerable<int> x in myInts.JaggedPivot().Take(10))
    {
      foreach(int i in x)
      {
        Console.Write("{0} ", i);
      }
      Console.WriteLine();
    }
  }
}
like image 127
Amy B Avatar answered Oct 15 '22 09:10

Amy B


It's reasonably straightforward to do if you can guarantee how the results are going to be used. However, if the results might be used in an arbitrary order, you may need to buffer everything. Consider this:

var results = MethodToBeImplemented(sequences);
var iterator = results.GetEnumerator();
iterator.MoveNext();
var first = iterator.Current;
iterator.MoveNext();
var second = iterator.Current;
foreach (var x in second)
{
    // Do something
}
foreach (var x in first)
{
    // Do something
}

In order to get at the items in "second" you'll have to iterate over all of the subsequences, past the first items. If you then want it to be valid to iterate over the items in first you either need to remember the items or be prepared to re-evaluate the subsequences.

Likewise you'll either need to buffer the subsequences as IEnumerable<T> values or reread the whole lot each time.

Basically it's a whole can of worms which is difficult to do elegantly in a way which will work pleasantly for all situations :( If you have a specific situation in mind with appropriate constraints, we may be able to help more.

like image 20
Jon Skeet Avatar answered Oct 15 '22 09:10

Jon Skeet


Based on David B's answer, this code should perform better:

public static IEnumerable<IEnumerable<T>> JaggedPivot<T>(
    this IEnumerable<IEnumerable<T>> source)
{
    var originalEnumerators = source.Select(x => x.GetEnumerator()).ToList();
    try
    {
        var enumerators =
            new List<IEnumerator<T>>(originalEnumerators.Where(x => x.MoveNext()));

        while (enumerators.Any())
        {
            yield return enumerators.Select(x => x.Current).ToList();
            enumerators.RemoveAll(x => !x.MoveNext());
        }
    }
    finally
    {
        originalEnumerators.ForEach(x => x.Dispose());
    }
}

The difference is that the enumerators variable isn't re-created all the time.

like image 1
tm1 Avatar answered Oct 15 '22 08:10

tm1