
Why don't the Linq extension methods sit on IEnumerator rather than IEnumerable?

Tags: c#, .net, linq

Many LINQ operators need only a single pass through the input, e.g. Select.

Yet all the LINQ extension methods sit on IEnumerable rather than IEnumerator:

    var e = new[] { 1, 2, 3, 4, 5 }.GetEnumerator();
    e.Select(x => x * x); // Doesn't compile: Select is defined on IEnumerable<T>, not on IEnumerator

This means you can't use Linq in any situation where you are reading from an "already opened" stream.

This scenario comes up a lot in a project I am currently working on: I want to return an IEnumerator whose Dispose method will close the stream, and have all the downstream LINQ code operate on it.

In short, I have an "already opened" stream of results which I can convert into an appropriately disposable IEnumerator - but unfortunately all of the downstream code requires an IEnumerable rather than an IEnumerator, even though it's only going to do one "pass".

That is, I want to "implement" this return type on a variety of different sources (CSV files, IDataReaders, etc.):

    class TabularStream
    {
        Column[] Columns;
        IEnumerator<object[]> RowStream;
    }

In order to get the "Columns" I have to have already opened the CSV file, initiated the SQL query, or whatever. I can then return an "IEnumerator" whose Dispose method closes the resource - but all of the Linq operations require an IEnumerable.

The best workaround I know of is to implement an IEnumerable whose GetEnumerator() method returns the one-and-only IEnumerator and throws an error if something tries to do a GetEnumerator() call twice.

Does this all sound OK or is there a much better way for me to implement "TabularStream" in a way that's easy to use from Linq?

Paul Hollingsworth asked Sep 16 '10 09:09


2 Answers

Using IEnumerator<T> directly is rarely a good idea, in my view.

For one thing, it encodes the fact that it's destructive - whereas LINQ queries can usually be run multiple times. They're meant to be side-effect-free, whereas the act of iterating over an IEnumerator<T> is naturally side-effecting.

It also makes it virtually impossible to perform some of the optimizations in LINQ to Objects, such as using the Count property if you're actually asking an ICollection<T> for its count.
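The kind of optimization meant here can be sketched as follows. This is a simplified illustration of the idea, not the actual framework source; `CountSketch` and `CountFast` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class CountSketch
{
    // Simplified illustration only; NOT the real LINQ to Objects implementation.
    public static int CountFast<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // A collection already knows its size, so no iteration is needed.
        if (source is ICollection<T> collection) return collection.Count;

        // Otherwise fall back to walking the whole sequence.
        int count = 0;
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            while (e.MoveNext()) count++;
        }
        return count;
    }
}
```

With only an IEnumerator<T> in hand there is no source object left to type-test, so the enumerator would have to be walked (and thereby consumed) to count it.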

As for your workaround: yes, a OneShotEnumerable would be a reasonable approach.
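A minimal sketch of such a wrapper might look like this. The class name follows the one used above; everything else (member names, the exception type, the lack of thread safety) is an assumption made for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper: exposes an already-opened IEnumerator<T> to LINQ exactly once.
// Not thread-safe; a sketch rather than a hardened implementation.
public sealed class OneShotEnumerable<T> : IEnumerable<T>
{
    private IEnumerator<T> _enumerator;

    public OneShotEnumerable(IEnumerator<T> enumerator)
    {
        if (enumerator == null) throw new ArgumentNullException(nameof(enumerator));
        _enumerator = enumerator;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Hand the enumerator out once, then forget it so a second call fails loudly.
        IEnumerator<T> e = _enumerator;
        if (e == null)
            throw new InvalidOperationException("This sequence can only be enumerated once.");
        _enumerator = null;
        return e;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

Downstream code can then write something like `new OneShotEnumerable<object[]>(rowStream).Select(...)`, where `rowStream` stands for the question's RowStream field. Operators that run the sequence to completion typically dispose the enumerator they obtained, which is what would close the underlying resource.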

Jon Skeet answered Oct 12 '22 23:10


While I generally agree with Jon Skeet's answer, I have also come across a few cases where working with an IEnumerator directly did seem more appropriate than wrapping it in a once-only IEnumerable.

I'll start by illustrating one such case and by describing my own solution to the issue.

Case example: Forward-only, non-rewindable database cursors

ESRI's API for accessing geo-databases (ArcObjects) has forward-only database cursors that cannot be reset. They are essentially that API's equivalent of IEnumerator. But there is no equivalent to IEnumerable. So if you want to wrap that API in "the .NET way", you have three options (which I explored in the following order):

  1. Wrap the cursor as an IEnumerator (since that's what it really is) and work directly with that (which is cumbersome).

  2. Wrap the cursor, or the wrapping IEnumerator from (1), as a once-only IEnumerable (to make it LINQ-compatible and generally easier to work with). The drawback here is that it isn't really an IEnumerable, because it cannot be enumerated more than once, and this might be overlooked by users or maintainers of your code.

  3. Don't wrap the cursor itself as an IEnumerable, but that which can be used to retrieve a cursor (e.g. the query criteria and the reference to the database object being queried). That way, several iterations are possible simply by re-executing the whole query. This is what I eventually decided on back then.

That last option is the pragmatic solution that I would generally recommend for similar cases (if applicable). If you are looking for other solutions, read on.
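Option 3 can be sketched as an IEnumerable that stores a way to open a cursor instead of the cursor itself. `ReplayableQuery` and `openCursor` are hypothetical names; in the ArcObjects case the delegate would run the query and wrap the resulting cursor, which is not shown here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical wrapper: remembers how to open a cursor, not the cursor itself.
public sealed class ReplayableQuery<T> : IEnumerable<T>
{
    private readonly Func<IEnumerator<T>> _openCursor;

    public ReplayableQuery(Func<IEnumerator<T>> openCursor)
    {
        if (openCursor == null) throw new ArgumentNullException(nameof(openCursor));
        _openCursor = openCursor;
    }

    // Each enumeration re-executes the query and gets a fresh forward-only cursor.
    public IEnumerator<T> GetEnumerator()
    {
        return _openCursor();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

The trade-off is that every enumeration pays for a new query, and two passes may see different data if the database changes in between.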


Re-implement LINQ query operators for the IEnumerator<T> interface?

It's technically possible to implement some or all of LINQ's query operators for the IEnumerator<T> interface. One approach would be to write a bunch of extension methods, such as:

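    public static class EnumeratorExtensions
    {
        // Lazily filters a forward-only enumerator; xs is consumed as the result is iterated.
        public static IEnumerator<T> Where<T>(this IEnumerator<T> xs, Func<T, bool> predicate)
        {
            while (xs.MoveNext())
            {
                T x = xs.Current;
                if (predicate(x)) yield return x;
            }
        }
    }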

Let's consider a few key issues:

  • Operators must never return an IEnumerable<T>, because that would mean that you can break out of your own "LINQ to IEnumerator" world and escape into regular LINQ. There you'd end up with the non-repeatability issue already described above.

  • You cannot process the results of some query with a foreach loop… unless each of the IEnumerator<T> objects returned by your query operators implements a GetEnumerator method that returns this. Supplying that additional method would mean that you cannot use yield return/break, but have to write IEnumerator<T> classes manually.

    This is just plain weird and possibly an abuse of either IEnumerator<T> or the foreach construct.

  • If returning IEnumerable<T> is forbidden and returning IEnumerator<T> is cumbersome (because foreach doesn't work), why not return plain arrays? Because then queries can no longer be lazy.
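To make the second point concrete, here is a rough sketch of a hand-written filtering enumerator that also offers a GetEnumerator method returning itself, so that foreach accepts it. `WhereEnumerator` is a hypothetical name, and the class assumes it owns the source enumerator it is given.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-written operator. An iterator method (yield return) cannot be
// used here, because the returned type needs an extra GetEnumerator() member.
public sealed class WhereEnumerator<T> : IEnumerator<T>
{
    private readonly IEnumerator<T> _source;
    private readonly Func<T, bool> _predicate;

    public WhereEnumerator(IEnumerator<T> source, Func<T, bool> predicate)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));
        _source = source;
        _predicate = predicate;
    }

    public T Current { get { return _source.Current; } }

    object IEnumerator.Current { get { return _source.Current; } }

    public bool MoveNext()
    {
        // Advance the underlying enumerator until an element passes the predicate.
        while (_source.MoveNext())
        {
            if (_predicate(_source.Current)) return true;
        }
        return false;
    }

    public void Reset() { throw new NotSupportedException(); }

    public void Dispose() { _source.Dispose(); }

    // foreach is pattern-based: a public GetEnumerator() method is enough.
    public WhereEnumerator<T> GetEnumerator() { return this; }
}
```

A loop such as `foreach (int n in new WhereEnumerator<int>(source, n => n % 2 == 0))` then compiles, but only the first such loop sees any elements. On C# 9 or later, foreach also recognizes an extension GetEnumerator method, which may make the hand-written class unnecessary.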


IQueryable + IEnumerator = IQueryator

What about delaying the execution of a query until it has been fully composed? In the IEnumerable world, that is what IQueryable does; so we could theoretically build an IEnumerator equivalent, which I shall call IQueryator.

  • IQueryator could check for logical errors, such as doing anything with the sequence after it has been completely consumed by a preceding operation like Count. That is, all-consuming operators like Count would always have to come last in a chain of query operators.

  • IQueryator could return an array (as suggested above) or some other read-only collection, though not from the individual operators; only when the query gets executed.
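A very rough sketch of these two ideas follows. `Queryator<T>` and all of its members are hypothetical; nothing like it exists in the framework, and a real design would need many more operators plus expression-based analysis comparable to IQueryable.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: operators only record steps; the source enumerator is
// read once, when a terminal operator executes the query.
public sealed class Queryator<T>
{
    private sealed class State { public bool Consumed; }

    private readonly State _state;                    // shared by the whole operator chain
    private readonly Func<IEnumerator<T>> _pipeline;  // builds the chain on execution

    private Queryator(State state, Func<IEnumerator<T>> pipeline)
    {
        _state = state;
        _pipeline = pipeline;
    }

    public static Queryator<T> From(IEnumerator<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return new Queryator<T>(new State(), () => source);
    }

    // Composing an operator reads nothing from the source yet.
    public Queryator<T> Where(Func<T, bool> predicate)
    {
        if (predicate == null) throw new ArgumentNullException(nameof(predicate));
        ThrowIfConsumed();
        Func<IEnumerator<T>> upstream = _pipeline;
        return new Queryator<T>(_state, () => Filter(upstream(), predicate));
    }

    // Terminal operator: executes the query and returns a materialized array.
    public T[] ToArray()
    {
        ThrowIfConsumed();
        _state.Consumed = true;
        var results = new List<T>();
        using (IEnumerator<T> e = _pipeline())
        {
            while (e.MoveNext()) results.Add(e.Current);
        }
        return results.ToArray();
    }

    // Buffers everything for simplicity; a real implementation would just count.
    public int Count()
    {
        return ToArray().Length;
    }

    private void ThrowIfConsumed()
    {
        if (_state.Consumed)
            throw new InvalidOperationException("The underlying enumerator has already been consumed.");
    }

    private static IEnumerator<T> Filter(IEnumerator<T> xs, Func<T, bool> predicate)
    {
        using (xs)
        {
            while (xs.MoveNext())
            {
                if (predicate(xs.Current)) yield return xs.Current;
            }
        }
    }
}
```

Here the "logical error" check happens at run time via the shared flag: once Count or ToArray has run anywhere in the chain, any further operator on that chain throws instead of silently yielding an empty result.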

Implementing IQueryator would take quite some time... the question is, would it actually be worth the effort?

stakx - no longer contributing answered Oct 13 '22 00:10