There are lots of Linq algorithms that only need to make one pass through the input, e.g. Select.
Yet all the Linq extension methods sit on IEnumerable<T> rather than IEnumerator<T>:
IEnumerator<int> e = ((IEnumerable<int>)new[] { 1, 2, 3, 4, 5 }).GetEnumerator();
e.Select(x => x * x); // Doesn't compile: Select is only defined for IEnumerable<T>
This means you can't use Linq in any situation where you are reading from an "already opened" stream.
This scenario comes up a lot in a project I am currently working on: I want to return an IEnumerator whose Dispose method will close the stream, and have all the downstream Linq code operate on this.
In short, I have an "already opened" stream of results which I can convert into an appropriately disposable IEnumerator - but unfortunately all of the downstream code requires an IEnumerable rather than an IEnumerator, even though it's only going to do one "pass".
I.e. I want to "implement" this return type on a variety of different sources (CSV files, IDataReaders, etc.):
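class TabularStream
{
    public Column[] Columns;                // known as soon as the source has been opened
    public IEnumerator<object[]> RowStream; // its Dispose() closes the underlying resource
}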
In order to get the "Columns" I have to have already opened the CSV file, initiated the SQL query, or whatever. I can then return an "IEnumerator" whose Dispose method closes the resource - but all of the Linq operations require an IEnumerable.
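A minimal sketch of one way to build such a TabularStream from a CSV source follows. It assumes a naive format (header row, comma-separated, no quoting) and replaces Column with a hypothetical record that only holds a name; FromCsv and CsvRowEnumerator are illustrative names, not an existing API. A hand-written enumerator is used instead of a yield-based iterator because a compiler-generated iterator's Dispose does nothing if MoveNext was never called, which could leak the reader; here Dispose always closes it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for the question's Column type.
public sealed record Column(string Name);

public sealed class TabularStream
{
    public Column[] Columns;
    public IEnumerator<object[]> RowStream;

    // Reads the header eagerly; the caller owns (and must dispose) the returned RowStream.
    public static TabularStream FromCsv(TextReader reader)
    {
        string header = reader.ReadLine() ?? throw new InvalidDataException("Missing header row");
        Column[] columns = Array.ConvertAll(header.Split(','), name => new Column(name));
        return new TabularStream { Columns = columns, RowStream = new CsvRowEnumerator(reader) };
    }
}

// Forward-only enumerator over the remaining lines; Dispose closes the underlying reader.
sealed class CsvRowEnumerator : IEnumerator<object[]>
{
    private readonly TextReader reader;
    private object[] current;

    public CsvRowEnumerator(TextReader reader) => this.reader = reader;

    public object[] Current => current;
    object IEnumerator.Current => current;

    public bool MoveNext()
    {
        string line = reader.ReadLine();
        if (line == null) return false;
        current = Array.ConvertAll(line.Split(','), field => (object)field);
        return true;
    }

    // The source cannot be rewound, just like an "already opened" stream.
    public void Reset() => throw new NotSupportedException();

    public void Dispose() => reader.Dispose();
}
```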
The best workaround I know of is to implement an IEnumerable whose GetEnumerator() method returns the one-and-only IEnumerator and throws an exception if GetEnumerator() is called a second time.
Does this all sound OK, or is there a much better way for me to implement "TabularStream" in a way that's easy to use from Linq?
Using IEnumerator<T> directly is rarely a good idea, in my view.
For one thing, it encodes the fact that it's destructive, whereas LINQ queries can usually be run multiple times. They're meant to be side-effect-free, whereas the act of iterating over an IEnumerator<T> is naturally side-effecting.
It also makes it virtually impossible to perform some of the optimizations in LINQ to Objects, such as using the Count property if you're actually asking an ICollection<T> for its count.
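As a simplified sketch of that optimization (CountOf is a hypothetical helper; the real Enumerable.Count checks more interfaces than this): given an IEnumerable<T>, the method can detect an ICollection<T> and read its Count property without enumerating anything, whereas an IEnumerator<T> offers no such hook, so the only way to count it is to consume it.

```csharp
using System.Collections.Generic;

public static class CountSketch
{
    // Shortcut for collections, full enumeration otherwise.
    public static int CountOf<T>(IEnumerable<T> source)
    {
        if (source is ICollection<T> collection)
            return collection.Count; // O(1), no enumeration needed

        int count = 0;
        using (IEnumerator<T> e = source.GetEnumerator())
            while (e.MoveNext()) count++;
        return count;
    }

    // With only an enumerator there is nothing to inspect: counting destroys it.
    public static int CountOf<T>(IEnumerator<T> e)
    {
        int count = 0;
        while (e.MoveNext()) count++;
        return count;
    }
}
```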
As for your workaround: yes, a OneShotEnumerable would be a reasonable approach.
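A minimal OneShotEnumerable<T> along the lines the question describes might look like this (a sketch, not an existing library type): GetEnumerator hands out the wrapped enumerator the first time and throws InvalidOperationException on any later call. Interlocked.Exchange keeps the hand-off safe even if two threads race for it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading;

// Exposes an already-opened IEnumerator<T> as an IEnumerable<T> that can be enumerated only once.
public sealed class OneShotEnumerable<T> : IEnumerable<T>
{
    private IEnumerator<T> enumerator;

    public OneShotEnumerable(IEnumerator<T> enumerator) =>
        this.enumerator = enumerator ?? throw new ArgumentNullException(nameof(enumerator));

    public IEnumerator<T> GetEnumerator()
    {
        // Atomically take the enumerator and leave null behind, so a second call fails.
        IEnumerator<T> e = Interlocked.Exchange(ref enumerator, null);
        return e ?? throw new InvalidOperationException("This sequence can only be enumerated once.");
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because foreach and the LINQ operators dispose the enumerator they obtain, a query such as oneShot.Select(f).ToList() should also dispose the wrapped enumerator, and thereby close the underlying stream, once it finishes.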
While I generally agree with Jon Skeet's answer, I have also come across a few cases where working with IEnumerator directly seemed more appropriate than wrapping it in a once-only IEnumerable.
I'll start by illustrating one such case and by describing my own solution to the issue.
ESRI's API for accessing geo-databases (ArcObjects) has forward-only database cursors that cannot be reset. They are essentially that API's equivalent of IEnumerator. But there is no equivalent to IEnumerable. So if you want to wrap that API in "the .NET way", you have three options (which I explored in the following order):
1. Wrap the cursor as an IEnumerator (since that's what it really is) and work directly with that (which is cumbersome).
2. Wrap the cursor, or the wrapping IEnumerator from (1), as a once-only IEnumerable (to make it LINQ-compatible and generally easier to work with). The mistake here is that it isn't really an IEnumerable, because it cannot be enumerated more than once, and this might be overlooked by users or maintainers of your code.
3. Don't wrap the cursor itself as an IEnumerable, but rather that which can be used to retrieve a cursor (e.g. the query criteria and the reference to the database object being queried). That way, several iterations are possible simply by re-executing the whole query. This is what I eventually decided on back then.
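Option 3 could be sketched roughly as follows. Since no actual ArcObjects calls are shown here, a hypothetical openCursor delegate stands in for "run the query with these criteria against this table and return a fresh cursor"; RequeryableEnumerable is likewise an illustrative name. Each GetEnumerator call re-executes the query, so the sequence can be enumerated as often as needed.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Wraps the means of opening a cursor, not the cursor itself.
public sealed class RequeryableEnumerable<T> : IEnumerable<T>
{
    // Hypothetical: re-runs the query (criteria + queried object) and returns a new forward-only cursor.
    private readonly Func<IEnumerator<T>> openCursor;

    public RequeryableEnumerable(Func<IEnumerator<T>> openCursor) =>
        this.openCursor = openCursor ?? throw new ArgumentNullException(nameof(openCursor));

    // Every enumeration starts a fresh query, so multiple passes are safe.
    public IEnumerator<T> GetEnumerator() => openCursor();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```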
That last option is the pragmatic solution that I would generally recommend for similar cases (if applicable). If you are looking for other solutions, read on.
LINQ for the IEnumerator<T> interface?
It's technically possible to implement some or all of LINQ's query operators for the IEnumerator<T> interface. One approach would be to write a bunch of extension methods, such as:
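public static class EnumeratorExtensions
{
    // Lazily yields the remaining elements of xs that satisfy predicate.
    public static IEnumerator<T> Where<T>(this IEnumerator<T> xs, Func<T, bool> predicate)
    {
        while (xs.MoveNext())
        {
            T x = xs.Current;
            if (predicate(x)) yield return x;
        }
    }
}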
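For comparison, here is a Select counterpart written in the same style, plus a helper that drains a result by hand with MoveNext/Current, since a plain IEnumerator<T> cannot be used in foreach. EnumeratorSelectSketch and Drain are hypothetical names. Like the Where above, the operator does not dispose its input; ownership stays with the caller.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorSelectSketch
{
    // Lazily projects each remaining element of xs; nothing runs until MoveNext is called.
    public static IEnumerator<TResult> Select<T, TResult>(this IEnumerator<T> xs, Func<T, TResult> selector)
    {
        while (xs.MoveNext())
            yield return selector(xs.Current);
    }

    // Consumes whatever is left in xs into a list.
    public static List<T> Drain<T>(this IEnumerator<T> xs)
    {
        var items = new List<T>();
        while (xs.MoveNext()) items.Add(xs.Current);
        return items;
    }
}
```

For an enumerator over { 1, 2, 3 }, e.Select(x => x * 10).Drain() returns a list containing 10, 20 and 30.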
Let's consider a few key issues:
- Operators must never return an IEnumerable<T>, because that would mean that you could break out of your own "LINQ to IEnumerator" world and escape into regular LINQ. There you'd end up with the non-repeatability issue already described above.
- You cannot process the results of some query with a foreach loop… unless each of the IEnumerator<T> objects returned by your query operators implements a GetEnumerator method that returns this. Supplying that additional method would mean that you cannot use yield return/yield break, but have to write IEnumerator<T> classes manually. This is just plain weird and possibly an abuse of either IEnumerator<T> or the foreach construct.
- If returning IEnumerable<T> is forbidden and returning IEnumerator<T> is cumbersome (because foreach doesn't work), why not return plain arrays? Because then queries can no longer be lazy.
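The foreach workaround from the second point could look like this hand-written operator (WhereEnumerator is a hypothetical name). The C# compiler binds foreach to any type with an accessible GetEnumerator() method whose return type exposes MoveNext() and Current, so returning this is enough; the price is writing MoveNext, Current and Dispose by hand instead of using yield return.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// A manual Where operator that foreach accepts because it also exposes GetEnumerator().
public sealed class WhereEnumerator<T> : IEnumerator<T>
{
    private readonly IEnumerator<T> source;
    private readonly Func<T, bool> predicate;

    public WhereEnumerator(IEnumerator<T> source, Func<T, bool> predicate)
    {
        this.source = source;
        this.predicate = predicate;
    }

    public T Current { get; private set; }
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        while (source.MoveNext())
        {
            if (predicate(source.Current))
            {
                Current = source.Current;
                return true;
            }
        }
        return false;
    }

    public void Reset() => throw new NotSupportedException();

    // foreach calls this at the end of the loop, which also disposes the source.
    public void Dispose() => source.Dispose();

    // foreach binds to this method by pattern, so the enumerator can be iterated directly.
    public WhereEnumerator<T> GetEnumerator() => this;
}
```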
IQueryable + IEnumerator = IQueryator
What about delaying the execution of a query until it has been fully composed? In the IEnumerable world, that is what IQueryable does; so we could theoretically build an IEnumerator equivalent, which I shall call IQueryator.
IQueryator could check for logical errors, such as doing anything with the sequence after it has been completely consumed by a preceding operation like Count. I.e. all-consuming operators like Count would always have to come last in a chain of query operators.
IQueryator could return an array (as suggested above) or some other read-only collection, but only when the query gets executed, not from the individual operators.
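To give an idea of what IQueryator might involve, here is a heavily simplified sketch; Queryator<T>, QueryatorState and From are hypothetical names, and only Where, Select, Count and ToArray are covered. Operators merely compose a pipeline, the shared source enumerator is read only when a terminal operator executes the query, and any attempt to compose or execute again afterwards throws InvalidOperationException. It is a sketch under those assumptions, not a complete design.

```csharp
using System;
using System.Collections.Generic;

// Shared by every query composed from the same source enumerator.
internal sealed class QueryatorState
{
    public bool Executed; // set once a terminal operator has consumed the source
}

public sealed class Queryator<T>
{
    private readonly QueryatorState state;
    private readonly Func<IEnumerator<T>> build; // assembles the lazy pipeline on execution

    internal Queryator(QueryatorState state, Func<IEnumerator<T>> build)
    {
        this.state = state;
        this.build = build;
    }

    public static Queryator<T> From(IEnumerator<T> source) =>
        new Queryator<T>(new QueryatorState(), () => source);

    // Composition only: nothing is read from the source here.
    public Queryator<T> Where(Func<T, bool> predicate)
    {
        ThrowIfExecuted();
        return new Queryator<T>(state, () => Filter(build(), predicate));
    }

    public Queryator<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        ThrowIfExecuted();
        return new Queryator<TResult>(state, () => Project(build(), selector));
    }

    // Terminal operators: each executes the query and marks the source as consumed.
    public int Count()
    {
        int count = 0;
        using (IEnumerator<T> e = Execute())
            while (e.MoveNext()) count++;
        return count;
    }

    public T[] ToArray()
    {
        var items = new List<T>();
        using (IEnumerator<T> e = Execute())
            while (e.MoveNext()) items.Add(e.Current);
        return items.ToArray();
    }

    private IEnumerator<T> Execute()
    {
        ThrowIfExecuted();
        state.Executed = true;
        return build();
    }

    private void ThrowIfExecuted()
    {
        if (state.Executed)
            throw new InvalidOperationException("The source enumerator has already been consumed.");
    }

    // Each stage disposes its input once iteration has started, so disposal reaches the source.
    private static IEnumerator<T> Filter(IEnumerator<T> xs, Func<T, bool> predicate)
    {
        using (xs)
        {
            while (xs.MoveNext())
                if (predicate(xs.Current)) yield return xs.Current;
        }
    }

    private static IEnumerator<TResult> Project<TResult>(IEnumerator<T> xs, Func<T, TResult> selector)
    {
        using (xs)
        {
            while (xs.MoveNext())
                yield return selector(xs.Current);
        }
    }
}
```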
Implementing IQueryator would take quite some time... the question is, would it actually be worth the effort?