What is the purpose of using Select(x => x) in a Batch method?

Question

I was looking at the source code of the Batch method and saw this:

// Select is necessary so bucket contents are streamed too
yield return resultSelector(bucket.Select(x => x));

There is a comment which I didn't quite understand. I tested this method without the Select and it worked well, but it seems there is something I'm missing. I can't think of any example where this would be necessary. So what is the actual purpose of using Select(x => x) here?

Here is the full source code for reference:

private static IEnumerable<TResult> BatchImpl<TSource, TResult>(
        this IEnumerable<TSource> source,
        int size,
        Func<IEnumerable<TSource>, TResult> resultSelector)
    {
        TSource[] bucket = null;
        var count = 0;

        foreach (var item in source)
        {
            if (bucket == null)
                bucket = new TSource[size];

            bucket[count++] = item;

            // The bucket is fully buffered before it's yielded
            if (count != size)
                continue;

            // Select is necessary so bucket contents are streamed too
            yield return resultSelector(bucket.Select(x => x));

            bucket = null;
            count = 0;
        }

        // Return the last bucket with all remaining elements
        if (bucket != null && count > 0)
            yield return resultSelector(bucket.Take(count));
    }

BartoszKP · Accepted Answer

To sum up what's in the comments: theoretically, this is redundant. Deferred execution is irrelevant in this case. By the time the yield is reached, the work has already been done: the contents of bucket are fully computed and there is nothing left to defer.
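A minimal sketch of that point, not taken from the library: the names bucket, calls and view are made up for illustration, and the lambda counts its invocations instead of being a pure identity. Select over an already-filled array is still lazy, but all it postpones is the projection itself, because the data is already there.

```csharp
using System;
using System.Linq;

// Stand-in for a bucket that has already been completely filled.
int[] bucket = { 1, 2, 3 };

var calls = 0;

// Same shape as Select(x => x), but counting how often the lambda runs.
var view = bucket.Select(x => { calls++; return x; });

// Nothing has run yet: Select is deferred.
var callsBeforeEnumeration = calls;
Console.WriteLine(callsBeforeEnumeration);   // 0

// Enumerating only replays values that were buffered long ago.
var items = view.ToList();
Console.WriteLine(calls);                    // 3
```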

There is also no problem caused by the iterator block's behaviour: each time execution returns to this implementation, the bucket is reset and recreated (bucket = null immediately after the yield). Even if someone cast the result to the array type and modified it, we wouldn't care.

The only advantage of this approach seems to be elegance: there is type consistency between all calls to resultSelector. Without the "redundant" Select, the actual type would have been TSource[] most of the time, and IEnumerable<TSource> for the trailing elements that did not fill the whole bucket.
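The difference in runtime types can be checked directly. This is only an illustrative sketch; the variable names raw, wrapped and partial are mine, not from the Batch source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] bucket = { 1, 2, 3 };

// What resultSelector would receive without the Select: the array itself.
IEnumerable<int> raw = bucket;

// What it receives with the Select: an iterator that hides the array.
IEnumerable<int> wrapped = bucket.Select(x => x);

// What it receives for the trailing, partially filled bucket.
IEnumerable<int> partial = bucket.Take(2);

Console.WriteLine(raw is int[]);      // True
Console.WriteLine(wrapped is int[]);  // False
Console.WriteLine(partial is int[]);  // False
```

With the Select in place, neither the full buckets nor the trailing one can be downcast to an array.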

However, one can imagine the following scenario:

1. Someone using this function notices that the actual type is an array.
2. Because of some urgent need to improve performance, they cast the received batch to TSource[] (e.g. they can now skip elements more efficiently, as Skip is not optimized for arrays).
3. They use the method without any problems, because it happens that Count() % size == 0 in their case.

That works until, later, one additional element comes in, causing the last yield to be executed. Now the cast to TSource[] fails.
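The scenario above can be reproduced with a hypothetical variant of the method that drops the Select. BatchWithoutSelect and firstViaCast are names I made up for this sketch, and the function is written as a plain local function rather than an extension method to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of the batching loop that hands out the raw array.
static IEnumerable<TResult> BatchWithoutSelect<TSource, TResult>(
    IEnumerable<TSource> source,
    int size,
    Func<IEnumerable<TSource>, TResult> resultSelector)
{
    TSource[] bucket = null;
    var count = 0;

    foreach (var item in source)
    {
        if (bucket == null)
            bucket = new TSource[size];

        bucket[count++] = item;

        if (count != size)
            continue;

        // No Select(x => x): the callee sees a TSource[].
        yield return resultSelector(bucket);

        bucket = null;
        count = 0;
    }

    // The trailing bucket is still a Take iterator, not an array.
    if (bucket != null && count > 0)
        yield return resultSelector(bucket.Take(count));
}

// A caller that relies on the undocumented array type.
Func<IEnumerable<int>, int> firstViaCast = batch => ((int[])batch)[0];

// 6 elements, size 3: every batch is a full array, so the cast works.
var firsts = BatchWithoutSelect(Enumerable.Range(1, 6), 3, firstViaCast).ToList();
Console.WriteLine(string.Join(", ", firsts));   // 1, 4

// 7 elements, size 3: the last batch is not an array, so the cast throws.
var castFailed = false;
try
{
    BatchWithoutSelect(Enumerable.Range(1, 7), 3, firstViaCast).ToList();
}
catch (InvalidCastException)
{
    castFailed = true;
}
Console.WriteLine(castFailed);                  // True
```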

So, depending on the number of elements and size the method would behave inconsistently with regard to its result type (passed to the given callback). One can imagine other elaborate scenarios where this inconsistency can cause trouble, like some ORM that, depending on the actual type, serializes objects into different tables. In this context pieces of data would end up in different tables.

These scenarios are, of course, all based on some other mistake being made, and they do not prove that the implementation is wrong without the Select. It is, however, friendlier with the Select, in the sense that it reduces the number of such unfortunate scenarios to a minimum.

Tags:

c#

linq

Selman Genç
