Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is `.Select(...).Last()` optimized, but `.Select(...).Last(...)` not?

Tags:

c#

.net

linq

Consider the following enumerator:

var items = (new int[] { 1, 2, 3, 4, 5 }).Select(x =>
{
    Console.WriteLine($"inspect {x}");
    return x;
});

This yields the elements [1, 2, 3, 4, 5], printing them as they are consumed.

When I call the Last method on this enumerator, it triggers a fast path which only accesses a single element:

items.Last();
inspect 5

But when I pass a callback to Last, it loops through the whole list from the beginning:

items.Last(x => true);
inspect 1
inspect 2
inspect 3
inspect 4
inspect 5

Looking through the .NET Core source code, I find that:

  • Last(IEnumerable<T>) forwards to TryGetLast(IEnumerable<T>, out bool);
  • TryGetLast(IEnumerable<T>, out bool) has a fast path for IPartition<T>;
  • And since ArraySelectIterator<T> implements IPartition<T>, this fast path is triggered and all is well.

On the other hand:

  • Last(IEnumerable<T>, Func<T, bool>) forwards to TryGetLast(IEnumerable<T>, Func<T, bool>, out bool)
  • This has fast paths for OrderedEnumerator and IList<T>, but not ArraySelectIterator<T>.
  • Therefore it takes the slow path and iterates from the beginning.

This explains how the callback case is not optimized. But it doesn't explain why.

Conceptually, if at least one element satisfies the predicate (which is likely in practice), then iterating backward may allow for exiting the loop early.

It doesn't seem difficult to implement either: from what I've seen, all it takes is an additional method on IPartition<T>.

The lack of optimization can also be surprising. Since these overloads share the same name, one might assume that they are also optimized in a similar way. (At least that's what I thought.)

Given these reasons to optimize this case, why did the authors of LINQ choose not to do that?

like image 948
Lambda Fairy Avatar asked Jan 04 '19 08:01

Lambda Fairy


1 Answers

Last() can be always optimized for collections that allow access to the last element of the collection in constant time (O(1)). For those collections, instead of iterating all the collection and returning the last element, you can just access the last element directly.

Conceptually, if at least one element satisfies the predicate (which is likely in practice), then iterating backward may allow for exiting the loop early.

That assumption is way too far fetched for a generic implementation of Last(Func<T,bool>). You can't assume that the last element that satisfies the predicate is closer to the end of the collection in general. That optimization works well for your example (Last(x=>true)), but for every such example there can be an oposite example where that optimization performs worse (Last(x=>false)).

like image 128
Monticola Explorator Avatar answered Sep 30 '22 18:09

Monticola Explorator