Suppose I have some strings:
string[] strings = { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };
What is the difference between:
string startsWithO = strings.First(s => s[0] == 'o');
And:
string startsWithO = strings.Where(s => s[0] == 'o').First();
Since Where() is deferred it shouldn't slow down the execution, right?
The performance penalty of using .Where(filter).First()
rather than .First(filter)
will usually be very small.
However, they're not the same - Where
generates a new iterator that First
can simply take one element of, whereas First(filter)
can microoptimize by using just one loop and a direct return whenever filter
matches.
So while both approaches have the same semantics and both execute the filter
equally often (only as often as necessary), using First
with a filter
parameter doesn't need to create an intermediate iterator object and probably avoids some very simple method calls on that iterator too.
In other words, if you're executing such code millions of times, you will see a slight performance difference - but nothing huge; I would never worry about it. Whenever that tiny performance difference actually matters you're much better off just writing the (very simple) foreach-with-if statement that's equivalent and avoiding the extra calls and object allocations inherent in LINQ - but remember that this is a microoptimization you'll rarely need.
Edit: Benchmark demonstrating the effect:
This takes 0.78 seconds:
for(int i=0;i<10*1000*1000;i++)
Enumerable.Range(0,1000).First(n=> n > 2);
GC.Collect();
But this takes 1.41 seconds:
for(int i=0;i<10*1000*1000;i++)
Enumerable.Range(0,1000).Where(n=> n > 2).First();
GC.Collect();
Whereas plain loops are much faster (0.13 seconds):
long bla = 0;
for(int i=0;i<10*1000*1000;i++)
for(int n=0;n<1000;n++)
if(n > 2) { bla+=n; break; }
GC.Collect();
Console.WriteLine(bla);//avoid optimizer cheating.
Note that this benchmark only shows such extreme differences because I have a trivial filter and a very short non-matching prefix.
Based on some quick experimentation, the difference seems largely attributable to the details of which codepath gets taken. So, for array's and List<>
s the first variant is actually faster, likely to do special-casing in .Where
for those types that First
doesn't have; for custom iterators, the second version is a tiny bit faster, as expected.
.Where(...).First()
is roughly as fast as .First(...)
- don't bother choosing one or the other as an optimization. In general .First(...)
is very slightly faster but in some common cases it is slower. If you really need that microoptimization then use plain loops which are faster than either.
There are no differences here.
Calling Where first returns an iterator that's not used until First starts looping over it.
If the predicate doesn't match any elements then the same exception InvalidOperationException is thrown.
The only difference is the verbosity of the code, so .First without .Where should be preferred
In the specific case, calling First
and Where
on a string[]
, the methods called are the Enumerable.Where
and Enumerable.First
extension methods.
Enumerable.Where
does this:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
// null checks omitted
if (source is TSource[])
return new WhereArrayIterator<TSource>((TSource[])source, predicate);
//the rest of the method will not execute
}
and the constructor of WhereArrayIterator
does just:
public WhereArrayIterator(TSource[] source, Func<TSource, bool> predicate) {
this.source = source;
this.predicate = predicate;
}
So nothing is actually done here, except to create an iterator.
The first First
method, without a predicate does this:
public static TSource First<TSource>(this IEnumerable<TSource> source) {
//null check
IList<TSource> list = source as IList<TSource>;
if (list != null) {
//this branch is not taken as string[] does not implement IList<string>
if (list.Count > 0) return list[0];
}
else {
//this is actually the WhereArrayIterator from before
using (IEnumerator<TSource> e = source.GetEnumerator()) {
if (e.MoveNext())
return e.Current;
}
}
throw Error.NoElements();
}
However, the second First
does this
public static TSource First<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
//null checks
foreach (TSource element in source) {
if (predicate(element)) return element;
}
throw Error.NoMatch();
}
Which in the case of an array, is as fast as direct linear access.
In a nutshell, this means that calling First(predicate)
on an array will be somewhat faster, by a not large, but still detectable factor. This might not hold for lists, and will certainly not hold for IQueryable
objects, that are a completely different story.
However, this is micro-optimization at it's worst. Unless this is done millions of times, it won't save too many seconds. Even as I know this now, I'll still use whatever is clearer to read and understand.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With