Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

where lambda vs. first lambda

Tags:

.net

linq

c#-3.0

Suppose I have some strings:

string[] strings = { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

What is the difference between:

string startsWithO = strings.First(s => s[0] == 'o');

And:

string startsWithO = strings.Where(s => s[0] == 'o').First();

Since Where() is deferred it shouldn't slow down the execution, right?

like image 282
micahhoover Avatar asked Jan 10 '13 16:01

micahhoover


3 Answers

The performance penalty of using .Where(filter).First() rather than .First(filter) will usually be very small.

However, they're not the same - Where generates a new iterator that First can simply take one element of, whereas First(filter) can microoptimize by using just one loop and a direct return whenever filter matches.

So while both approaches have the same semantics and both execute the filter equally often (only as often as necessary), using First with a filter parameter doesn't need to create an intermediate iterator object and probably avoids some very simple method calls on that iterator too.

In other words, if you're executing such code millions of times, you will see a slight performance difference - but nothing huge; I would never worry about it. Whenever that tiny performance difference actually matters you're much better off just writing the (very simple) foreach-with-if statement that's equivalent and avoiding the extra calls and object allocations inherent in LINQ - but remember that this is a microoptimization you'll rarely need.

Edit: Benchmark demonstrating the effect:

This takes 0.78 seconds:

for(int i=0;i<10*1000*1000;i++)
  Enumerable.Range(0,1000).First(n=> n > 2);
GC.Collect();

But this takes 1.41 seconds:

for(int i=0;i<10*1000*1000;i++)
  Enumerable.Range(0,1000).Where(n=> n > 2).First();
GC.Collect();

Whereas plain loops are much faster (0.13 seconds):

long bla = 0;
for(int i=0;i<10*1000*1000;i++)
    for(int n=0;n<1000;n++)
        if(n > 2) { bla+=n; break; }
GC.Collect();
Console.WriteLine(bla);//avoid optimizer cheating.

Note that this benchmark only shows such extreme differences because I have a trivial filter and a very short non-matching prefix.

Based on some quick experimentation, the difference seems largely attributable to the details of which codepath gets taken. So, for array's and List<>s the first variant is actually faster, likely to do special-casing in .Where for those types that First doesn't have; for custom iterators, the second version is a tiny bit faster, as expected.

Summary:

.Where(...).First() is roughly as fast as .First(...) - don't bother choosing one or the other as an optimization. In general .First(...) is very slightly faster but in some common cases it is slower. If you really need that microoptimization then use plain loops which are faster than either.

like image 50
Eamon Nerbonne Avatar answered Oct 24 '22 05:10

Eamon Nerbonne


There are no differences here.

Calling Where first returns an iterator that's not used until First starts looping over it.

If the predicate doesn't match any elements then the same exception InvalidOperationException is thrown.

The only difference is the verbosity of the code, so .First without .Where should be preferred

like image 36
Sten Petrov Avatar answered Oct 24 '22 05:10

Sten Petrov


In the specific case, calling First and Where on a string[], the methods called are the Enumerable.Where and Enumerable.Firstextension methods.

Enumerable.Where does this:

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) 
{
  // null checks omitted
  if (source is TSource[]) 
     return new WhereArrayIterator<TSource>((TSource[])source, predicate); 
  //the rest of the method will not execute
}

and the constructor of WhereArrayIterator does just:

public WhereArrayIterator(TSource[] source, Func<TSource, bool> predicate) {
  this.source = source; 
  this.predicate = predicate;
} 

So nothing is actually done here, except to create an iterator.

The first First method, without a predicate does this:

public static TSource First<TSource>(this IEnumerable<TSource> source) { 
  //null check
  IList<TSource> list = source as IList<TSource>;
  if (list != null) {
     //this branch is not taken as string[] does not implement IList<string>
     if (list.Count > 0) return list[0]; 
  }
  else { 
    //this is actually the WhereArrayIterator from before
    using (IEnumerator<TSource> e = source.GetEnumerator()) { 
      if (e.MoveNext()) 
        return e.Current;
    } 
  }
  throw Error.NoElements();
}

However, the second First does this

public static TSource First<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
   //null checks
   foreach (TSource element in source) {
     if (predicate(element)) return element; 
   }
   throw Error.NoMatch();
}

Which in the case of an array, is as fast as direct linear access.
In a nutshell, this means that calling First(predicate) on an array will be somewhat faster, by a not large, but still detectable factor. This might not hold for lists, and will certainly not hold for IQueryable objects, that are a completely different story.

However, this is micro-optimization at it's worst. Unless this is done millions of times, it won't save too many seconds. Even as I know this now, I'll still use whatever is clearer to read and understand.

like image 1
SWeko Avatar answered Oct 24 '22 07:10

SWeko