Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is this function faster and why are multiple enumerations of it faster than the first?

I needed a TakeLast<T>(int n) -style LINQ function. I came across this StackOverflow post: https://stackoverflow.com/a/3453282/825011. I liked this answer simply because it was a simple implementation. Then, another colleague of mine pointed out that the Reverse() must be more-costly than Skip(length - n). Which caused me to write a test.

Here are the competing functions.

public static IEnumerable<T> TakeLast<T>( this IEnumerable<T> c, int n ) {
    return c.Reverse().Take( n ).Reverse();
}


public static IEnumerable<T> TakeLast2<T>( this IEnumerable<T> c, int n ) {
    var count = c.Count();
    return c.Skip( count - n );
}

I timed the execution of obtaining the last 10 elements of the enumeration Enumerable.Range( 0, 100000 ). I found that:

  1. TakeLast() is faster ~5x.
  2. Enumerations of TakeLast() are significantly faster after the first enumeration.

Here's the .NET Fiddle for my code (which was originally ran locally, but is also demonstrated here.): http://dotnetfiddle.net/ru7PZE

Questions

  1. Why is TakeLast() faster?
  2. Why are the second and third enumerations of TakeLast() faster than the first, but all enumerations of TakeLast2() are about the same?
like image 631
Sam Rueby Avatar asked Dec 26 '22 11:12

Sam Rueby


1 Answers

You're not materializing the results of your query until after you print out the stopwatch's elapsed time. LINQ queries use deferred execution to avoid actually executing the query until they are enumerated. In the case of your second method you're calling Count before building the query. Count needs to actually enumerate the entire result set to compute it's value. This means that your second method needs to iterate the sequence each time, whereas the first query is able to successfully defer its work until after you display the time. I would expect it to have more work to do, in many situations, it just succeeds at waiting until after you're done timing it.

As for why the first is faster when called multiple times, that pretty much just comes down to the fact that the JIT warmup that takes place when executing any code. The second method is does get that speedup, but since it doesn't get to omit the iteration of the query each time (which is a large portion of its cost) the percent speedup is much smaller.

Note that your second query iterates the source sequence twice (unless the enumerable happens to implement ICollection). This means that if the object is something that can be efficient iterated multiple times, it's not a problem. If it implements IList it will in fact be much faster, but if it is something like say an IQueryable that needs to execute an expensive query against a database each time its iterated it will need to do that twice, not once. If it's a query that doesn't even have the same contents when iterated multiple times then that can cause all sorts of problems.

like image 183
Servy Avatar answered May 17 '23 13:05

Servy