
Aggregate vs Sum Performance in LINQ

Three different implementations of finding the sum of an IEnumerable<int> source are given below, along with the time taken when the source has 10,000 integers.

source.Aggregate(0, (result, element) => result + element);   

takes 3 ms

source.Sum(c => c); 

takes 12 ms

source.Sum(); 

takes 1 ms

I am wondering why the second implementation is four times more expensive than the first one. Shouldn't it be the same as the third implementation?

asked Jun 14 '12 by Gopal

People also ask

What is the difference between sum and aggregate?

Aggregate is much more flexible. It can be used on a wide number of types (to build strings for example). Sum has a very specific purpose (to add numbers).
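
For example, a small illustrative snippet (my own, not from the question or answer):

using System;
using System.Linq;

class AggregateVsSumExample
{
    static void Main()
    {
        var words = new[] { "LINQ", "is", "flexible" };

        // Aggregate can fold a sequence into any result type, here a string.
        string sentence = words.Aggregate((acc, w) => acc + " " + w);

        // Sum has one job: adding numbers.
        int totalLength = words.Sum(w => w.Length);

        Console.WriteLine(sentence);      // LINQ is flexible
        Console.WriteLine(totalLength);   // 14
    }
}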

Is Linq good for performance?

LINQ syntax is typically less efficient than a foreach loop. It's good to be aware of any performance tradeoff that might occur when you use LINQ to improve the readability of your code. And if you'd like to measure the performance difference, you can use a tool like BenchmarkDotNet to do so.
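
As an illustration, a minimal BenchmarkDotNet comparison of the three calls from the question might look roughly like this (a sketch with my own names and setup, assuming the BenchmarkDotNet NuGet package is referenced and the project is run in Release mode):

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class SumBenchmarks
{
    private readonly List<int> source = Enumerable.Range(1, 10000).ToList();

    [Benchmark] public int AggregateSum() => source.Aggregate(0, (result, element) => result + element);
    [Benchmark] public int SumWithLambda() => source.Sum(c => c);
    [Benchmark] public int SumDirect() => source.Sum();
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<SumBenchmarks>();
}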


1 Answer

Note: My computer is running .Net 4.5 RC, so it's possible that my results are affected by this.

Measuring the time it takes to execute a method just once is usually not very useful. It can be easily dominated by things like JIT compilation, which are not actual bottlenecks in real code. Because of this, I measured executing each method 100× (in Release mode without debugger attached). My results are:

  • Aggregate(): 9 ms
  • Sum(lambda): 12 ms
  • Sum(): 6 ms
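
For reference, a measurement harness along these lines might look like the following (a sketch with my own names, not necessarily the exact code behind the numbers above):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    // Warm up once to get JIT compilation out of the way, then time 100 runs.
    static long Measure(Func<int> body)
    {
        body();   // warm-up run (JIT)
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++)
            body();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        List<int> source = Enumerable.Range(1, 10000).ToList();

        Console.WriteLine("Aggregate(): {0} ms", Measure(() => source.Aggregate(0, (r, e) => r + e)));
        Console.WriteLine("Sum(lambda): {0} ms", Measure(() => source.Sum(c => c)));
        Console.WriteLine("Sum():       {0} ms", Measure(() => source.Sum()));
    }
}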

The fact that Sum() is the fastest is not surprising: it contains a simple loop without any delegate invocations, which is really fast. The difference between Sum(lambda) and Aggregate() is not nearly as prominent as what you measured, but it's still there. What could be the reason for it? Let's look at decompiled code for the two methods:

public static TAccumulate Aggregate<TSource, TAccumulate>(this IEnumerable<TSource> source, TAccumulate seed, Func<TAccumulate, TSource, TAccumulate> func)
{
    if (source == null)
        throw Error.ArgumentNull("source");
    if (func == null)
        throw Error.ArgumentNull("func");

    TAccumulate local = seed;
    foreach (TSource local2 in source)
        local = func(local, local2);
    return local;
}

public static int Sum<TSource>(this IEnumerable<TSource> source, Func<TSource, int> selector)
{
    return source.Select<TSource, int>(selector).Sum();
}

As you can see, Aggregate() uses a loop but Sum(lambda) uses Select(), which in turn uses an iterator. And using an iterator means there is some overhead: creating the iterator object and (probably more importantly) one more method invocation for each item.
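
To make that extra layer concrete, Select() effectively wraps the source in something like this (a simplified sketch with my own name, not the framework's actual iterator):

using System;
using System.Collections.Generic;

static class SelectSketch
{
    // Each element now passes through one more iterator (yield return)
    // and one delegate invocation before the summing loop ever sees it.
    public static IEnumerable<int> SelectLike<T>(this IEnumerable<T> source, Func<T, int> selector)
    {
        foreach (T item in source)
            yield return selector(item);   // extra call per item
    }
}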

Let's verify that using Select() is actually the reason by writing our own Sum(lambda) twice, once using Select(), which should behave the same as Sum(lambda) from the framework, and once without using Select():

public static int SlowSum<T>(this IEnumerable<T> source, Func<T, int> selector)
{
    return source.Select(selector).Sum();
}

public static int FastSum<T>(this IEnumerable<T> source, Func<T, int> selector)
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (selector == null)
        throw new ArgumentNullException("selector");

    int num = 0;
    foreach (T item in source)
        num += selector(item);
    return num;
}

My measurements confirm what I thought:

  • SlowSum(lambda): 12 ms
  • FastSum(lambda): 9 ms
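
For completeness, the two methods above could be exercised like this (a hypothetical usage sketch, assuming they are declared in a static extension class and System.Linq is imported):

var source = Enumerable.Range(1, 10000).ToList();

int slow = source.SlowSum(x => x);   // goes through Select(): extra iterator plus a delegate call per item
int fast = source.FastSum(x => x);   // plain loop: only the selector delegate call per item

Console.WriteLine(slow == fast);     // True: same result, different overhead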

answered Oct 14 '22 by svick