Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What in Linq.Max implementation is the bottleneck?

Preamble: I was changing a bit of code (manual Max search in an array) to some Linq.Max() super sexy written line, which brings me to questions about performance (I often deal with big arrays). So I made a little program to test, because I only trust what I see and got this results:

The size is now of 1 elements
With the loop it took:  00:00:00.0000015 
With Linq it took:      00:00:00.0000288 
The loop is faster: 94,79%
-----------------------------------------
The size is now of 10 elements
With the loop it took:  00:00:00 
With Linq it took:      00:00:00.0000007 
The loop is faster: 100,00%
-----------------------------------------
The size is now of 100 elements
With the loop it took:  00:00:00 
With Linq it took:      00:00:00.0000011 
The loop is faster: 100,00%
-----------------------------------------
The size is now of 1 000 elements
With the loop it took:  00:00:00.0000003 
With Linq it took:      00:00:00.0000078 
The loop is faster: 96,15%
-----------------------------------------
The size is now of 10 000 elements
With the loop it took:  00:00:00.0000063 
With Linq it took:      00:00:00.0000765 
The loop is faster: 91,76%
-----------------------------------------
The size is now of 100 000 elements
With the loop it took:  00:00:00.0000714 
With Linq it took:      00:00:00.0007602 
The loop is faster: 90,61%
-----------------------------------------
The size is now of 1 000 000 elements
With the loop it took:  00:00:00.0007669 
With Linq it took:      00:00:00.0081737 
The loop is faster: 90,62%
-----------------------------------------
The size is now of 10 000 000 elements
With the loop it took:  00:00:00.0070811 
With Linq it took:      00:00:00.0754348 
The loop is faster: 90,61%
-----------------------------------------
The size is now of 100 000 000 elements
With the loop it took:  00:00:00.0788133 
With Linq it took:      00:00:00.7758791 
The loop is faster: 89,84%

In short, Linq is almost 10 times slower, which bothered me a bit, so I took a look to the implementation of Max():

public static int Max(this IEnumerable<int> source) {
    if (source == null) throw Error.ArgumentNull("source");
    int value = 0;
    bool hasValue = false;
    foreach (int x in source) {
        if (hasValue) {
            if (x > value) value = x;
        }
        else {
            value = x;
            hasValue = true;
        }
    }
    if (hasValue) return value;
    throw Error.NoElements();
}

As the title already ask, what in this implementation make it 10 times slower? (And it's not about the ForEach, I've checked)

Edit:

Of course, I test in Release mode.

Here's my test code (without output):

//----------------
private int[] arrDoubles;
//----------------

Stopwatch watch = new Stopwatch();
//Stop a 100 Millions to avoid memory overflow on my laptop
for (int i = 1; i <= 100000000; i = i * 10)
{
    fillArray(i);
    watch.Restart();
    int max = Int32.MinValue; // Reset
    for (int j = 0; j < arrDoubles.Length; j++)
    {
        max = Math.Max(arrDoubles[j], max);
    }
    watch.Stop();

    TimeSpan loopSpan = watch.Elapsed;
    
    watch.Restart();
    max = Int32.MinValue; // Reset
    max = arrDoubles.Max();
    watch.Stop();

    TimeSpan linqSpan = watch.Elapsed;
}

//-------------------------------------------

private void fillArray(int nbValues)
{
    int Min = Int32.MinValue;
    int Max = Int32.MaxValue;
    Random randNum = new Random();
    arrDoubles = Enumerable.Repeat(0, nbValues).Select(i => randNum.Next(Min, Max)).ToArray();
}
like image 450
Thomas Ayoub Avatar asked Jul 21 '15 13:07

Thomas Ayoub


1 Answers

This is likely to be happening because accessing an array via IEnumerable<> is much slower than accessing from it's actual array type (even when using foreach).

The following code demonstrates this. Note how the code in max1() and max2() is identical; the only difference is the type of the array parameter. Both methods are passed the same object during testing.

Try running it from a RELEASE build (and not under the debugger, which will enable debug code even for a release build):

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace Demo
{
    public class Program
    {
        private static void Main(string[] args)
        {
            var array = new int[100000000];

            var sw = new Stopwatch();

            for (int trial = 0; trial < 8; ++trial)
            {
                sw.Restart();
                for (int i = 0; i < 10; ++i)
                    max1(array);
                var elapsed1 = sw.Elapsed;
                Console.WriteLine("int[] took " + elapsed1);

                sw.Restart();
                for (int i = 0; i < 10; ++i)
                    max2(array);
                var elapsed2 = sw.Elapsed;
                Console.WriteLine("IEnumerable<int> took " + elapsed2);

                Console.WriteLine("\nFirst method was {0} times faster.\n", elapsed2.TotalSeconds / elapsed1.TotalSeconds);
            }
        }

        private static int max1(int[] array)
        {
            int result = int.MinValue;

            foreach (int n in array)
                if (n > result)
                    result = n;

            return result;
        }

        private static int max2(IEnumerable<int> array)
        {
            int result = int.MinValue;

            foreach (int n in array)
                if (n > result)
                    result = n;

            return result;
        }
    }
}

On my PC, the int[] version is around 10 times faster than the IEnumerable<int> version.

like image 53
Matthew Watson Avatar answered Oct 19 '22 07:10

Matthew Watson