Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance difference between C# for-loop and Array.Fill

I have implemented the following benchmark using BenchmarkDotNet:

public class ForVsFillVsEnumerable
{
    private bool[] data;

    [Params(10, 100, 1000)]
    public int N;

    [GlobalSetup]
    public void Setup()
    {
        data = new bool[N];
    }

    [Benchmark]
    public void Fill()
    {
        Array.Fill(data, true);
    }

    [Benchmark]
    public void For()
    {           
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = true;
        }
    }

    [Benchmark]
    public void EnumerableRepeat()
    {
        data = Enumerable.Repeat(true, N).ToArray();
    }
}

The results are:

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.17763.195 (1809/October2018Update/Redstone5)
Intel Core i7-8700K CPU 3.70GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=2.2.200-preview-009648
  [Host] : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT
  Core   : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT

Job=Core  Runtime=Core
           Method |    N |       Mean |      Error |      StdDev |     Median | Ratio | Rank |
----------------- |----- |-----------:|-----------:|------------:|-----------:|------:|-----:|
             Fill |   10 |   3.675 ns |  0.2550 ns |   0.7150 ns |   3.331 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
              For |   10 |   6.615 ns |  0.3928 ns |   1.1581 ns |   6.056 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
 EnumerableRepeat |   10 |  25.388 ns |  1.0451 ns |   2.9307 ns |  24.170 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
             Fill |  100 |  50.557 ns |  2.0766 ns |   6.1229 ns |  46.690 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
              For |  100 |  64.330 ns |  4.0058 ns |  11.8111 ns |  59.442 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
 EnumerableRepeat |  100 |  81.784 ns |  4.2407 ns |  12.5039 ns |  75.937 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
             Fill | 1000 | 447.016 ns | 15.4420 ns |  45.5312 ns | 420.239 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
              For | 1000 | 589.243 ns | 51.3450 ns | 151.3917 ns | 495.177 ns |  1.00 |    1 |
                  |      |            |            |             |            |       |      |
 EnumerableRepeat | 1000 | 519.124 ns | 21.3580 ns |  62.9746 ns | 505.573 ns |  1.00 |    1 |

Originally I guessed the Array.Fill does some optimizations which make it perform better than for-loop, but then I checked the .NET Core source code to see that the Array.Fill implementation is pretty straightforward:

public static void Fill<T>(T[] array, T value)
{
    if (array == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
    }

    for (int i = 0; i < array.Length; i++)
    {
        array[i] = value;
    }
}

The performance is close enough, but it stills seems Fill is consistently a bit faster then for even though under the hood it is exactly the same code. Can you explain why? Or am I just reading the results incorrectly?

like image 401
Martin Zikmund Avatar asked Jan 08 '19 16:01

Martin Zikmund


1 Answers

I'm surprised by Enumerable.Repeat(), contrarily to my first thought it scales pretty well. Anyway, to answer your question: when you use For() you repeatedly access a class member while when calling Array.Fill() you obtain its address just once.

I'm even more surprised that compiler does not detect - and optimize - this but to read the value of a class member you need ldarg.0 to get the value of this and then ldfld ForVsFillVsEnumerable.data to obtain its actual address. In ForVsFillVsEnumerable.Fill() this is done just once to call Array.Fill().

You can check this writing your own fill function:

[Benchmark]
public void For2()
{
    ForImpl(data);
}

private static void ForImpl(bool[] data)
{
    for (int i = 0; i < data.Length; i++)
    {
        data[i] = true;
    }
}

Note 1: regardless performance, to use a library function is always better because it can potentially benefit of future optimizations (they may decide, for example, to add specific overloads for Array.Fill() and to implement them with native code where - for some architectures - a plain memset() is extremely fast).

Note 2: if loop code is so small (and fast) I'd avoid to measure anything with small vectors (10 or 100 items) because it's extremely difficult to setup a proper test environment to reliably measure a difference of few nanoseconds. I'd consider 1000 (or even 100,000) the very minimum to begin with (and even in that case so many other things will play a relevant role...) Unless your real-world use case is 10/100...in that case I'd try to measure a bigger algorithm where this difference is more evident (and if it's not then you shouldn't care).

like image 114
Adriano Repetti Avatar answered Sep 27 '22 22:09

Adriano Repetti