Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is a local array faster than a static one to read/write?

I was writing a few benchmarking tests to figure out why a similar pure algorithm (no C++ lib / .net built in classes) ran much faster in C++ than in C#, even when accounting for the expected feature différences. And while doing so i stumbled on these 2 tests that baffled me, does anyone have an idea about why one is substantially slower thant he other? The only difference in the 2nd one (that takes 51ms vs 88 on my machine) is that the 2 arrays are declared locally in the method instead of outside. In both cases the arrays are created before we start timing.

    const int Runs = 100;
    const int Width = 5000;
    const int Height = 5000;
    const int Size = Width * Height;


    static int[] Input = Enumerable.Range(0, Size).ToArray();
    static int[] Output = new int[Size * 2];

    static int SimpleTest()
    {
        // Removing those 2 lines and using the static arrays instead give substantially slower performance, nearly half the speed!
        int[] Input = Enumerable.Range(0, Size).ToArray();
        int[] Output = new int[Size * 2];

        Stopwatch sw = new Stopwatch();
        sw.Start();

        for (int run = 0; run < Runs; run++)
        {
            int InputIndex = 0;
            for (int x = 0; x < Width; x++)
            {
                for (int y = 0; y < Height; y++)
                {
                    int pixel = Input[InputIndex];
                    var OutputIndex = InputIndex * 2;
                    Output[OutputIndex] = pixel;
                    Output[OutputIndex + 1] = pixel;
                    InputIndex++;
                }
            }
        }
        sw.Stop();
        return (int)(sw.ElapsedMilliseconds / Runs);
    }
like image 430
Ronan Thibaudau Avatar asked May 13 '15 04:05

Ronan Thibaudau


People also ask

Which is faster array or list?

An array is faster and that is because ArrayList uses a fixed amount of array. However when you add an element to the ArrayList and it overflows. It creates a new Array and copies every element from the old one to the new one. List over arrays.

What is the difference between static and dynamic array?

Difference Between Static Array and Dynamic ArrayStatic arrays are allocated memory at compile time. Dynamic array is located at run-time. The size of static array is fixed. The size of dynamic array is fixed.

Are lists more efficient than arrays?

Arrays can store data very compactly and are more efficient for storing large amounts of data. Arrays are great for numerical operations; lists cannot directly handle math operations. For example, you can divide each element of an array by the same number with just one line of code.


2 Answers

When the variables are local, the compiler knows that Input and Output will never change, which opens up many optimizations.

  • The value of the Input and Output variables can be kept in registers.
  • Input.Length and Output.Length can be calculated once and cached.
  • The compiler can prove that Input[InputIndex] and Output[OutputIndex] will never result in an array index out of bounds, so the bounds check can be optimized out.
  • The compiler can observe that the results of Input and Output are never used, so it could optimize the loops to nothing!

If you use the static versions, then the compiler cannot perform these optimizations. The compiler must reload Input and Output at each access, and must perform a bounds check at every array index operation, just in case another thread modified Input or Output.

For example, if another thread does Input = new int[Size], then all future calculations must proceed with this alternate Input. And if another thread does Output = new int[1], then the code must raise an IndexOutOfRangeException.

like image 71
Raymond Chen Avatar answered Oct 13 '22 04:10

Raymond Chen


With the 32-bit JIT I believe the culprit is, as Raymond Chen mentionned, that Input and Output can be kept in registers when they are locals, but they need to be reloaded every time when they are not. Generated assembly:

For locals:

007426F0  mov         eax,dword ptr [ebp-18h]  
007426F3  mov         edi,dword ptr [eax+4]  
                        int pixel = Input[InputIndex];
007426F6  mov         eax,dword ptr [ebp-18h]  
007426F9  cmp         edx,edi  
007426FB  jae         0074276E  
007426FD  mov         ecx,dword ptr [eax+edx*4+8] 

For statics:

011C2718  mov         dword ptr [ebp-18h],edx  
011C271B  mov         esi,dword ptr ds:[3BB7E90h]  
011C2721  mov         eax,dword ptr [esi+4]  
011C2724  mov         dword ptr [ebp-1Ch],eax  
                        int pixel = Input[InputIndex];
011C2727  mov         eax,dword ptr [ebp-1Ch]  
011C272A  cmp         ecx,eax  
011C272C  jae         011C27A2  
011C272E  mov         edi,dword ptr [esi+ecx*4+8] 

As you can see, mov esi,dword ptr ds:[3BB7E90h] accesses the data segment. As you can also see, bounds checking occur in both cases (cmp-jae) so that's irrelevant, and the loops are not in fact optimized to nothing.

How the 64-bit JIT avoids the issue is beyond me.

Here's the complete dissassembly for both cases:

Fast version:

               for (int x = 0; x < Width; x++) {
007426EB  mov         dword ptr [ebp-14h],edx  
                    for (int y = 0; y < Height; y++) {
007426EE  xor         ebx,ebx  
007426F0  mov         eax,dword ptr [ebp-18h]  
007426F3  mov         edi,dword ptr [eax+4]  
                        int pixel = Input[InputIndex];
007426F6  mov         eax,dword ptr [ebp-18h]  
007426F9  cmp         edx,edi  
007426FB  jae         0074276E  
007426FD  mov         ecx,dword ptr [eax+edx*4+8]  
                        var OutputIndex = InputIndex * 2;
00742701  mov         esi,edx  
00742703  add         esi,esi  
                        Output[OutputIndex] = pixel;
00742705  mov         eax,dword ptr [ebp-1Ch]  
00742708  cmp         esi,dword ptr [eax+4]  
0074270B  jae         0074276E  
0074270D  mov         dword ptr [eax+esi*4+8],ecx  
                        Output[OutputIndex + 1] = pixel;
00742711  inc         esi  
00742712  mov         eax,dword ptr [ebp-1Ch]  
00742715  cmp         esi,dword ptr [eax+4]  
00742718  jae         0074276E  
0074271A  mov         dword ptr [eax+esi*4+8],ecx  
                        InputIndex++;
0074271E  inc         edx  
                    for (int y = 0; y < Height; y++) {
0074271F  inc         ebx  
                    for (int y = 0; y < Height; y++) {
00742720  cmp         ebx,1388h  
00742726  jl          007426F6  
                for (int x = 0; x < Width; x++) {
00742728  inc         dword ptr [ebp-14h]  
0074272B  cmp         dword ptr [ebp-14h],1388h  
00742732  jl          007426EE 

Slow version:

                for (int x = 0; x < Width; x++) {
011C2713  mov         dword ptr [ebp-14h],ecx  
                    for (int y = 0; y < Height; y++) {
011C2716  xor         edx,edx  
011C2718  mov         dword ptr [ebp-18h],edx  
011C271B  mov         esi,dword ptr ds:[3BB7E90h]  
011C2721  mov         eax,dword ptr [esi+4]  
011C2724  mov         dword ptr [ebp-1Ch],eax  
                        int pixel = Input[InputIndex];
011C2727  mov         eax,dword ptr [ebp-1Ch]  
011C272A  cmp         ecx,eax  
011C272C  jae         011C27A2  
011C272E  mov         edi,dword ptr [esi+ecx*4+8]  
                        var OutputIndex = InputIndex * 2;
011C2732  mov         ebx,ecx  
011C2734  add         ebx,ebx  
                        Output[OutputIndex] = pixel;
011C2736  mov         edx,dword ptr ds:[3BB7E94h]  
011C273C  cmp         ebx,dword ptr [edx+4]  
011C273F  jae         011C27A2  
011C2741  mov         dword ptr [edx+ebx*4+8],edi  
                        Output[OutputIndex + 1] = pixel;
011C2745  inc         ebx  
011C2746  cmp         ebx,dword ptr [edx+4]  
011C2749  jae         011C27A2  
011C274B  mov         dword ptr [edx+ebx*4+8],edi  
                        InputIndex++;
011C274F  inc         ecx  
                    for (int y = 0; y < Height; y++) {
011C2750  inc         dword ptr [ebp-18h]  
011C2753  cmp         dword ptr [ebp-18h],1388h  
011C275A  jl          011C2727  
                for (int x = 0; x < Width; x++) {
011C275C  inc         dword ptr [ebp-14h]  
011C275F  cmp         dword ptr [ebp-14h],1388h  
011C2766  jl          011C2716  
like image 24
Asik Avatar answered Oct 13 '22 05:10

Asik