Why is a local array faster than a static one to read/write?

Tags:

I was writing a few benchmarking tests to figure out why a similar pure algorithm (no C++ lib / .net built in classes) ran much faster in C++ than in C#, even when accounting for the expected feature différences. And while doing so i stumbled on these 2 tests that baffled me, does anyone have an idea about why one is substantially slower thant he other? The only difference in the 2nd one (that takes 51ms vs 88 on my machine) is that the 2 arrays are declared locally in the method instead of outside. In both cases the arrays are created before we start timing.

    const int Runs = 100;
    const int Width = 5000;
    const int Height = 5000;
    const int Size = Width * Height;


    static int[] Input = Enumerable.Range(0, Size).ToArray();
    static int[] Output = new int[Size * 2];

    static int SimpleTest()
    {
        // Removing those 2 lines and using the static arrays instead give substantially slower performance, nearly half the speed!
        int[] Input = Enumerable.Range(0, Size).ToArray();
        int[] Output = new int[Size * 2];

        Stopwatch sw = new Stopwatch();
        sw.Start();

        for (int run = 0; run < Runs; run++)
        {
            int InputIndex = 0;
            for (int x = 0; x < Width; x++)
            {
                for (int y = 0; y < Height; y++)
                {
                    int pixel = Input[InputIndex];
                    var OutputIndex = InputIndex * 2;
                    Output[OutputIndex] = pixel;
                    Output[OutputIndex + 1] = pixel;
                    InputIndex++;
                }
            }
        }
        sw.Stop();
        return (int)(sw.ElapsedMilliseconds / Runs);
    }

430

asked May 13 '15 04:05

Ronan Thibaudau

2 Answers

When the variables are local, the compiler knows that Input and Output will never change, which opens up many optimizations.

The value of the Input and Output variables can be kept in registers.
Input.Length and Output.Length can be calculated once and cached.
The compiler can prove that Input[InputIndex] and Output[OutputIndex] will never result in an array index out of bounds, so the bounds check can be optimized out.
The compiler can observe that the results of Input and Output are never used, so it could optimize the loops to nothing!

If you use the static versions, then the compiler cannot perform these optimizations. The compiler must reload Input and Output at each access, and must perform a bounds check at every array index operation, just in case another thread modified Input or Output.

For example, if another thread does Input = new int[Size], then all future calculations must proceed with this alternate Input. And if another thread does Output = new int[1], then the code must raise an IndexOutOfRangeException.

answered Oct 13 '22 04:10

Raymond Chen

With the 32-bit JIT I believe the culprit is, as Raymond Chen mentionned, that Input and Output can be kept in registers when they are locals, but they need to be reloaded every time when they are not. Generated assembly:

For locals:

007426F0  mov         eax,dword ptr [ebp-18h]  
007426F3  mov         edi,dword ptr [eax+4]  
                        int pixel = Input[InputIndex];
007426F6  mov         eax,dword ptr [ebp-18h]  
007426F9  cmp         edx,edi  
007426FB  jae         0074276E  
007426FD  mov         ecx,dword ptr [eax+edx*4+8]

For statics:

011C2718  mov         dword ptr [ebp-18h],edx  
011C271B  mov         esi,dword ptr ds:[3BB7E90h]  
011C2721  mov         eax,dword ptr [esi+4]  
011C2724  mov         dword ptr [ebp-1Ch],eax  
                        int pixel = Input[InputIndex];
011C2727  mov         eax,dword ptr [ebp-1Ch]  
011C272A  cmp         ecx,eax  
011C272C  jae         011C27A2  
011C272E  mov         edi,dword ptr [esi+ecx*4+8]

As you can see, mov esi,dword ptr ds:[3BB7E90h] accesses the data segment. As you can also see, bounds checking occur in both cases (cmp-jae) so that's irrelevant, and the loops are not in fact optimized to nothing.

How the 64-bit JIT avoids the issue is beyond me.

Here's the complete dissassembly for both cases:

Fast version:

               for (int x = 0; x < Width; x++) {
007426EB  mov         dword ptr [ebp-14h],edx  
                    for (int y = 0; y < Height; y++) {
007426EE  xor         ebx,ebx  
007426F0  mov         eax,dword ptr [ebp-18h]  
007426F3  mov         edi,dword ptr [eax+4]  
                        int pixel = Input[InputIndex];
007426F6  mov         eax,dword ptr [ebp-18h]  
007426F9  cmp         edx,edi  
007426FB  jae         0074276E  
007426FD  mov         ecx,dword ptr [eax+edx*4+8]  
                        var OutputIndex = InputIndex * 2;
00742701  mov         esi,edx  
00742703  add         esi,esi  
                        Output[OutputIndex] = pixel;
00742705  mov         eax,dword ptr [ebp-1Ch]  
00742708  cmp         esi,dword ptr [eax+4]  
0074270B  jae         0074276E  
0074270D  mov         dword ptr [eax+esi*4+8],ecx  
                        Output[OutputIndex + 1] = pixel;
00742711  inc         esi  
00742712  mov         eax,dword ptr [ebp-1Ch]  
00742715  cmp         esi,dword ptr [eax+4]  
00742718  jae         0074276E  
0074271A  mov         dword ptr [eax+esi*4+8],ecx  
                        InputIndex++;
0074271E  inc         edx  
                    for (int y = 0; y < Height; y++) {
0074271F  inc         ebx  
                    for (int y = 0; y < Height; y++) {
00742720  cmp         ebx,1388h  
00742726  jl          007426F6  
                for (int x = 0; x < Width; x++) {
00742728  inc         dword ptr [ebp-14h]  
0074272B  cmp         dword ptr [ebp-14h],1388h  
00742732  jl          007426EE

Slow version:

                for (int x = 0; x < Width; x++) {
011C2713  mov         dword ptr [ebp-14h],ecx  
                    for (int y = 0; y < Height; y++) {
011C2716  xor         edx,edx  
011C2718  mov         dword ptr [ebp-18h],edx  
011C271B  mov         esi,dword ptr ds:[3BB7E90h]  
011C2721  mov         eax,dword ptr [esi+4]  
011C2724  mov         dword ptr [ebp-1Ch],eax  
                        int pixel = Input[InputIndex];
011C2727  mov         eax,dword ptr [ebp-1Ch]  
011C272A  cmp         ecx,eax  
011C272C  jae         011C27A2  
011C272E  mov         edi,dword ptr [esi+ecx*4+8]  
                        var OutputIndex = InputIndex * 2;
011C2732  mov         ebx,ecx  
011C2734  add         ebx,ebx  
                        Output[OutputIndex] = pixel;
011C2736  mov         edx,dword ptr ds:[3BB7E94h]  
011C273C  cmp         ebx,dword ptr [edx+4]  
011C273F  jae         011C27A2  
011C2741  mov         dword ptr [edx+ebx*4+8],edi  
                        Output[OutputIndex + 1] = pixel;
011C2745  inc         ebx  
011C2746  cmp         ebx,dword ptr [edx+4]  
011C2749  jae         011C27A2  
011C274B  mov         dword ptr [edx+ebx*4+8],edi  
                        InputIndex++;
011C274F  inc         ecx  
                    for (int y = 0; y < Height; y++) {
011C2750  inc         dword ptr [ebp-18h]  
011C2753  cmp         dword ptr [ebp-18h],1388h  
011C275A  jl          011C2727  
                for (int x = 0; x < Width; x++) {
011C275C  inc         dword ptr [ebp-14h]  
011C275F  cmp         dword ptr [ebp-14h],1388h  
011C2766  jl          011C2716

answered Oct 13 '22 05:10

Asik

Related questions
                            
                                How to Bind a Command in WPF
                            
                                Errors parsing JSON using Newtonsoft.Json
                            
                                MonoDevelop 3.1.1 can't find System.Xml.Linq
                            
                                is it possible to disable implicit ToString() call?
                            
                                Is there Point3D?
                            
                                Insert Dictionary into MongoDB with c# driver
                            
                                Stored procedure slower when called from ASP.NET vs. SQL Mgmt Admin
                            
                                NUnit or Fluent Assertions test for reference equality?
                            
                                How do I make a header for a ListBoxItem?
                            
                                ExecuteScalar call throwing exception "Object reference not set to an instance of an object"
                            
                                How to use GroupBy using Dynamic LINQ
                            
                                Using FileReader.readAsDataUrl to upload image to Web Api service
                            
                                Does C# AsyncCallback creates a new thread?
                            
                                Displaying Arabic characters in C# console application
                            
                                multi-threading based RabbitMQ consumer
                            
                                Asp.Net Identity Localization PublicKeyToken
                            
                                Extension Method to Get the Values of Any Enum
                            
                                Why can't a static and non-static method share the same signature?
                            
                                Error while loading project: Attribute Include is unrecognized
                            
                                Edit a List<Tuple> item

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why is a local array faster than a static one to read/write?

Tags:

performance

arrays

c#

Ronan Thibaudau

People also ask

2 Answers

Raymond Chen

Asik

Recent Activity

Donate For Us