Reuse of for loop iteration variable

I have seen a lot of questions about whether to declare variables inside or outside a for loop scope, and this has been discussed at length elsewhere. The consensus is that there is no performance difference (the IL is the same), but for clarity, declaring variables in the tightest scope is preferred.

I was curious about a slightly different situation:

int i;

for (i = 0; i < 10; i++) {
    Console.WriteLine(i);
}

for (i = 0; i < 10; i++) {
    Console.WriteLine(i);
}

versus

for (int i = 0; i < 10; i++) {
    Console.WriteLine(i);
}

for (int i = 0; i < 10; i++) {
    Console.WriteLine(i);
}

I expected both methods to compile to the same IL in Release mode. However, this is not the case. I'll spare you the full IL and just point out the difference. The first method has one local:

.locals init (
    [0] int32 i
)

while the second has two locals, one for each for loop counter:

.locals init (
    [0] int32 i,
    [1] int32 i
)

So there is a difference between the two that the C# compiler does not optimize away, which surprised me.
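For anyone who wants to check the local counts without opening an IL disassembler, here is a minimal sketch that reads them through reflection. The class and method names (LocalsDemo, SharedCounter, ScopedCounters, PrintLocals) are made up for illustration. The counts depend on the build configuration: a Debug build may add compiler-generated temporaries, so compare Release builds if you want to reproduce the listings above.

```csharp
using System;
using System.Reflection;

static class LocalsDemo
{
    // One counter declared outside both loops (first variant of the question).
    static void SharedCounter()
    {
        int i;
        for (i = 0; i < 10; i++) { Console.WriteLine(i); }
        for (i = 0; i < 10; i++) { Console.WriteLine(i); }
    }

    // A separate counter scoped to each loop (second variant of the question).
    static void ScopedCounters()
    {
        for (int i = 0; i < 10; i++) { Console.WriteLine(i); }
        for (int i = 0; i < 10; i++) { Console.WriteLine(i); }
    }

    // Prints the IL local variable slots declared by the named method.
    static void PrintLocals(string name)
    {
        MethodInfo method = typeof(LocalsDemo).GetMethod(
            name, BindingFlags.NonPublic | BindingFlags.Static);
        MethodBody body = method.GetMethodBody();

        Console.WriteLine("{0}: {1} local(s)", name, body.LocalVariables.Count);
        foreach (LocalVariableInfo local in body.LocalVariables)
        {
            Console.WriteLine("  [{0}] {1}", local.LocalIndex, local.LocalType);
        }
    }

    static void Main()
    {
        PrintLocals(nameof(SharedCounter));
        PrintLocals(nameof(ScopedCounters));
    }
}
```

This only inspects metadata; neither loop method is actually executed.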

Why am I seeing this, and is there actually a performance difference between the two methods?

msitt asked Apr 25 '17 01:04


3 Answers

To answer your question: you declared one local variable in the first case and two in the second. The C# compiler apparently does not merge the two locals, even though I think it would be permitted to do so. My guess is that the gain is not worth the complex analysis it would require, and it might not even be useful if the JIT is smart enough to handle it anyway. The optimization you are expecting does happen, just not at the IL level: the JIT compiler performs it when it emits machine code.

This is a simple enough case that inspecting the emitted machine code is actually informative. In summary, the two methods JIT-compile to the same machine code (x86 is shown below, and the same holds on x64), so there is no performance gain from using fewer local variables.

A quick note on conditions: I put each of these fragments into its own method, then looked at the disassembly in Visual Studio 2015 with the .NET 4.6.1 runtime and an x86 Release build (i.e. optimizations on), attaching the debugger only after the JIT had compiled the methods (that is, after at least one invocation without the debugger attached). I disabled method inlining to keep things consistent between the two methods. To view the disassembly, place a breakpoint in the desired method, attach the debugger, and go to Debug > Windows > Disassembly. Hit F5 to run to the breakpoint.
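A minimal sketch of a test program matching those conditions might look like this. The names are hypothetical, and using MethodImplOptions.NoInlining is an assumption about how inlining was disabled; the answer does not say which mechanism was used.

```csharp
using System;
using System.Runtime.CompilerServices;

static class Program
{
    // NoInlining keeps each method as a separate JIT-compiled body
    // (assumed mechanism; the original setup is not specified).
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void SharedCounter()
    {
        int i;
        for (i = 0; i < 10; i++) { Console.WriteLine(i); }
        for (i = 0; i < 10; i++) { Console.WriteLine(i); }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void ScopedCounters()
    {
        for (int i = 0; i < 10; i++) { Console.WriteLine(i); }
        for (int i = 0; i < 10; i++) { Console.WriteLine(i); }
    }

    static void Main()
    {
        // First invocations: the methods are JIT-compiled here,
        // before any debugger is attached.
        SharedCounter();
        ScopedCounters();

        Console.WriteLine("Attach the debugger, set breakpoints, then press Enter.");
        Console.ReadLine();

        // Second invocations: breakpoints now hit the already-compiled code.
        SharedCounter();
        ScopedCounters();
    }
}
```

Start it without debugging (Ctrl+F5 in Visual Studio), attach at the prompt, and the breakpoints in the second pair of calls should show the optimized code.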

Without further ado, the first method disassembles to

            for (i = 0; i < 10; i++)
010204A2  in          al,dx  
010204A3  push        esi  
010204A4  xor         esi,esi  
            {
                Console.WriteLine(i);
010204A6  mov         ecx,esi  
010204A8  call        71686C0C  
            for (i = 0; i < 10; i++)
010204AD  inc         esi  
010204AE  cmp         esi,0Ah  
010204B1  jl          010204A6  
            }

            for (i = 0; i < 10; i++)
010204B3  xor         esi,esi  
            {
                Console.WriteLine(i);
010204B5  mov         ecx,esi  
010204B7  call        71686C0C  
            for (i = 0; i < 10; i++)
010204BC  inc         esi  
010204BD  cmp         esi,0Ah  
010204C0  jl          010204B5  
010204C2  pop         esi  
010204C3  pop         ebp  
010204C4  ret  

The second method disassembles to

            for (int i = 0; i < 10; i++)
010204DA  in          al,dx  
010204DB  push        esi  
010204DC  xor         esi,esi  
            {
                Console.WriteLine(i);
010204DE  mov         ecx,esi  
010204E0  call        71686C0C  
            for (int i = 0; i < 10; i++)
010204E5  inc         esi  
010204E6  cmp         esi,0Ah  
010204E9  jl          010204DE  
            }

            for (int i = 0; i < 10; i++)
010204EB  xor         esi,esi  
            {
                Console.WriteLine(i);
010204ED  mov         ecx,esi  
010204EF  call        71686C0C  
            for (int i = 0; i < 10; i++)
010204F4  inc         esi  
010204F5  cmp         esi,0Ah  
010204F8  jl          010204ED  
010204FA  pop         esi  
010204FB  pop         ebp  
010204FC  ret  

As you can see, apart from the addresses and the corresponding jump targets, the code is identical.

These methods are simple enough that, in both versions, the loop counter is kept entirely in the esi register.

It is left as an exercise for the reader to verify in x64.

Mike Zboray answered Oct 08 '22 15:10


As an addition to the existing answer, note that collapsing the two variables into one could actually hurt performance, depending on what information the JIT compiler is able to infer.

If the JIT compiler sees two variables with non-overlapping lifetimes, it is free to use the same location (usually a register) for both. But if the JIT compiler sees a single variable, it is required to use the same location. Or, more accurately, it is required to maintain the value of the variable for its whole lifetime.

In your specific case, that would mean that after the first loop ends and before the second loop starts, the compiler can't throw away the value of the variable and reuse the location for some other purpose.

But even with a single IL variable, it's not a given that the JIT compiler actually sees it as a single variable. A smart compiler can see that when the code leaves the first loop, the variable is not going to be read again before it is overwritten. So it can treat the single IL variable as two, and throw away the value between the loops.

To sum up:

  1. For a dumb compiler that does not analyze variable lifetimes, one variable is better than two.
  2. For a decent compiler that can analyze variable lifetimes but cannot split variables, two variables are better than one.
  3. For a smart compiler that can analyze variable lifetimes and can also split variables, it doesn't matter.

The JIT compiler is either #2 or #3, so it makes sense to use two variables in IL.
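The lifetime argument has a visible counterpart at the source level, sketched below with a made-up LifetimeDemo class. It only illustrates the C# scoping rules, not what any particular JIT does: a shared counter can still be read after its loop, so a compiler has to prove that nobody does, whereas a loop-scoped counter cannot be read afterwards at all.

```csharp
using System;

static class LifetimeDemo
{
    static void Main()
    {
        int i;

        for (i = 0; i < 10; i++) { }
        // The shared variable is still in scope and still holds the value
        // that ended the loop, so its lifetime extends past the loop body.
        Console.WriteLine(i); // prints 10

        for (i = 0; i < 5; i++) { }
        Console.WriteLine(i); // prints 5

        for (int j = 0; j < 10; j++) { }
        // Console.WriteLine(j); // would not compile: j is out of scope here,
        //                       // so its value is provably dead after the loop
    }
}
```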

svick answered Oct 08 '22 15:10


Just to add a few things to the detailed answer above: the C# compiler performs very few optimizations, such as concatenating string literals ("a" + "b") and evaluating constant expressions. It is therefore rather pointless to look for optimizations in the IL generated by the C# compiler. Instead, you should look at the machine code generated by the JIT compiler.
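As a small illustration of the two compile-time optimizations mentioned here, consider the sketch below (FoldingDemo and its constant names are invented for the example). The const declarations are only legal because the compiler can evaluate their initializers itself, so the folded values are what ends up in the assembly.

```csharp
using System;

static class FoldingDemo
{
    // Both initializers are constant expressions evaluated by the C# compiler.
    const string Joined = "a" + "b";
    const int SecondsPerDay = 60 * 60 * 24;

    static void Main()
    {
        Console.WriteLine(Joined);        // prints ab
        Console.WriteLine(SecondsPerDay); // prints 86400
    }
}
```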

Also, the build settings can suppress JIT optimizations, so make sure that you use a Release build and clear the "Suppress JIT optimization on module load" flag in the Visual Studio debugging options.
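One way to sanity-check the build side of this from inside the program is sketched below, with a hypothetical JitSettingsCheck class. It reads the DebuggableAttribute that the compiler stamps on the assembly. Note that it only reports how the assembly was compiled; it cannot tell you whether the Visual Studio option is currently overriding optimizations, so treat it as a partial check.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

static class JitSettingsCheck
{
    static void Main()
    {
        Assembly assembly = typeof(JitSettingsCheck).Assembly;

        // Debug builds normally carry this attribute with the optimizer disabled;
        // the attribute may be absent or report false in an optimized build.
        DebuggableAttribute attribute =
            assembly.GetCustomAttribute<DebuggableAttribute>();
        bool optimizerDisabled =
            attribute != null && attribute.IsJITOptimizerDisabled;

        Console.WriteLine("JIT optimizer disabled by build: " + optimizerDisabled);
        Console.WriteLine("Debugger attached: " + Debugger.IsAttached);
    }
}
```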

SENya answered Oct 08 '22 16:10