
for-loop performance oddity in .NET x64: even-number-iteration affinity?

Running an empty for-loop with a large number of iterations, I'm getting wildly different run times:

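using System; using System.Diagnostics;
public static class Program
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < 1000000000; ++i) { }
        Console.WriteLine(sw.ElapsedMilliseconds); // elapsed time in ms
    }
}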

The above runs in around 200ms on my machine, but if I increase the limit to 1000000001, it takes 4x as long! Then if I make it 1000000002, it's down to 200ms again!

The fast case seems to happen whenever the number of iterations is even. If I use for (var i = 1; i < 1000000001; ++i) (note starting at 1 instead of 0), then it's 200ms. The same goes for i <= 1000000001 (note less than or equal), and for (var i = 0; i < 2000000000; i += 2) as well.
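To make the "even number of iterations" observation concrete, here is a small sketch that works out the iteration count for each loop header mentioned above. The class and method names (TripCounts, Count) are made up for this illustration and are not part of the original code.

```csharp
using System;

public static class TripCounts
{
    // Iterations of: for (i = start; i < limit; i += step),
    // or with i <= limit when inclusive is true. Assumes step > 0.
    public static long Count(long start, long limit, long step, bool inclusive)
    {
        long end = inclusive ? limit + 1 : limit;  // rewrite <= as <
        if (end <= start) return 0;
        return (end - start + step - 1) / step;    // ceiling division
    }

    public static void Main()
    {
        Console.WriteLine(Count(0, 1000000000, 1, false)); // 1000000000, fast
        Console.WriteLine(Count(0, 1000000001, 1, false)); // 1000000001, slow
        Console.WriteLine(Count(0, 1000000002, 1, false)); // 1000000002, fast
        Console.WriteLine(Count(1, 1000000001, 1, false)); // 1000000000, fast
        Console.WriteLine(Count(0, 1000000001, 1, true));  // 1000000002, fast
        Console.WriteLine(Count(0, 2000000000, 2, false)); // 1000000000, fast
    }
}
```

Every variant reported as fast has an even count; the only slow one listed has an odd count.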

This appears to happen only on x64, but on all .NET versions up to (at least) 4.0. It also shows up only in release mode with the debugger detached.

UPDATE: I was thinking that this was likely due to some clever bit shifting in the JIT, but the following seems to disprove that: if you do something like create an object inside that loop, then that takes about 4x as long too:

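using System; using System.Diagnostics;
public static class Program
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        object o = null;
        for (var i = 0; i < 1000000000; i++)
            o = new object();
        Console.WriteLine(sw.ElapsedMilliseconds); // elapsed time in ms
        Console.WriteLine(o); // use o so the compiler won't optimize it out
    }
}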

This takes around 1 second on my machine, but then increasing by 1 to 1000000001 it takes 4 seconds. That's an extra 3000ms, so it couldn't really be due to bit shifting, as that would have shown up as a 3000ms difference in the original problem too.

lobsterism asked Aug 10 '13 21:08


1 Answer

Well here are the disassemblies:

00000031  xor         eax,eax 
  for (var i = 0; i < 1000000001; ++i)
00000033  inc         eax           
00000035  cmp         eax,3B9ACA01h 
0000003a  jl          0000000000000033 
0000003c  movzx       eax,byte ptr [rbx+18h] 
00000040  test        eax,eax 
00000042  je          0000000000000073 


00000031  xor         eax,eax 
     for (var i = 0; i < 1000000000; ++i)
00000033  add         eax,4 
00000036  cmp         eax,3B9ACA00h 
0000003b  jl          0000000000000033 
0000003d  movzx       eax,byte ptr [rbx+18h] 
00000041  test        eax,eax 
00000043  je          0000000000000074 

The only difference I see is that in the even loop, the loop index is incremented by 4 at a time (add eax,4) instead of 1 at a time (inc eax), so it finishes the loop 4x faster.
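As a quick sanity check on the two listings, the sketch below decodes the cmp operands and counts how often the compare-and-branch runs for each stride. The names (CompareConstants, BackEdges) are invented for this illustration, and the count is simple arithmetic rather than a measurement.

```csharp
using System;

public static class CompareConstants
{
    public const int FastLimit = 0x3B9ACA00; // cmp operand in the add eax,4 listing
    public const int SlowLimit = 0x3B9ACA01; // cmp operand in the inc eax listing

    // How many times "add/inc, cmp, jl" runs when eax starts at 0,
    // grows by step each pass, and the loop exits once eax >= limit.
    // Assumes limit > 0 and step > 0.
    public static long BackEdges(int limit, int step)
    {
        return ((long)limit + step - 1) / step; // ceiling division
    }

    public static void Main()
    {
        Console.WriteLine(FastLimit);               // 1000000000
        Console.WriteLine(SlowLimit);               // 1000000001
        Console.WriteLine(BackEdges(FastLimit, 4)); // 250000000
        Console.WriteLine(BackEdges(SlowLimit, 1)); // 1000000001
    }
}
```

The stride-4 loop executes roughly a quarter as many passes, which lines up with the 4x difference reported in the question.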

This is just speculation, but I believe it is unrolling the loop by a factor of 4: it places the body 4 times inside the loop and increments the index by 4 per pass. But because the body is empty, four copies of an empty body are still empty, so you get a much bigger gain than you would normally expect from loop unrolling.
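For readers who have not seen unrolling written out, here is a hand-written sketch of the transformation, under the assumption that the JIT does something along these lines. It is not the JIT's actual output, and the names (UnrollSketch, Rolled, Unrolled4) are made up. A counter stands in for the loop body so the two versions can be compared.

```csharp
using System;

public static class UnrollSketch
{
    // Plain loop: one body execution per compare-and-branch.
    public static long Rolled(int n)
    {
        long calls = 0;
        for (var i = 0; i < n; ++i)
            calls++; // stand-in for the loop body
        return calls;
    }

    // Hand-unrolled by 4: four body copies per compare-and-branch,
    // followed by a remainder loop for counts not divisible by 4.
    public static long Unrolled4(int n)
    {
        long calls = 0;
        var i = 0;
        for (; i <= n - 4; i += 4)
        {
            calls++; // copy 1
            calls++; // copy 2
            calls++; // copy 3
            calls++; // copy 4
        }
        for (; i < n; ++i)
            calls++; // leftover iterations
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(Rolled(10));    // 10
        Console.WriteLine(Unrolled4(10)); // 10
    }
}
```

With a real body, the four copies still cost four bodies' worth of work, so unrolling only saves loop overhead. With an empty body, the copies disappear and all that remains is the i += 4 step, which matches the add eax,4 in the disassembly.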

Esailija answered Sep 26 '22 01:09
