Running an empty for-loop with large numbers of iterations, I'm getting wildly different numbers in how long it takes to run:
public static class Program
{
static void Main()
{
var sw = new Stopwatch();
sw.Start();
for (var i = 0; i < 1000000000; ++i)
{
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
}
The above will run in around 200ms on my machine, but if I increase it to 1000000001, then it takes 4x as long! Then if I make it 1000000002, then it's down to 200ms again!
This seems to happen for an even number of iterations. If I go for (var i = 1; i < 1000000001
, (note starting at 1 instead of 0) then it's 200ms. Or if I do i <= 1000000001
(note less than or equal) then it's 200ms. Or (var i = 0; i < 2000000000; i += 2)
as well.
This appears only to be on x64, but on all .NET versions up to (at least) 4.0. Also it appears only when in release mode with debugger detached.
UPDATE I was thinking that this was likely due to some clever bit shifting in the jit, but the following seems to disprove that: if you do something like create an object inside that loop, then that takes about 4x as long too:
public static class Program
{
static void Main()
{
var sw = new Stopwatch();
sw.Start();
object o = null;
for (var i = 0; i < 1000000000; i++)
{
o = new object();
}
sw.Stop();
Console.WriteLine(o); // use o so the compiler won't optimize it out
Console.WriteLine(sw.ElapsedMilliseconds);
}
}
This takes around 1 second on my machine, but then increasing by 1 to 1000000001 it takes 4 seconds. That's an extra 3000ms, so it couldn't really be due to bit shifting, as that would have shown up as a 3000ms difference in the original problem too.
Well here are the disassemblies:
00000031 xor eax,eax
for (var i = 0; i < 1000000001; ++i)
00000033 inc eax
00000035 cmp eax,3B9ACA01h
0000003a jl 0000000000000033
0000003c movzx eax,byte ptr [rbx+18h]
00000040 test eax,eax
00000042 je 0000000000000073
And
00000031 xor eax,eax
for (var i = 0; i < 1000000000; ++i)
00000033 add eax,4
00000036 cmp eax,3B9ACA00h
0000003b jl 0000000000000033
0000003d movzx eax,byte ptr [rbx+18h]
00000041 test eax,eax
00000043 je 0000000000000074
The only difference I see is that in the even loop, the loop index is incremented by 4 at a time (add eax 4
) instead of 1 at a time (inc eax
) so it finishes the loop 4x faster because of that.
This is just speculation but I believe it is unrolling the loop by a factor of 4. So it places the body 4 times inside the loop and just increments 4 times faster. But because the body is empty, empty body times 4 is still empty, you gain much bigger gain than you would expect from loop unrolling.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With