Code:
using System;

internal class Program
{
    private static void Main(string[] args)
    {
        const int iterCount = 999999999;
        var sum1 = 0;
        var sum2 = 0;
        using (new Dis())
        {
            var sw = DateTime.Now;
            for (var i = 0; i < iterCount; i++)
                sum1 += i;
            Console.WriteLine(sum1);
            Console.WriteLine(DateTime.Now - sw);
        }
        using (new Dis())
        {
            var sw = DateTime.Now;
            for (var i = 0; i < iterCount; i++)
                sum2 += i;
            Console.WriteLine(sum2);
            Console.WriteLine(DateTime.Now - sw);
        }
        Console.ReadLine();
    }

    private class Dis : IDisposable
    {
        public void Dispose() { }
    }
}
Two identical blocks in identical usings.
Output:
2051657985
00:00:00.3690996
2051657985
00:00:02.2640266
The second block takes 2.2 seconds! But if I get rid of the usings, the durations become the same (~0.3 s, like the first one). I've tried .NET Framework 4.5 and .NET Core 1.1, in Release; the results are the same.
Can anybody explain this behavior?
You have to look at the machine code that the jitter generates to see the underlying reason. Use Tools > Options > Debugging > General and untick the Suppress JIT optimization option. Switch to the Release build. Set a breakpoint on the first and second loops. When it hits, use Debug > Windows > Disassembly.
You'll see the machine code for the bodies of the for loops:
sum1 += i;
00000035 add esi,eax
And:
sum2 += i;
000000d9 add dword ptr [ebp-24h],eax
Or in other words, the sum1 variable is stored in the CPU register esi, but the sum2 variable is stored in memory, on the stack frame of the method. Big, big difference. Registers are very fast, memory is slow. The memory for the stack frame will be in the L1 cache; on modern machines accessing that cache has a latency of 3 cycles. The store buffer will be quickly overwhelmed with the large number of writes, and that causes the processor to stall.
Finding a way to keep variables in CPU registers is one of the primary jitter optimization duties. But that has limitations; x86 in particular has few registers available. When they are all used up, the jitter has no option but to use memory instead. Note that the using statement has an additional hidden local variable under the hood; that is why it had an effect.
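For reference, the compiler lowers a using statement into roughly the following try/finally pattern. The temporary holding the Dis instance is the hidden local mentioned above (the variable name here is illustrative; the real one is compiler-generated):

    // Roughly what the compiler generates for: using (new Dis()) { ... }
    Dis hiddenLocal = new Dis();   // compiler-generated temporary, name chosen for illustration
    try
    {
        // loop body goes here
    }
    finally
    {
        if (hiddenLocal != null)
            hiddenLocal.Dispose();
    }

That temporary has to stay alive until the finally block runs, so it competes with the other locals for the few x86 registers.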
Ideally the jitter optimizer would have made a better choice about how to allocate the registers: using them for the loop variables (which it did) and for the sum variables. An ahead-of-time compiler would get that right, having sufficient time to perform the code analysis, but a just-in-time compiler operates under strict time constraints.
Basic counter-measures are:
- Get rid of the using statements, or declare the sum variables where their lifetime is as short as possible, so fewer locals are live at the same time and fewer of them compete for registers.
- Target x64 instead of x86, which gives the jitter twice as many general-purpose registers to work with.
The last bullet is effective for the legacy x64 jitter (target .NET 3.5 to use it), but not for the x64 jitter rewrite (aka RyuJIT) first made available in 4.6, a rewrite that was necessary because the legacy jitter took too much time optimizing code. Disappointingly, RyuJIT does have a knack for disappointing; I think its optimizer could do a better job here.
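As a sketch of the first counter-measure (my own restructuring, not from the original question): declaring the accumulator inside each using block shortens its lifetime, which gives the jitter a better chance to keep it in a register. Whether it actually gets enregistered still depends on the jitter version and platform target, so measure it on your own setup:

    using System;

    internal class Program
    {
        private static void Main(string[] args)
        {
            const int iterCount = 999999999;
            using (new Dis())
            {
                var sw = DateTime.Now;
                var sum = 0;            // lives only inside this block
                for (var i = 0; i < iterCount; i++)
                    sum += i;
                Console.WriteLine(sum);
                Console.WriteLine(DateTime.Now - sw);
            }
            using (new Dis())
            {
                var sw = DateTime.Now;
                var sum = 0;            // a separate, short-lived local
                for (var i = 0; i < iterCount; i++)
                    sum += i;
                Console.WriteLine(sum);
                Console.WriteLine(DateTime.Now - sw);
            }
            Console.ReadLine();
        }

        private class Dis : IDisposable
        {
            public void Dispose() { }
        }
    }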