It may be the case that my hardware is the culprit, but during testing, I've found that:
void SomeFunction(AType ofThing) {
    DoSomething(ofThing);
}
...is faster than:
private AType _ofThing;
void SomeFunction() {
    DoSomething(_ofThing);
}
I believe it has to do with how the compiler translates this to CIL. Could anyone please explain, specifically, why this happens?
Here's some code where it happens:
public void TestMethod1()
{
    var stopwatch = new Stopwatch();
    var r = new int[] { 1, 2, 3, 4, 5 };
    var i = 0;
    stopwatch.Start();
    while (i < 1000000)
    {
        DoSomething(r);
        i++;
    }
    stopwatch.Stop();
    Console.WriteLine(stopwatch.ElapsedMilliseconds);
    i = 0;
    stopwatch.Restart();
    while (i < 1000000)
    {
        DoSomething();
        i++;
    }
    stopwatch.Stop();
    Console.WriteLine(stopwatch.ElapsedMilliseconds);
}

private void DoSomething(int[] arg1)
{
    var r = arg1[0] * arg1[1] * arg1[2] * arg1[3] * arg1[4];
}

private int[] _arg1 = new [] { 1, 2, 3, 4, 5 };

private void DoSomething()
{
    var r = _arg1[0] * _arg1[1] * _arg1[2] * _arg1[3] * _arg1[4];
}
In my case it is 2.5x slower to use a private field.
I believe it has to do with how the compiler translates this to CIL.
Not really. Performance doesn't directly depend on the CIL code, because that's not what's actually executed. What's executed is the JITed native code, so you should look at that when you're interested in performance.
So, let's look at the code generated for the DoSomething(int[]) loop:
mov eax,dword ptr [ebx+4] ; get the length of the array
cmp eax,0 ; if it's 0
jbe 0000018C ; jump to code that throws IndexOutOfRangeException
cmp eax,1 ; if it's 1, etc.
jbe 0000018C
cmp eax,2
jbe 0000018C
cmp eax,3
jbe 0000018C
cmp eax,4
jbe 0000018C
inc esi ; i++
cmp esi,0F4240h ; if i < 1000000
jl 000000B7 ; loop again
What's interesting about this code is that no useful work is done at all; most of the code is array bounds checking (why the code hasn't been optimized to perform this checking only once before the loop, I have no idea).
Also notice that the code is inlined, so you're not paying the cost of a function call.
This code takes around 1.7 ms on my computer.
So, what does the loop for DoSomething() look like?
mov ecx,dword ptr [ebp-10h] ; access this
call dword ptr ds:[001637F4h] ; call DoSomething()
inc esi ; i++
cmp esi,0F4240h ; if i < 1000000
jl 00000120 ; loop again
Okay, so this actually calls the method, no inlining this time. What does the method itself look like?
mov eax,dword ptr [ecx+4] ; access this._arg1
cmp dword ptr [eax+4],0 ; if its length is 0
jbe 00000022 ; jump to code that throws IndexOutOfRangeException
cmp dword ptr [eax+4],1 ; etc.
jbe 00000022
cmp dword ptr [eax+4],2
jbe 00000022
cmp dword ptr [eax+4],3
jbe 00000022
cmp dword ptr [eax+4],4
jbe 00000022
ret ; bounds checks successful, return
Comparing with the previous version (and ignoring the overhead of the function call for now), this does three different memory accesses instead of just one, which could explain some of the performance difference. (I think the five accesses to eax+4 should be counted only as one, because otherwise the compiler would optimize them.)
This code runs in about 3.0 ms for me.
How much overhead does the method call take? We can check that by adding [MethodImpl(MethodImplOptions.NoInlining)] to the previously inlined DoSomething(int[]). The assembly now looks like this:
mov ecx,dword ptr [ebp-10h] ; access this
mov edx,dword ptr [ebp-14h] ; access r
call dword ptr ds:[002937E8h] ; call DoSomething(int[])
inc esi ; i++
cmp esi,0F4240h ; if i < 1000000
jl 000000A0 ; loop again
Notice that r is no longer kept in a register; it's on the stack instead, which adds another slowdown.
Now DoSomething(int[]):
push ebp ; save ebp from caller to stack
mov ebp,esp ; write our own ebp
mov eax,dword ptr [edx+4] ; read the length of the array
cmp eax,0 ; if it's 0
jbe 00000021 ; jump to code that throws IndexOutOfRangeException
cmp eax,1 ; etc.
jbe 00000021
cmp eax,2
jbe 00000021
cmp eax,3
jbe 00000021
cmp eax,4
jbe 00000021
pop ebp ; restore ebp
ret ; return
This code runs in about 3.2 ms for me. That's even slower than DoSomething(). What's going on?
It turns out that [MethodImpl(MethodImplOptions.NoInlining)] seems to cause those unnecessary ebp instructions. If I add that attribute to DoSomething(), it runs in 3.3 ms.
This means the difference between stack access and heap access is pretty small (but still measurable). The fact that the array pointer could be kept in a register when the method was inlined was probably more significant.
So, the conclusion is that the big difference you're seeing is because of inlining. The JIT compiler decided to inline the code for DoSomething(int[]), but not for DoSomething(), which allowed the code for DoSomething(int[]) to be very efficient. The most likely reason is that the IL for DoSomething() is much longer (21 bytes vs. 46 bytes).
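If you want to test the inlining explanation yourself, one option is to control inlining explicitly on both methods. The following is only a sketch: the class and method names (InliningDemo, ProductNoInline, ProductFieldInline) are made up for illustration, the methods return their result so the call sites are observable, and MethodImplOptions.AggressiveInlining assumes .NET 4.5 or later. AggressiveInlining is a hint that the JIT is free to ignore, so you would still need to check the generated machine code.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical demo class; names are illustrative, not from the question.
public class InliningDemo
{
    private readonly int[] _arg1 = { 1, 2, 3, 4, 5 };

    // NoInlining forces a real call, so the call overhead is always paid.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int ProductNoInline(int[] arg1)
    {
        return arg1[0] * arg1[1] * arg1[2] * arg1[3] * arg1[4];
    }

    // AggressiveInlining asks the JIT to inline even if the IL is larger
    // than its usual threshold. It is a hint, not a guarantee.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int ProductFieldInline()
    {
        return _arg1[0] * _arg1[1] * _arg1[2] * _arg1[3] * _arg1[4];
    }

    public static void Main()
    {
        var demo = new InliningDemo();
        Console.WriteLine(demo.ProductNoInline(new[] { 1, 2, 3, 4, 5 }));
        Console.WriteLine(demo.ProductFieldInline());
    }
}
```

If the field version catches up with the parameter version once it carries the inlining hint, that would support the explanation above.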
Also, you're not really measuring what you wrote (array accesses and multiplications), because the JIT optimized those away and left only the bounds checks. So be careful when devising microbenchmarks, so that the compiler can't ignore the code you actually wanted to measure.
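One common way to make the measured work harder to discard is to return the result and accumulate it into a value that is printed at the end. Here is a minimal sketch under that assumption; the names BenchmarkSketch, RunParameterVersion and RunFieldVersion are invented for this example, and the short warm-up calls are there only so that JIT compilation is not included in the timings. The JIT may still optimize parts of the loop, so treat the numbers as rough.

```csharp
using System;
using System.Diagnostics;

// Hypothetical benchmark harness; all names here are illustrative.
public class BenchmarkSketch
{
    private readonly int[] _arg1 = { 1, 2, 3, 4, 5 };

    private int DoSomething(int[] arg1)
    {
        return arg1[0] * arg1[1] * arg1[2] * arg1[3] * arg1[4];
    }

    private int DoSomething()
    {
        return _arg1[0] * _arg1[1] * _arg1[2] * _arg1[3] * _arg1[4];
    }

    // Sums the results so the multiplications feed into an observable value.
    public long RunParameterVersion(int iterations)
    {
        var r = new[] { 1, 2, 3, 4, 5 };
        long sum = 0;
        for (var i = 0; i < iterations; i++)
        {
            sum += DoSomething(r);
        }
        return sum;
    }

    public long RunFieldVersion(int iterations)
    {
        long sum = 0;
        for (var i = 0; i < iterations; i++)
        {
            sum += DoSomething();
        }
        return sum;
    }

    public static void Main()
    {
        var bench = new BenchmarkSketch();
        const int iterations = 1000000;

        // Warm-up: make sure both paths are JIT-compiled before timing.
        bench.RunParameterVersion(1000);
        bench.RunFieldVersion(1000);

        var stopwatch = Stopwatch.StartNew();
        var parameterSum = bench.RunParameterVersion(iterations);
        stopwatch.Stop();
        Console.WriteLine("parameter: {0} ms (sum {1})",
            stopwatch.Elapsed.TotalMilliseconds, parameterSum);

        stopwatch.Restart();
        var fieldSum = bench.RunFieldVersion(iterations);
        stopwatch.Stop();
        Console.WriteLine("field: {0} ms (sum {1})",
            stopwatch.Elapsed.TotalMilliseconds, fieldSum);
    }
}
```

Printing the sums keeps the results live, and Stopwatch.Elapsed.TotalMilliseconds gives sub-millisecond resolution, which matters when each loop takes only a few milliseconds.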
Several people have made a stack/heap distinction, but this is a false dichotomy; when the IL is compiled to machine code there are additional possibilities, such as passing arguments in registers, which is potentially even faster than getting them off of the stack. See Eric Lippert's great blog post The Truth About Value Types for more thoughts along these lines. In any case, a proper analysis of the performance difference will almost certainly require looking at the generated machine code, not at the IL, and will potentially depend on the version of the JIT compiler, etc.