In C, when compiled to a x86 machine, I would usually replace branches with logical expression, when speed is the most important aspect, even if the conditions are complex, for example, instead of
char isSomething() {
if (complexExpression01) {
if (complexExpression02) {
if(!complexExpression03) {
return 1;
}
}
}
return 0;
}
I will write:
char isSomething() {
return complexExpression01 &&
complexExpression02 &&
!complexExpression03 ;
}
Now clearly, this might be a harder to maintain, and less readable code, but it might actually be faster.
Is there any reason to act the same way when working with managed code, such as C#? Are "jumps" expensive in managed code as they are in unmanaged code (at least on x86)?
In your regular compiler, the generated code will most often be the same, at least when assuming that you are using regular
csc.exe /optimize+
cl.exe /O2
g++ -O2
and related default optimization modes.
The general mantra is: profile, profile, profile (and don't micro-optimize until your profiler tells you to). You can always look at the generated code2 to see whether there is room for improvement.
Think of it this way, e.g. the C# code:
Each of your complexExpressions is a de-facto function call invocation (call, calli, callvirt opcode3) that requires it's arguments to be pushed onto the stack. The return value will be left pushed onto the stack instead of the parameters on exit.
Now, CLR being a stack-based virtual machine (i.e. register-less) this amounts to exactly the same as an anonymous temporary variable on the stack. The only difference is the number of identifiers used in the code.
Now what the JIT engine does with that is another matter: the JIT engine will have to translate these calls into native assembly and may be doing optimization by tweaking register-allocation, instruction ordering, branch prediction and stuff like that1
1 (though in practice, for this sample it won't be allowed to do the more interesting optimizations because the complex function calls
may have sideeffects and the C# specs are very clear about the evaluation order and so-called sequence)point. Note however, that the JIT engine is allowed to inline function calls, to reduce call overhead.
Not only when they are non-virtual, but (IIRC) also when the runtime type can be known statically at compile time for certain .NET framework internals. I'd have to look up a reference for this, but in fact I think there are attributes introduced in .NET framework 4.0 to explicit prevent inlining of framework functions; this is so that Microsoft can patch library code in service packs/updates, even when user assemblies have been ahead-of-time compiled (ngen.exe) into native images.
In C/C++ the memory model is much more lax (i.e. at least until C++11) and the code is usually compiled down to native instructions at compile time directly. Add that C/C++ compilers usually do aggressive inlining, the code even in such compilers will usually be the same, unless you compile without optimization enabled
2 I use
ildasm
or monodis
to see the IL code generatedmono -aot=full,static
or mkbundle
to generate native object modules and objdump -CdS
to see the annotated native assembly instructions for that.Note this is purely curiosity, because it rarely happens that I find interesting bottlenecks that way. See however, Jon Skeet's blog posts on performance optimizing Noda.NET
for good examples of surprises that may lurk in the generated IL code for Generic classes.
3Edit not accurate for operators on compiler intrinsics, though even they will just leave their result on the stack.
This is up to the implementation of the CLR and compiler of the managed language. In case of C#, the following test case proofs that there is no difference in instructions for nested if statements and combined if statements:
// case 1
if (value1 < value2)
00000089 mov eax,dword ptr [ebp-0Ch]
0000008c cmp eax,dword ptr [ebp-10h]
0000008f jge 000000A6
{
if (value2 < value3)
00000091 mov eax,dword ptr [ebp-10h]
00000094 cmp eax,dword ptr [ebp-14h]
00000097 jge 000000A6
{
result1 = true;
00000099 mov eax,1
0000009e and eax,0FFh
000000a3 mov dword ptr [ebp-4],eax
}
}
// case 2
if (value1 < value2 && value2 < value3)
000000a6 mov eax,dword ptr [ebp-0Ch]
000000a9 cmp eax,dword ptr [ebp-10h]
000000ac jge 000000C3
000000ae mov eax,dword ptr [ebp-10h]
000000b1 cmp eax,dword ptr [ebp-14h]
000000b4 jge 000000C3
{
result2 = true;
000000b6 mov eax,1
000000bb and eax,0FFh
000000c0 mov dword ptr [ebp-8],eax
}
The two expressions will result in the same number of tests, as the logical and operator (&&
) has short-circuit semantics in both C and C#. The premise of your question (that the second way of expressing the program results in less branching) is therefore incorrect.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With