I am trying to introduce a generic function with the semantics of ternary operator: E1 ? E2 : E3
. I see that compiler is able to eliminate calculation of one of E2
or E3
depending on E1
condition for the ternary operator. However GCC misses this optimization in case of ternary
function call (even when E2
/E3
have no side effects).
In the listing below function ternary
is written to behave similarly to the ternary operator. However GCC emits potentially heavy call to function f
which seems can be eliminated for some input values (exactly how it is done for ternary operator) because f
is declared with pure attribute - please look at the godbolt link for assembly code generated by GCC.
Is it something that could be improved in GCC (room for optimization) or does the C++ standard explicitly prohibit such kind of optimizations?
// Very heavy function
int f() __attribute__ ((pure));
inline int ternary(bool cond, int n1, int n2) {
return cond ? n1 : n2;
}
int foo1(int i) {
return i == 0 ? f() : 0;
}
int foo2(int i) {
return ternary(i == 0, f(), 0);
}
Assembly listing with -O3 -std=c++11
:
foo1(int):
test edi, edi
jne .L2
jmp f()
.L2:
xor eax, eax
ret
foo2(int):
push rbx
mov ebx, edi
call f()
test ebx, ebx
mov edx, 0
pop rbx
cmovne eax, edx
ret
https://godbolt.org/z/HfpNzo
I see that compiler is able to eliminate calculation of one of E2 or E3 depending on E1 condition (as long as E2/E3 has no side effects) for the ternary operator.
The compiler doesn't eliminate it; it just never optimizes it into a cmov
in the first place. The C++ abstract machine doesn't evaluate the not-used side of the ternary operator.
int a, b;
void foo(int sel) {
sel ? a++ : b++;
}
compiles like so (Godbolt):
foo(int):
test edi, edi
je .L2 # if(sel==0) goto
add DWORD PTR a[rip], 1 # ++a
ret
.L2:
add DWORD PTR b[rip], 1 # ++b
ret
The ternary operator can only optimize to an asm cmov
if neither input has any side-effects. Otherwise they're not exactly equivalent.
In the C++ abstract machine (i.e. the input to gcc's optimizer), your foo2
does always call f()
, while your foo1
doesn't. It's no surprise that foo1 compiles the way it does.
For foo2 to compile that way, it would have to optimize away the call to f()
. It's always called to create an arg for ternary()
.
There is a missed-optimization here, which you should report on GCC's bugzilla (use the missed-optimization
keyword as a tag). https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc
A call to int f() __attribute__ ((pure));
should be able to be optimized away. It can read globals, but must not have any side effects. (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html)
As @melpomene discovered in comments, int f() __attribute__ ((const));
does give you the optimization you're looking for. An __attribute__((const))
function can't even read globals, only its args. (Thus with no args, it must always return a constant.)
HVD points out that gcc doesn't have any cost info for f()
. Even if it could have optimized away the call to ((pure)) f()
as well as to ((const)) f
, maybe it didn't because it didn't know it was more expensive than a conditional branch? Possibly compiling with profile-guided optimization would convince gcc to do something?
But given that it made the call to ((const)) f
conditional in foo2
, gcc may just not know that it can optimize away calls to ((pure))
functions? Maybe it can only CSE them (if no globals have been written), but not optimize away entirely from a basic block? Or maybe the current optimizer just fails to take advantage. Like I said, looks like a missed-opt bug.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With