Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GCC seemingly misses simple optimization

I am trying to introduce a generic function with the semantics of ternary operator: E1 ? E2 : E3. I see that compiler is able to eliminate calculation of one of E2 or E3 depending on E1 condition for the ternary operator. However GCC misses this optimization in case of ternary function call (even when E2/E3 have no side effects).

In the listing below function ternary is written to behave similarly to the ternary operator. However GCC emits potentially heavy call to function f which seems can be eliminated for some input values (exactly how it is done for ternary operator) because f is declared with pure attribute - please look at the godbolt link for assembly code generated by GCC.

Is it something that could be improved in GCC (room for optimization) or does the C++ standard explicitly prohibit such kind of optimizations?

// Very heavy function
int f() __attribute__ ((pure));

inline int ternary(bool cond, int n1, int n2) {
    return cond ? n1 : n2;
}

int foo1(int i) {
    return i == 0 ? f() : 0;
}

int foo2(int i) {
    return ternary(i == 0, f(), 0);
}

Assembly listing with -O3 -std=c++11:

foo1(int):
  test edi, edi
  jne .L2
  jmp f()
.L2:
  xor eax, eax
  ret
foo2(int):
  push rbx
  mov ebx, edi
  call f()
  test ebx, ebx
  mov edx, 0
  pop rbx
  cmovne eax, edx
  ret

https://godbolt.org/z/HfpNzo

like image 695
eXXXXXXXXXXX2 Avatar asked Sep 13 '18 19:09

eXXXXXXXXXXX2


1 Answers

I see that compiler is able to eliminate calculation of one of E2 or E3 depending on E1 condition (as long as E2/E3 has no side effects) for the ternary operator.

The compiler doesn't eliminate it; it just never optimizes it into a cmov in the first place. The C++ abstract machine doesn't evaluate the not-used side of the ternary operator.

int a, b;
void foo(int sel) {
    sel ? a++ : b++;
}

compiles like so (Godbolt):

foo(int):
    test    edi, edi
    je      .L2                # if(sel==0) goto
    add     DWORD PTR a[rip], 1   # ++a
    ret
.L2:
    add     DWORD PTR b[rip], 1   # ++b
    ret

The ternary operator can only optimize to an asm cmov if neither input has any side-effects. Otherwise they're not exactly equivalent.


In the C++ abstract machine (i.e. the input to gcc's optimizer), your foo2 does always call f(), while your foo1 doesn't. It's no surprise that foo1 compiles the way it does.

For foo2 to compile that way, it would have to optimize away the call to f(). It's always called to create an arg for ternary().


There is a missed-optimization here, which you should report on GCC's bugzilla (use the missed-optimization keyword as a tag). https://gcc.gnu.org/bugzilla/enter_bug.cgi?product=gcc

A call to int f() __attribute__ ((pure)); should be able to be optimized away. It can read globals, but must not have any side effects. (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html)

As @melpomene discovered in comments, int f() __attribute__ ((const)); does give you the optimization you're looking for. An __attribute__((const)) function can't even read globals, only its args. (Thus with no args, it must always return a constant.)

HVD points out that gcc doesn't have any cost info for f(). Even if it could have optimized away the call to ((pure)) f() as well as to ((const)) f, maybe it didn't because it didn't know it was more expensive than a conditional branch? Possibly compiling with profile-guided optimization would convince gcc to do something?

But given that it made the call to ((const)) f conditional in foo2, gcc may just not know that it can optimize away calls to ((pure)) functions? Maybe it can only CSE them (if no globals have been written), but not optimize away entirely from a basic block? Or maybe the current optimizer just fails to take advantage. Like I said, looks like a missed-opt bug.

like image 109
Peter Cordes Avatar answered Nov 13 '22 22:11

Peter Cordes