Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why isn't this C++ member function optimized by the compiler with -O3?

The norm member function in the C++ vector class declared below is marked as const and (as far as I can tell) does not contain any side effects.

template <unsigned int N>
struct vector {
  double v[N];

  double norm() const {
    double ret = 0;
    for (int i=0; i<N; ++i) {
      ret += v[i]*v[i];
    }
    return ret;
  }
};

double test(const vector<100>& x) {
  return x.norm() + x.norm();
}

If I call norm multiple times on a const instantiation of vector (see test function above) with the gcc compiler (version 5.4) and optimizations turned on (i.e. -O3) then the compiler inlines norm, but still calculates the result of norm multiple times, even though the result should not change. Why doesn't the compiler optimize out the second call to norm and only calculate this result once? This answer seems to indicate that the compiler should perform this optimization if the compiler determines that the norm function does not have any side-effects. Why doesn't this happen in this case?

Note that I'm determining what the compiler produces using the Compiler Explorer, and that the assembly output for gcc version 5.4 is given below. The clang compiler gives similar results. Note also that if I use gcc's compiler attributes to manually mark norm as a const function using __attribute__((const)), then the second call is optimized out, as I wanted, but my question is why gcc (and clang) do not do this automatically since the norm definition is available?

test(vector<100u>&):
        pxor    xmm2, xmm2
        lea     rdx, [rdi+800]
        mov     rax, rdi
.L2:
        movsd   xmm1, QWORD PTR [rax]
        add     rax, 8
        cmp     rdx, rax
        mulsd   xmm1, xmm1
        addsd   xmm2, xmm1
        jne     .L2
        pxor    xmm0, xmm0
.L3:
        movsd   xmm1, QWORD PTR [rdi]
        add     rdi, 8
        cmp     rdx, rdi
        mulsd   xmm1, xmm1
        addsd   xmm0, xmm1
        jne     .L3
        addsd   xmm0, xmm2
        ret
like image 924
Martin Robinson Avatar asked Mar 06 '17 06:03

Martin Robinson


1 Answers

The compiler can calculate the result of norm and reuse it multiple times. E.g. with the -Os switch:

test(vector<100u> const&):
        xorps   xmm0, xmm0
        xor     eax, eax
.L2:
        movsd   xmm1, QWORD PTR [rdi+rax]
        add     rax, 8
        cmp     rax, 800
        mulsd   xmm1, xmm1
        addsd   xmm0, xmm1
        jne     .L2
        addsd   xmm0, xmm0
        ret

The missing optimization isn't due to not-associative-floating-point-math or some observable-behavior-issue.


In a not properly mutexed environment another function might change the contents in the array in between the calls of norm

It may happen, but it's not a concern for the compiler (e.g. https://stackoverflow.com/a/25472679/3235496).


Compiling the example with the -O2 -fdump-tree-all switch you can see that:

  • g++ correctly detects vector<N>::norm() as a pure function (output file .local-pure-const1);
  • inlining happens at early stages (output file .einline).

Also note that marking norm with __attribute__ ((noinline)) the compiler performs CSE:

test(vector<100u> const&):
    sub     rsp, 8
    call    vector<100u>::norm() const
    add     rsp, 8
    addsd   xmm0, xmm0
    ret

Marc Glisse is (probably) right.

A more advanced form of Common Subexpression Elimination is required to un-inline the recurrent expression.

like image 122
manlio Avatar answered Sep 30 '22 19:09

manlio