The norm
member function in the C++ vector
class declared below is marked as const
and (as far as I can tell) does not contain any side effects.
template <unsigned int N>
struct vector {
double v[N];
double norm() const {
double ret = 0;
for (int i=0; i<N; ++i) {
ret += v[i]*v[i];
}
return ret;
}
};
double test(const vector<100>& x) {
return x.norm() + x.norm();
}
If I call norm
multiple times on a const
instantiation of vector
(see test
function above) with the gcc compiler (version 5.4) and optimizations turned on (i.e. -O3
) then the compiler inlines norm
, but still calculates the result of norm
multiple times, even though the result should not change. Why doesn't the compiler optimize out the second call to norm
and only calculate this result once? This answer seems to indicate that the compiler should perform this optimization if the compiler determines that the norm
function does not have any side-effects. Why doesn't this happen in this case?
Note that I'm determining what the compiler produces using the Compiler Explorer, and that the assembly output for gcc version 5.4 is given below. The clang compiler gives similar results. Note also that if I use gcc's compiler attributes to manually mark norm
as a const function using __attribute__((const))
, then the second call is optimized out, as I wanted, but my question is why gcc (and clang) do not do this automatically since the norm
definition is available?
test(vector<100u>&):
pxor xmm2, xmm2
lea rdx, [rdi+800]
mov rax, rdi
.L2:
movsd xmm1, QWORD PTR [rax]
add rax, 8
cmp rdx, rax
mulsd xmm1, xmm1
addsd xmm2, xmm1
jne .L2
pxor xmm0, xmm0
.L3:
movsd xmm1, QWORD PTR [rdi]
add rdi, 8
cmp rdx, rdi
mulsd xmm1, xmm1
addsd xmm0, xmm1
jne .L3
addsd xmm0, xmm2
ret
The compiler can calculate the result of norm
and reuse it multiple times. E.g. with the -Os
switch:
test(vector<100u> const&):
xorps xmm0, xmm0
xor eax, eax
.L2:
movsd xmm1, QWORD PTR [rdi+rax]
add rax, 8
cmp rax, 800
mulsd xmm1, xmm1
addsd xmm0, xmm1
jne .L2
addsd xmm0, xmm0
ret
The missing optimization isn't due to not-associative-floating-point-math or some observable-behavior-issue.
In a not properly mutexed environment another function might change the contents in the array in between the calls of norm
It may happen, but it's not a concern for the compiler (e.g. https://stackoverflow.com/a/25472679/3235496).
Compiling the example with the -O2 -fdump-tree-all
switch you can see that:
vector<N>::norm()
as a pure function (output file .local-pure-const1
);.einline
).Also note that marking norm
with __attribute__ ((noinline))
the compiler performs CSE:
test(vector<100u> const&):
sub rsp, 8
call vector<100u>::norm() const
add rsp, 8
addsd xmm0, xmm0
ret
Marc Glisse is (probably) right.
A more advanced form of Common Subexpression Elimination is required to un-inline the recurrent expression.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With