I'm experimenting with generated assembly and found something interesting. There are two functions doing an identical computation. The only difference between them is how the results are summed together.
#include <cmath>

double func1(double x, double y)
{
    double result1;
    double result2;
    if (x*x < 0.0) result1 = 0.0;
    else
    {
        result1 = x*x+x+y;
    }
    if (y*y < 0.0) result2 = 0.0;
    else
    {
        result2 = y*y+y+x;
    }
    return (result1 + result2) * 40.0;
}

double func2(double x, double y)
{
    double result = 0.0;
    if (x*x >= 0.0)
    {
        result += x*x+x+y;
    }
    if (y*y >= 0.0)
    {
        result += y*y+y+x;
    }
    return result * 40.0;
}
Yet the assembly generated by x86 clang 3.7 with the -O2 switch on gcc.godbolt.org is very different and unexpected (compiling with gcc results in similar assembly):
.LCPI0_0:
.quad 4630826316843712512 # double 40
func1(double, double): # @func1(double, double)
movapd %xmm0, %xmm2
mulsd %xmm2, %xmm2
addsd %xmm0, %xmm2
addsd %xmm1, %xmm2
movapd %xmm1, %xmm3
mulsd %xmm3, %xmm3
addsd %xmm1, %xmm3
addsd %xmm0, %xmm3
addsd %xmm3, %xmm2
mulsd .LCPI0_0(%rip), %xmm2
movapd %xmm2, %xmm0
retq
.LCPI1_0:
.quad 4630826316843712512 # double 40
func2(double, double): # @func2(double, double)
movapd %xmm0, %xmm2
movapd %xmm2, %xmm4
mulsd %xmm4, %xmm4
xorps %xmm3, %xmm3
ucomisd %xmm3, %xmm4
xorpd %xmm0, %xmm0
jb .LBB1_2
addsd %xmm2, %xmm4
addsd %xmm1, %xmm4
xorpd %xmm0, %xmm0
addsd %xmm4, %xmm0
.LBB1_2:
movapd %xmm1, %xmm4
mulsd %xmm4, %xmm4
ucomisd %xmm3, %xmm4
jb .LBB1_4
addsd %xmm1, %xmm4
addsd %xmm2, %xmm4
addsd %xmm4, %xmm0
.LBB1_4:
mulsd .LCPI1_0(%rip), %xmm0
retq
func1 compiles to branchless assembly with far fewer instructions than func2, so func2 is expected to be much slower than func1.
Can someone explain this behavior?
The reason is that the comparison operators < and >= behave differently depending on whether your double is NaN or not. All relational comparisons (<, <=, >, >=) where one of the operands is NaN return false. So your x*x < 0.0 will always be false regardless of whether x is NaN or not: the square of a non-NaN value is never negative, and a NaN compares false. The compiler can therefore safely optimize this test away. However, x*x >= 0.0 behaves differently for NaN and non-NaN values (false for NaN, true otherwise), so the compiler has to leave the conditional jumps in the assembly.
This is what cppreference says about comparisons involving NaN:
the values of the operands after conversion are compared in the usual mathematical sense (except that positive and negative zeroes compare equal and any comparison involving a NaN value returns zero)
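To make the NaN argument concrete, here is a minimal sketch, assuming IEEE 754 doubles and a build without -ffast-math (which lets the compiler assume NaN never occurs). The helper names squareIsNegative, squareIsNonNegative, sumSeparately and sumAccumulating are hypothetical, introduced only for this illustration; the last two are meant to mirror the logic of func1 and func2 from the question.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helpers (not in the original post) isolating the two tests.
bool squareIsNegative(double x)    { return x * x < 0.0; }   // test used in func1
bool squareIsNonNegative(double x) { return x * x >= 0.0; }  // test used in func2

// Same logic as func1: the test is false for every input, NaN included,
// so the else branch is always taken and the branch can be removed.
double sumSeparately(double x, double y)
{
    double result1 = squareIsNegative(x) ? 0.0 : x * x + x + y;
    double result2 = squareIsNegative(y) ? 0.0 : y * y + y + x;
    return (result1 + result2) * 40.0;
}

// Same logic as func2: the test is true for ordinary values but false
// for NaN, so the branch decides the result and has to stay.
double sumAccumulating(double x, double y)
{
    double result = 0.0;
    if (squareIsNonNegative(x)) result += x * x + x + y;
    if (squareIsNonNegative(y)) result += y * y + y + x;
    return result * 40.0;
}
```

With both arguments set to std::numeric_limits<double>::quiet_NaN(), sumSeparately returns NaN while sumAccumulating skips both additions and returns 0.0. So the two functions only agree for non-NaN inputs, which is why the compiler cannot turn the second form into the first.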