Here's my code:
int f(double x, double y)
{
return std::isnan(x) || std::isnan(y);
}
If you're using C instead of C++, just replace std::
with __builtin_
(don't simply remove std::
, for reasons shown here: Why does GCC implement isnan() more efficiently for C++ <cmath> than C <math.h>?).
Here's the assembly:
ucomisd %xmm0, %xmm0 ; set parity flag if x is NAN
setp %dl ; copy parity flag to %edx
ucomisd %xmm1, %xmm1 ; set parity flag if y is NAN
setp %al ; copy parity flag to %eax
orl %edx, %eax ; OR one byte of each result into a full-width register
Now let's try an alternative formulation that does the same thing:
int f(double x, double y)
{
return std::isunordered(x, y);
}
Here's the assembly for the alternative:
xorl %eax, %eax
ucomisd %xmm1, %xmm0
setp %al
This is great--we cut the generated code almost in half! This works because ucomisd
sets the parity flag if either of its operands is NAN, so we can test two values at a time, SIMD-style.
You can see code like the original version in the wild, for example: https://svn.r-project.org/R/trunk/src/nmath/qnorm.c
If we could make GCC smart enough to combine two isnan()
calls everywhere, that would be pretty cool. My question is: can we, and how? I have some idea of how compilers work, but I don't know where in GCC this sort of optimization could be performed. The basic idea is whenever there is a pair of isnan()
(or __builtin_isnan
) calls OR'd together, it should emit a single ucomisd
instruction using the two operands at the same time.
Edited to add some research prompted by Basile Starynkevitch's answer:
If I compile with -fdump-tree-all, I find two files which seem relevant. First, *.gimple
contains this (and a bit more):
D.2229 = x unord x;
D.2230 = y unord y;
D.2231 = D.2229 | D.2230;
Here we can clearly see that GCC knows it will pass (x, x)
to isunordered()
. If we want to optimize by transforming at this level, the rule would be roughly: "Replace a unord a | b unord b
with a unord b
." This is what you get when compiling my second C code:
D.2229 = x unord y;
Another interesting file is *.original
:
return <retval> = (int) (x unord x || y unord y);
That's actually the entire non-comment file generated by -fdump-tree-original
. And for the better source code it looks like this:
return <retval> = x unord y;
Clearly the same sort of transformation can be applied (just here it's ||
instead of |
).
But unfortunately if we modify the source code to e.g.:
if (__builtin_isnan(x))
return true;
if (__builtin_isnan(y))
return true;
return false;
Then we get quite different Gimple and Original output files, though the final assembly is the same as before. So maybe it's better to attempt this transformation at a later stage in the pipeline? The *.optimized
file (among others) shows the same code for the version with "if"s as for the original version, so that's promising.
This optimization is not only possible, it is now available in gcc-6: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=222077
There are two questions:
is the optimization you are proposing always legal in the strict C++11 standard (I don't know).
can GCC be customized by adding such an optimization: yes! You could extend it using MELT -e.g. write your own MELT extension doing that- or with your own GCC plugin coded (painfully) in C++ .
However, adding an extra optimization in GCC is a significant work (even with MELT): you need to understand the internals of GCC. So it is more than a week of work probably.
And I am not sure that such an optimization is really worth the effort.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With