I'm trying to cross compile a project using clang and gcc but I'm seeing some odd differences when using _mm_max_ss
e.g.
__m128 a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
__m128 b = _mm_set_ss(2.0f);
__m128 c = _mm_max_ss(a,b);
__m128 d = _mm_max_ss(b,a);
Now, I expected std::max-type behavior when NaNs are involved, but clang and gcc give different results:
Clang: (what I expected)
c: 2.000000 0.000000 0.000000 0.000000
d: nan 0.000000 0.000000 0.000000
Gcc: (Seems to ignore order)
c: nan 0.000000 0.000000 0.000000
d: nan 0.000000 0.000000 0.000000
_mm_max_ps does the expected thing when I use it. I've tried using -ffast-math and -fno-fast-math, but neither seems to have an effect. Any ideas on how to make the behavior consistent across compilers?
Godbolt link here
My understanding is that IEEE-754 requires (NaN cmp x) to return false for all cmp operators {==, <, <=, >, >=}, except {!=}, which returns true. An implementation of a max() function might be defined in terms of any of the inequality operators.
So, the question is, how is _mm_max_ps implemented? With {<, <=, >, >=}, or a bit comparison?
Interestingly, when disabling optimization in your link, the corresponding maxss instruction is used by both gcc and clang. Both yield:
2.000000 0.000000 0.000000 0.000000
nan 0.000000 0.000000 0.000000
This suggests, given max(NaN, 2.0f) -> 2.0f, that max(a, b) = (a op b) ? a : b, where op is one of {<, <=, >, >=}. With IEEE-754 rules, the result of this comparison is always false when either operand is NaN, so:
(NaN op val) is always false, returning (val),
(val op NaN) is always false, returning (NaN).
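To make the reasoning above concrete, here is a minimal scalar sketch of that select, assuming op is > (which is what a max would need). The names maxss_model and check_maxss_model are hypothetical and only model lane 0; this is not how either compiler actually implements the intrinsic.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical scalar model of lane 0 of maxss: the first operand is
// returned only when the comparison is true; in every other case
// (including any comparison involving NaN) the second operand is returned.
inline float maxss_model(float a, float b) {
    return (a > b) ? a : b;
}

// Self-check using the same values as the question.
inline void check_maxss_model() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    assert(maxss_model(nan, 2.0f) == 2.0f);      // NaN > 2 is false, so b (2.0f)
    assert(std::isnan(maxss_model(2.0f, nan)));  // 2 > NaN is false, so b (NaN)
    assert(maxss_model(3.0f, 2.0f) == 3.0f);     // ordinary case, a wins
}
```

Under this model the result depends on operand order whenever exactly one input is NaN, which matches the unoptimized output shown above. It assumes the code is built without -ffast-math, since that flag lets the compiler assume NaNs never occur.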
With optimization on, the compiler is free to precompute (c) and (d) at compile time. It appears that clang evaluates the results as the maxss instruction would - correct 'as-if' behaviour. GCC is either falling back on another implementation of max() - it uses the GMP and MPFR libraries for compile-time numerics - or is just being careless with the _mm_max_ss semantics.
GCC is still getting it wrong with 10.2 and trunk versions on godbolt. So I think you've found a bug! I haven't answered the second part, because I can't think of an all-purpose hack that will efficiently work around this.
From Intel's ISA reference:
If the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second source operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).
If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN from either source operand be returned, the action of MAXSS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.
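If order-independent results matter more than std::max-like behavior, the compare-then-AND/ANDN/OR idea from the quote could be sketched roughly as follows. This is an untuned illustration, not a drop-in fix: max_ss_nan_propagating and check_nan_propagating are hypothetical names, the addition used to produce the NaN is my own choice rather than Intel's sequence, and it assumes an x86 target built without -ffast-math.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: lane-0 maximum that yields NaN whenever either
// input's lane 0 is NaN, independent of operand order.
// Lanes 1-3 of the result are copied from a, as _mm_max_ss does.
inline __m128 max_ss_nan_propagating(__m128 a, __m128 b) {
    __m128 unordered = _mm_cmpunord_ps(a, b);  // all-ones where either input is NaN
    __m128 nan_value = _mm_add_ps(a, b);       // NaN wherever either input is NaN
    __m128 ordinary  = _mm_max_ps(a, b);       // kept only where both inputs are ordered
    __m128 blended   = _mm_or_ps(_mm_and_ps(unordered, nan_value),
                                 _mm_andnot_ps(unordered, ordinary));
    return _mm_move_ss(a, blended);            // lane 0 from blended, lanes 1-3 from a
}

// Self-check using the same values as the question.
inline void check_nan_propagating() {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const __m128 a = _mm_set_ss(nan);
    const __m128 b = _mm_set_ss(2.0f);
    assert(std::isnan(_mm_cvtss_f32(max_ss_nan_propagating(a, b))));
    assert(std::isnan(_mm_cvtss_f32(max_ss_nan_propagating(b, a))));
    assert(_mm_cvtss_f32(max_ss_nan_propagating(_mm_set_ss(1.0f), b)) == 2.0f);
}
```

Because the max result is discarded whenever a NaN is present, the outcome no longer depends on which operand the compiler treats as the second source. The cost is a few extra instructions per call, and the ordering of signed zeros is still whatever the underlying max produces.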