_mm_max_ss has different behavior between clang and gcc

Tags: c++, x86, gcc, clang, sse

I'm trying to build a project with both clang and gcc, but I'm seeing some odd differences when using _mm_max_ss, e.g.:

__m128 a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
__m128 b = _mm_set_ss(2.0f);
__m128 c = _mm_max_ss(a,b);
__m128 d = _mm_max_ss(b,a);

I expected std::max-like behavior when NaNs are involved, but clang and gcc give different results:

Clang: (what I expected)
c: 2.000000 0.000000 0.000000 0.000000 
d: nan 0.000000 0.000000 0.000000 

Gcc: (Seems to ignore order)
c: nan 0.000000 0.000000 0.000000 
d: nan 0.000000 0.000000 0.000000 

_mm_max_ps does the expected thing when I use it. I've tried both -ffast-math and -fno-fast-math, but neither seems to have any effect. Any ideas on how to make the behavior consistent across compilers?
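For reference, the snippet above omits the includes and the printing code. A minimal sketch of a complete reproducer might look like the following. The helper names format_lanes and print_max_ss_results are hypothetical, and the %f formatting is only an assumption chosen to match the output shown above.

```cpp
#include <cstdio>
#include <limits>
#include <string>
#include <xmmintrin.h>

// Hypothetical helper: formats the four float lanes of v, lowest lane first.
std::string format_lanes(__m128 v) {
    float lanes[4];
    _mm_storeu_ps(lanes, v);  // unaligned store, so lanes needs no special alignment
    char buf[128];
    std::snprintf(buf, sizeof buf, "%f %f %f %f",
                  lanes[0], lanes[1], lanes[2], lanes[3]);
    return buf;
}

// Prints c and d from the question. Which value appears in the low lane
// is exactly what differs between compilers and optimization levels.
void print_max_ss_results() {
    const __m128 a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
    const __m128 b = _mm_set_ss(2.0f);
    std::printf("c: %s\n", format_lanes(_mm_max_ss(a, b)).c_str());
    std::printf("d: %s\n", format_lanes(_mm_max_ss(b, a)).c_str());
}
```

Calling print_max_ss_results() from main and building once with each compiler, with and without optimization, should be enough to compare the low lane of c and d.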

Godbolt link here

Biggy Smith asked Mar 09 '21 19:03


1 Answer

My understanding is that IEEE-754 requires (NaN cmp x) to return false for every comparison operator in {==, <, <=, >, >=}, and true for !=. An implementation of a max() function could be defined in terms of any of the inequality operators.
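As a quick illustration of that rule, here is a small sketch in plain C++. The struct and function names are hypothetical, and it assumes the code is built without -ffast-math or -ffinite-math-only, since those flags allow the compiler to ignore NaNs.

```cpp
#include <limits>

// Hypothetical helper type: one flag per comparison operator.
struct NanComparisons {
    bool eq, ne, lt, le, gt, ge;
};

// Compares a quiet NaN against x with every comparison operator.
// Assumes IEEE-754 semantics (no -ffast-math / -ffinite-math-only).
NanComparisons compare_nan_with(float x) {
    const float nan = std::numeric_limits<float>::quiet_NaN();
    return {nan == x, nan != x, nan < x, nan <= x, nan > x, nan >= x};
}
```

Under those assumptions, only the ne flag should come back true, whatever value is passed for x.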

So, the question is: how is _mm_max_ss implemented? With {<, <=, >, >=}, or with a bit comparison?

Interestingly, with optimization disabled in your link, both gcc and clang emit the corresponding maxss instruction, and both yield:

2.000000 0.000000 0.000000 0.000000 
nan 0.000000 0.000000 0.000000

This suggests, given max(NaN, 2.0f) -> 2.0f, that max(a, b) = (a op b) ? a : b, where op is one of {<, <=, >, >=} (for a max, presumably > or >=). Under IEEE-754 rules this comparison is always false when either operand is NaN, so:

(NaN op val) is always false, returning (val),
(val op NaN) is always false, returning (NaN)

With optimization on, the compiler is free to precompute c and d at compile time. Clang appears to evaluate the results exactly as the maxss instruction would, which is correct 'as-if' behaviour. GCC is either falling back on another implementation of max() (it uses the GMP and MPFR libraries for compile-time numerics) or is just being careless with the _mm_max_ss semantics.

GCC is still getting it wrong with 10.2 and trunk versions on godbolt. So I think you've found a bug! I haven't answered the second part, because I can't think of an all-purpose hack that will efficiently work around this.


From Intel's ISA reference:

If the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second source operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).

If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN from either source operand be returned, the action of MAXSS can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

Brett Hale answered Sep 28 '22 06:09