How to force GCC to assume that a floating-point expression is non-negative?

Tags: c++, floating-point, gcc, assembly, micro-optimization

There are cases where you know that a certain floating-point expression will always be non-negative. For example, when computing the length of a vector, one does sqrt(a[0]*a[0] + ... + a[N-1]*a[N-1]) (NB: I am aware of std::hypot, this is not relevant to the question), and the expression under the square root is clearly non-negative. However, GCC outputs the following assembly for sqrt(x*x):

            mulss   xmm0, xmm0
            pxor    xmm1, xmm1
            ucomiss xmm1, xmm0
            ja      .L10
            sqrtss  xmm0, xmm0
            ret
    .L10:
            jmp     sqrtf

That is, it compares the result of x*x to zero, and if the result is non-negative, it does the sqrtss instruction, otherwise it calls sqrtf.

So, my question is: how can I force GCC into assuming that x*x is always non-negative so that it skips the comparison and the sqrtf call, without writing inline assembly?

I wish to emphasize that I am interested in a local solution, and not doing things like -ffast-math, -fno-math-errno, or -ffinite-math-only (though these do indeed solve the issue, thanks to ks1322, harold, and Eric Postpischil in the comments).

Furthermore, "force GCC into assuming x*x is non-negative" should be interpreted as assert(x*x >= 0.f), so this also excludes the case of x*x being NaN.

I am OK with compiler-specific, platform-specific, CPU-specific, etc. solutions.

648

asked Aug 27 '19 11:08

lisyarus

1 Answer

You can write assert(x*x >= 0.f) as a compile-time promise instead of a runtime check as follows in GNU C:

    #include <cmath>

    float test1 (float x)
    {
        float tmp = x*x;
        if (!(tmp >= 0.0f))
            __builtin_unreachable();
        return std::sqrt(tmp);
    }

(related: What optimizations does __builtin_unreachable facilitate? You could also wrap if(!x)__builtin_unreachable() in a macro and call it promise() or something.)
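The macro idea mentioned above could look roughly like the sketch below. The name PROMISE and the function length_1d are made up for illustration; neither is a standard or GCC-provided name. It only packages the promise and does not by itself change the code GCC emits for sqrt.

```cpp
#include <cmath>

// Hypothetical helper: PROMISE is an illustrative name, not a GCC builtin.
// It tells GCC/Clang that `cond` always holds. If `cond` is ever false at
// run time the behaviour is undefined, so only promise what you can guarantee.
#define PROMISE(cond)                \
    do {                             \
        if (!(cond))                 \
            __builtin_unreachable(); \
    } while (0)

float length_1d(float x) {
    float tmp = x * x;
    PROMISE(tmp >= 0.0f);  // the comparison is false for NaN, so NaN is ruled out too
    return std::sqrt(tmp);
}
```

Writing the condition as tmp >= 0.0f and letting the macro negate it keeps the NaN case on the "unreachable" side, which matches the assert(x*x >= 0.f) reading in the question.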

But gcc doesn't know how to take advantage of that promise that tmp is non-NaN and non-negative. We still get (Godbolt) the same canned asm sequence that checks for x>=0 and otherwise calls sqrtf to set errno. Presumably that expansion into a compare-and-branch happens after other optimization passes, so it doesn't help for the compiler to know more.

This is a missed-optimization in the logic that speculatively inlines sqrt when -fmath-errno is enabled (on by default unfortunately).

What you want instead is `-fno-math-errno`, which is safe globally

This is 100% safe if you don't rely on math functions ever setting errno. Nobody wants that; that's what NaN propagation and/or sticky flags that record masked FP exceptions are for, e.g. C99/C++11 fenv access via #pragma STDC FENV_ACCESS ON and then functions like fetestexcept(). See the example in feclearexcept which shows using it to detect division by zero.
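As a rough sketch of the fenv approach, the helper below (sqrt_raises_invalid is an illustrative name, not a library function) clears the sticky flags, takes a square root, and then checks whether the IEEE "invalid" flag was raised. It assumes an implementation where the FPU flags are visible through <cfenv>; the volatile variables are a workaround because, as far as I know, GCC does not honour the FENV_ACCESS pragma.

```cpp
#include <cfenv>
#include <cmath>

// Returns true if computing sqrt(x) raised the IEEE "invalid" exception,
// which happens for negative inputs. Helper name is illustrative only.
bool sqrt_raises_invalid(float x) {
    // volatile stops the compiler from constant-folding the sqrt or moving
    // it across the calls that clear and test the flags.
    volatile float in = x;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile float out = std::sqrt(in);
    (void)out;  // the result itself is not needed here
    return std::fetestexcept(FE_INVALID) != 0;
}
```

Calling sqrt_raises_invalid(-1.0f) should report the invalid operation without errno being involved at all, while a call with 4.0f should leave the flag clear.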

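The FP environment is part of thread context, while errno is a single variable shared by every library call (per-thread in C11 and POSIX, but still one slot that any call can overwrite).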

Support for this obsolete misfeature is not free; you should just turn it off unless you have old code that was written to use it. Don't use it in new code: use fenv. Ideally support for -fmath-errno would be as cheap as possible, but the rarity of anyone actually using __builtin_unreachable() or other things to rule out a NaN input presumably made it not worth developers' time to implement the optimization. Still, you could report a missed-optimization bug if you wanted.

Real-world FPU hardware does in fact have these sticky flags that stay set until cleared, e.g. x86's mxcsr status/control register for SSE/AVX math, or hardware FPUs in other ISAs. On hardware where the FPU can detect exceptions, a quality C++ implementation will support stuff like fetestexcept(). And if not, then math-errno probably doesn't work either.

errno for math was an old obsolete design that C / C++ is still stuck with by default, and is now widely considered a bad idea. It makes it harder for compilers to inline math functions efficiently. Or maybe we're not as stuck with it as I thought: Why errno is not set to EDOM even sqrt takes out of domain arguement? explains that setting errno in math functions is optional in ISO C11, and an implementation can indicate whether they do it or not. Presumably in C++ as well.
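The way an implementation indicates this is the math_errhandling macro from <cmath> (C99/C++11). A minimal sketch for querying it follows; the two function names are made up for illustration.

```cpp
#include <cmath>

// True if this implementation reports math errors through errno.
bool math_uses_errno() {
    return (math_errhandling & MATH_ERRNO) != 0;
}

// True if it reports them through floating-point exception flags.
bool math_uses_fp_exceptions() {
    return (math_errhandling & MATH_ERREXCEPT) != 0;
}
```

The standard requires at least one of the two bits to be set. The value can depend on compiler options as well as on the C library, so it is worth checking with the flags you actually build with.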

It's a big mistake to lump -fno-math-errno in with value-changing optimizations like -ffast-math or -ffinite-math-only. You should strongly consider enabling it globally, or at least for the whole file containing this function.

    float test2 (float x)
    {
        return std::sqrt(x*x);
    }

    # g++ -fno-math-errno -std=gnu++17 -O3
    test2(float):   # and test1 is the same
            mulss   xmm0, xmm0
            sqrtss  xmm0, xmm0
            ret
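If changing the flags for a whole file is not possible, GCC's optimize function attribute may be a way to scope the option to a single function. This is only a sketch: the macro LOCAL_NO_MATH_ERRNO and the function test3 are names invented here, Clang does not implement the attribute (hence the guard), and the GCC manual cautions that the attribute is meant for debugging rather than production code. It can also interfere with inlining into callers built with different options, so check the generated asm on your GCC version.

```cpp
#include <cmath>

// GCC-specific: ask for -fno-math-errno on one function only.
// LOCAL_NO_MATH_ERRNO is a hypothetical macro name used for this sketch.
#if defined(__GNUC__) && !defined(__clang__)
#define LOCAL_NO_MATH_ERRNO __attribute__((optimize("no-math-errno")))
#else
#define LOCAL_NO_MATH_ERRNO  // no-op on compilers without the attribute
#endif

LOCAL_NO_MATH_ERRNO
float test3(float x) {
    return std::sqrt(x * x);
}
```

A file-wide equivalent would be #pragma GCC optimize("no-math-errno") near the top of the translation unit, with the same caveats.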

You might as well use -fno-trapping-math too, if you aren't ever going to unmask any FP exceptions with feenableexcept(). (That option isn't required for this optimization, though; it's only the errno-setting crap that's a problem here.)

-fno-trapping-math doesn't assume no-NaN or anything, it only assumes that FP exceptions like Invalid or Inexact won't ever actually invoke a signal handler instead of producing NaN or a rounded result. -ftrapping-math is the default but it's broken and "never worked" according to GCC dev Marc Glisse. (Even with it on, GCC does some optimizations which can change the number of exceptions that would be raised from zero to non-zero or vice versa. And it blocks some safe optimizations). But unfortunately, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54192 (make it off by default) is still open.

If you actually ever did unmask exceptions, it might be better to have -ftrapping-math, but again it's very rare that you'd ever want that instead of just checking flags after some math operations, or checking for NaN. And it doesn't actually preserve exact exception semantics anyway.

See SIMD for float threshold operation for a case where the -ftrapping-math default incorrectly blocks a safe optimization. (Even after hoisting a potentially-trapping operation so the C does it unconditionally, gcc makes non-vectorized asm that does it conditionally! So not only does GCC block vectorization, it changes the exception semantics vs. the C abstract machine.) -fno-trapping-math enables the expected optimization.

183

answered Sep 29 '22 19:09

Peter Cordes
