gcc, simd intrinsics and fast-math concepts

Hi all :)
I'm trying to get the hang of a few concepts regarding floating point, SIMD/math intrinsics and the fast-math flag for gcc. More specifically, I'm using MinGW with gcc v4.5.0 on an x86 CPU.

I've searched around for a while now, and this is what I (think I) understand at the moment:

When I compile with no flags, any fp code will be standard x87, no simd intrinsics, and the math.h functions will be linked from msvcrt.dll.

When I use -mfpmath, -msseN and/or -march so that MMX/SSE/AVX code gets enabled, gcc actually uses SIMD instructions only if I also specify some optimization flags, like -On or -ftree-vectorize. In that case the intrinsics are chosen automagically by gcc, and some math functions (I'm still talking about the standard math funcs in math.h) will become intrinsics or be replaced by inline code, while others will still come from msvcrt.dll. If I don't specify optimization flags, does any of this change?
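To make the auto-vectorization case concrete, here is a minimal sketch of a loop that gcc is usually able to vectorize. The function name `scale_add` and the file name below are made up for illustration, and whether SIMD instructions actually appear depends on the gcc version and flags.

```c
#include <stddef.h>

/* Hypothetical example (not from the original post):
   dst[i] = a[i] * k + b[i] for n elements.
   __restrict is the GCC spelling of C99 'restrict'; it promises that the
   arrays do not overlap, which makes the loop easier to vectorize. */
void scale_add(float *__restrict dst, const float *__restrict a,
               const float *__restrict b, float k, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        dst[i] = a[i] * k + b[i];
}
```

Compiling it with something like `gcc -O2 -ftree-vectorize -msse2 -S scale_add.c` and again with plain `gcc -msse2 -S scale_add.c`, then comparing the two assembly files, is one way to see whether packed instructions such as mulps/addps show up only in the optimized build.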

When I use specific SIMD data types (those available as gcc extensions, like v4si or v8qi), I have the option to call intrinsic funcs directly, or again leave the automagic decision to gcc. Gcc can still choose standard x87 code if I don't enable SIMD instructions via the proper flags. Again, if I don't specify optimization flags, does any of this change?
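For reference, a small sketch of the vector-extension route, where gcc picks the instructions. The helper name `add4i` is hypothetical, and the code assumes a 32-bit `int`. Elements are moved in and out with memcpy because subscripting vector types directly is not available in every gcc version.

```c
#include <string.h>

/* GCC vector extension type: four 32-bit ints in one 16-byte vector. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: out[i] = a[i] + b[i] for four ints. */
void add4i(const int *a, const int *b, int *out)
{
    v4si va, vb, vc;
    memcpy(&va, a, sizeof va);    /* load four ints into each vector */
    memcpy(&vb, b, sizeof vb);
    vc = va + vb;                 /* element-wise add, no intrinsic call */
    memcpy(out, &vc, sizeof vc);  /* store the four sums */
}
```

The `+` here is the generic operator from the "Vector Extensions" part of the gcc manual. Whether it becomes a single SIMD instruction or several scalar ones depends on the target flags in effect.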

Please correct me if any of my statements are wrong :p

Now the questions:

Do I ever have to include x86intrin.h to use intrinsics?
Do I ever have to link libm?
What does fast-math have to do with anything? I understand it relaxes the IEEE standard, but how, specifically? Are other standard functions used? Is some other lib linked? Or are just a couple of flags set somewhere so that the standard lib behaves differently?

Thanks to anybody who is going to help :D

asked Feb 11 '11 07:02

rocket441

1 Answer

OK, I'm answering for anyone who is struggling a bit to grasp these concepts, like me.

Optimizations with -Ox work on any kind of code, FPU or SSE.

-ffast-math seems to work only on x87 code. Also, it doesn't seem to change the FPU control word o_O
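As far as the gcc manual describes it, -ffast-math is a compile-time switch that turns on a group of sub-flags (-fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only and a few others) and defines the macro `__FAST_MATH__`. The sketch below, with made-up function names, shows how to detect the flag and one kind of rewrite it permits, namely regrouping additions.

```c
/* Reports whether this file was compiled with -ffast-math:
   gcc defines __FAST_MATH__ in that case. */
int fast_math_enabled(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Same three operands, different grouping. Under strict IEEE rules the
   compiler must keep the grouping as written. */
float sum_left(float a, float b, float c)  { return (a + b) + c; }
float sum_right(float a, float b, float c) { return a + (b + c); }
```

Without fast-math, `sum_left(1e20f, -1e20f, 1.0f)` gives 1.0f while `sum_right(1e20f, -1e20f, 1.0f)` gives 0.0f, because the 1.0f is lost when it is added to -1e20f first. With fast-math the compiler is allowed to treat both groupings as interchangeable, so the results are no longer guaranteed to differ in this way.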

Builtins are always included. This behavior can be avoided for some builtins with flags such as -fno-builtin or a strict ISO mode (e.g. -ansi).
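A tiny sketch of what "always included" means in practice; the function name `abs_diff` is hypothetical. Names with the `__builtin_` prefix come from the compiler itself, so no header is needed, and according to the gcc manual -fno-builtin only affects the unprefixed names.

```c
/* No #include <math.h> on purpose: __builtin_fabs is provided by gcc. */
double abs_diff(double a, double b)
{
    return __builtin_fabs(a - b);
}
```

Calling plain `fabs` instead would need math.h for the declaration, and gcc would normally still expand it inline unless -fno-builtin is given.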

libm.a is used for some stuff that is not included in glibc, but with MinGW it's just a dummy file, so at the moment it's useless to link to it.

Using the special vector types of gcc seems useful only when calling the intrinsics directly, otherwise the code gets vectorized anyway.
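For comparison, a minimal sketch of calling the intrinsics directly. The helper name `add4f` is made up. The `_mm_*_ps` functions and the `__m128` type are declared in xmmintrin.h (which the umbrella header x86intrin.h also pulls in), so a header is needed for this route. The `__SSE__` check and the scalar fallback are only there so the file still compiles when SSE is not enabled.

```c
#if defined(__SSE__)
#include <xmmintrin.h>   /* declares __m128 and the _mm_*_ps intrinsics */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for four floats. */
void add4f(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);             /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* packed add, then store */
#else
    /* Scalar fallback when SSE is not enabled at compile time. */
    int i;
    for (i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

On 32-bit x86 this needs -msse (or a -march value that implies it) for the intrinsic path to be compiled in.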

Any correction is welcome :)

Useful links:
fpu / sse control
gcc math
and the gcc manual on "Vector Extensions", "X86 Built-in functions" and "Other Builtins"

answered Oct 07 '22 18:10

rocket441

Tags: gcc, fast-math, simd, intrinsics
