
Is sse2 enabled by default in g++?

Tags: c++, linux, gcc

When I run g++ -Q --help=target, I get

-msse2 [disabled].

However, if I generate the assembly code with the default options using

g++ -g mycode.cpp -o mycode.o; objdump -S mycode.o > default,

and a sse2 version with

g++ -g -msse2 mycode.cpp -o mycode.sse2.o; objdump -S mycode.sse2.o > sse2,

and finally a non-sse2 version with

g++ -g -mno-sse2 mycode.cpp -o mycode.nosse2.o; objdump -S mycode.nosse2.o > nosse2

I see basically no difference between default and sse2, but a big difference between default and nosse2, so this tells me that, by default, g++ is using sse2 instructions, even though I am being told it is disabled ... what is going on here?

I am compiling on a Xeon E5-2680 under Linux with gcc-4.4.7 if it matters.

asked Jun 08 '15 19:06 by drjrm3



1 Answer

If you are compiling for 64-bit, then this is expected and documented behavior.

As stated in the gcc docs for -mfpmath, the SSE extensions (including SSE2, which is part of the x86-64 baseline) are enabled by default when using an x86-64 compiler:

-mfpmath=unit

Generate floating point arithmetics for selected unit unit. The choices for unit are:

`387'

Use the standard 387 floating point coprocessor present majority of chips and emulated otherwise. Code compiled with this option will run almost everywhere. The temporary results are computed in 80bit precision instead of precision specified by the type resulting in slightly different results compared to most of other chips. See -ffloat-store for more detailed description.

This is the default choice for i386 compiler.

`sse'

Use scalar floating point instructions present in the SSE instruction set. This instruction set is supported by Pentium3 and newer chips, in the AMD line by Athlon-4, Athlon-xp and Athlon-mp chips. The earlier version of SSE instruction set supports only single precision arithmetics, thus the double and extended precision arithmetics is still done using 387. Later version, present only in Pentium4 and the future AMD x86-64 chips supports double precision arithmetics too.

For the i386 compiler, you need to use -march=cpu-type, -msse or -msse2 switches to enable SSE extensions and make this option effective. For the x86-64 compiler, these extensions are enabled by default.

The resulting code should be considerably faster in the majority of cases and avoid the numerical instability problems of 387 code, but may break some existing code that expects temporaries to be 80bit.

This is the default choice for the x86-64 compiler.
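
Since the output of -Q --help=target does not appear to reflect these implied defaults on the asker's setup, one way to check what the compiler actually targets is to look at its predefined macros. The following is a minimal sketch, not part of the original answer: it assumes GCC (or a compatible compiler such as Clang), which predefines `__SSE2__` when SSE2 code generation is enabled and `__SSE2_MATH__` when SSE2 is used for floating-point math. The function names are hypothetical helpers chosen for illustration.

```cpp
#include <cfloat>    // FLT_EVAL_METHOD (may be missing in pre-C++11 modes)
#include <sstream>
#include <string>

// True when compiling for an x86-64 target.
inline bool targets_x86_64() {
#ifdef __x86_64__
    return true;
#else
    return false;
#endif
}

// True when the compiler may emit SSE2 instructions for this
// translation unit (GCC predefines __SSE2__ in that case).
inline bool sse2_enabled() {
#ifdef __SSE2__
    return true;
#else
    return false;
#endif
}

// True when floating-point arithmetic is done with SSE2 rather than
// the 387 unit, i.e. the -mfpmath=sse behaviour quoted above.
inline bool sse2_fp_math() {
#ifdef __SSE2_MATH__
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: builds a one-line summary of the settings above.
inline std::string describe_target() {
    std::ostringstream out;
    out << "x86-64 target: " << (targets_x86_64() ? "yes" : "no")
        << ", SSE2 enabled: " << (sse2_enabled() ? "yes" : "no")
        << ", SSE2 used for FP math: " << (sse2_fp_math() ? "yes" : "no");
#ifdef FLT_EVAL_METHOD
    // 0 means temporaries keep the precision of their type;
    // 2 means they are evaluated in long double (the 80-bit 387 case).
    out << ", FLT_EVAL_METHOD: " << FLT_EVAL_METHOD;
#else
    out << ", FLT_EVAL_METHOD: unknown";
#endif
    return out.str();
}
```

Printing `describe_target()` from `main` and building once with the default options and once with -m32 (if 32-bit libraries are installed) should show whether SSE2 is on by default for each target, independent of what -Q --help=target reports.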

answered Oct 02 '22 00:10 by Uroc327