
Is optimisation level -O3 dangerous in g++?

I have heard from various sources (though mostly from a colleague of mine), that compiling with an optimisation level of -O3 in g++ is somehow 'dangerous', and should be avoided in general unless proven to be necessary.

Is this true, and if so, why? Should I just be sticking to -O2?

asked Jul 18 '12 by Matt Dunn



5 Answers

In the early days of gcc (2.8 and so on), in the times of egcs, and with Red Hat's gcc 2.96, -O3 was quite buggy at times. But this is over a decade ago, and -O3 is not much different from the other optimization levels in terms of bugginess.

It does, however, tend to expose cases where people rely on undefined behavior, because it applies the rules, and especially the corner cases, of the language(s) more strictly.

As a personal note: I have been running production software in the financial sector for many years now with -O3, and I have not yet encountered a bug that would not also have been there had I used -O2.

By popular demand, here is an addition:

-O3, and especially additional flags like -funroll-loops (not enabled by -O3), can sometimes lead to more machine code being generated. Under certain circumstances (e.g. on a CPU with an exceptionally small L1 instruction cache) this can cause a slowdown, because all the code of, say, some inner loop no longer fits into L1I. Generally gcc tries quite hard not to generate that much code, but since it usually optimizes for the generic case, this can happen. Options especially prone to this (like loop unrolling) are normally not included in -O3 and are marked accordingly in the manpage. As such, it is generally a good idea to use -O3 for generating fast code, and only fall back to -O2 or -Os (which tries to optimize for code size) when appropriate (e.g. when a profiler indicates L1I misses).
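If you suspect code growth is the problem, it is easy to measure; a minimal sketch, assuming a hypothetical hot_loop.cpp, using the size tool from binutils to compare the generated code at each level:

g++ -O2 -c hot_loop.cpp -o hot_loop_O2.o
g++ -O3 -c hot_loop.cpp -o hot_loop_O3.o
g++ -Os -c hot_loop.cpp -o hot_loop_Os.o
# the 'text' column shows how much each level grew the generated code
size hot_loop_O2.o hot_loop_O3.o hot_loop_Os.o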

If you want to take optimization to the extreme, you can tweak the costs associated with certain optimizations in gcc via --param. Additionally, note that gcc now has the ability to put attributes on functions that control optimization settings just for those functions, so when you find you have a problem with -O3 in one function (or want to try out special flags for just that function), you don't need to compile the whole file, or even the whole project, with -O2.
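As a minimal sketch of that per-function mechanism (the function names here are hypothetical; the optimize attribute itself is GCC-specific):

// Compile just this function as if -O2 were given, even when the rest
// of the translation unit is built with -O3.
__attribute__((optimize("O2")))
void fragileFunction(unsigned int *dst, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = i;
}

// The reverse also works: attribute strings that don't start with 'O'
// are treated as -f options, so this adds -funroll-loops to one hot
// function without touching the file's flags.
__attribute__((optimize("unroll-loops")))
long hotLoop(const int *src, int n) {
    long total = 0;
    for (int i = 0; i < n; ++i)
        total += src[i];
    return total;
}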

On the other hand, it seems that care must be taken when using -Ofast, whose documentation states:

-Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard compliant programs.

which makes me conclude that -O3 is intended to be fully standards compliant.
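To make the difference concrete, here is my own sketch (not from the GCC docs) of code that is correct under -O3 but may break under -Ofast, because -Ofast implies -ffast-math and lets the compiler assume NaNs never occur:

#include <cmath>
#include <cstdio>

// Under IEEE 754 semantics, x != x is true exactly when x is NaN.
bool isNan(double x) { return x != x; }

int main() {
    double d = std::nan("");
    // Prints 1 when built with -O3, but may print 0 with -Ofast,
    // since -ffinite-math-only lets the compiler fold x != x to false.
    std::printf("%d\n", isNan(d));
    return 0;
}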

answered Oct 05 '22 by PlasmaHH


In my somewhat checkered experience, applying -O3 to an entire program almost always makes it slower (relative to -O2), because it turns on aggressive loop unrolling and inlining that make the program no longer fit in the instruction cache. For larger programs, this can also be true for -O2 relative to -Os!

The intended use pattern for -O3 is that, after profiling your program, you manually apply it to the small handful of files containing critical inner loops that actually benefit from these aggressive space-for-speed trade-offs. Newer versions of GCC have a profile-guided optimization mode that can, if I understand correctly, selectively apply the -O3 optimizations to hot functions, effectively automating this process; a sketch of the workflow follows.
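A minimal sketch of that profile-guided workflow (myprog and typical-input.dat are hypothetical names):

# 1. Build with instrumentation that records execution counts.
g++ -O2 -fprofile-generate -o myprog main.cpp

# 2. Run on a representative workload; this writes .gcda profile files.
./myprog typical-input.dat

# 3. Rebuild using the recorded profile; the compiler now applies its
#    aggressive transformations only where the profile shows hot code.
g++ -O2 -fprofile-use -o myprog main.cpp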

answered Oct 05 '22 by zwol


Yes, -O3 is buggier. I'm a compiler developer, and I've identified clear and obvious gcc bugs caused by -O3 generating buggy SIMD assembly instructions when building my own software. From what I've seen, most production software ships with -O2, which means -O3 gets less attention with respect to testing and bug fixes.

Think of it this way: -O3 adds more transformations on top of -O2, which adds more transformations on top of -O1. Statistically speaking, more transformations mean more bugs. That's true for any compiler.

answered Oct 05 '22 by David Yeager


The -O3 option turns on more expensive optimizations, such as function inlining, in addition to all the optimizations of the lower levels -O2 and -O1. The -O3 optimization level may increase the speed of the resulting executable, but it can also increase its size. Under some circumstances where these optimizations are not favorable, this option can actually make a program slower.
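You can ask gcc itself which passes each level enables; a small sketch (the output file names are arbitrary):

# Dump the optimizer settings implied by each level, then compare.
g++ -Q --help=optimizers -O2 > o2-passes.txt
g++ -Q --help=optimizers -O3 > o3-passes.txt
diff o2-passes.txt o3-passes.txt   # the differing lines are what -O3 adds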

answered Oct 05 '22 by neel


Recently I experienced a problem when using optimization with g++. The problem was related to a PCI card whose registers (for commands and data) were represented by a memory address. My driver mapped the physical address to a pointer in the application and handed it to the calling process, which worked with it like this:

unsigned int * pciMemory;
askDriverForMapping( & pciMemory );   // pciMemory now points at the card's mapped registers
...
pciMemory[ 0 ] = someCommandIdx;      // successive writes all target the same register
pciMemory[ 0 ] = someCommandLength;
for ( size_t i = 0; i < sizeof( someCommand ) / sizeof( someCommand[ 0 ] ); i++ )
    pciMemory[ 0 ] = someCommand[ i ];

The card didn't act as expected. When I looked at the generated assembly I understood why: the compiler had written only the last element of someCommand into pciMemory, omitting all the preceding writes.

In conclusion: be accurate and attentive with optimization. Strictly speaking, the compiler was within its rights here: through a non-volatile pointer, repeated stores to the same address are dead stores with no observable effect on the abstract machine, so all but the last one may legally be removed.
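A minimal sketch of the usual fix, reusing the answer's own hypothetical names: declaring the mapped registers volatile makes every store an observable side effect that the optimizer must emit, in order:

volatile unsigned int * pciMemory;   // volatile: no store may be dropped or merged
askDriverForMapping( & pciMemory );  // assumes the driver call accepts this pointer type
...
pciMemory[ 0 ] = someCommandIdx;     // now reaches the card even at -O2/-O3
pciMemory[ 0 ] = someCommandLength;
for ( size_t i = 0; i < sizeof( someCommand ) / sizeof( someCommand[ 0 ] ); i++ )
    pciMemory[ 0 ] = someCommand[ i ];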

answered Oct 05 '22 by borisbn