When can I confidently compile program with -O3?

I've seen a lot of people complaining about -O3 option:

  • GCC: program doesn't work with compilation option -O3
  • Floating Point Problem provided by David Hammen

I check the manual from the GCC:

   -O3    Optimize yet more.  -O3 turns on all optimizations
          specified   by   -O2   and   also   turns  on  the
          -finline-functions and -frename-registers options.

And I've also checked the GCC source to confirm that these two options are the only optimizations -O3 enables on top of -O2:

if (optimize >= 3){
    flag_inline_functions = 1;
    flag_rename_registers = 1;
}

For those two optimizations:

  • -finline-functions is useful in some cases (mainly with C++) because it lets us set the maximum size of functions eligible for inlining (600 by default) with -finline-limit. The compiler may report an error complaining about lack of memory when the inline limit is set too high.
  • -frename-registers attempts to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization will most benefit processors with lots of registers.

For inline-functions: although it can reduce the number of function calls, it may lead to large binary files, so -finline-functions may introduce severe cache penalties and become even slower than -O2. I suspect the cache penalty depends on more than just the program itself.

For rename-registers, I don't think it will have much positive impact on a CISC architecture like x86, which exposes relatively few architectural registers.

My question has 2.5 parts:

  1. Am I right to claim that whether a program can run faster with -O3 option depends on the underlying platform/architecture? [Answered]

    EDIT:

    The 1st part has been confirmed as true. David Hammen also notes that we should be very careful about how optimization and floating-point operations interact on machines with extended-precision floating-point registers, such as Intel and AMD x86 processors.

  2. When can I confidently use the -O3 option? I suppose these two optimizations, especially rename-registers, may lead to behavior that differs from -O0/-O2. I saw some programs compiled with -O3 crash during execution; is that deterministic? If I run an executable once without any crash, does that mean it is safe to use -O3?

    EDIT: The determinism has nothing to do with the optimization; it is a multithreading problem. However, for a multithreaded program, it is not safe to conclude that -O3 is fine just because one run completes without errors. David Hammen shows that -O3 optimization of floating-point operations may violate the strict weak ordering criterion for a comparison. Are there any other concerns we need to take care of when we want to use the -O3 option?

  3. If the answer to the 1st question is "yes", then when I change the target platform, or in a distributed system with different machines, I may need to switch between -O3 and -O2. Are there any general ways to decide whether I can get a performance improvement with -O3? For example, more registers, short inline functions, etc. [Answered]

    EDIT: The 3rd part has been answered by Louen: "the variety of platforms make general reasoning about this problem impossible". When evaluating the performance gain from -O3, we have to try both and benchmark our code to see which is faster.

Asked Dec 21 '22 by StarPinkER

2 Answers

  1. I saw some programs crash when compiled with -O3; is it deterministic?

If the program is single-threaded, all algorithms used by the program are deterministic, and the inputs from run to run are identical, then yes. The answer is "not necessarily" if any of those conditions does not hold.

The same applies if you compile without using -O3.

If I run an executable once without any crash, does it mean it is safe to use -O3?

Of course not. Once again, the same applies if you compile without using -O3. Just because your application runs once does not mean it will run successfully in all cases. That's part of what makes testing a hard problem.


Floating point operations can result in weird behaviors on machines in which the floating point registers have greater precision than do doubles. For example,

void add (double a, double b, double & result) {
   double temp = a + b;
   result = temp;
   if (result != temp) {
      throw FunkyAdditionError (temp);
   }
}

Compile a program that uses this add function unoptimized and you probably will never see any FunkyAdditionError exceptions. Compile optimized and certain inputs will suddenly start resulting in these exceptions. The problem is that with optimization, the compiler will make temp a register while result, being a reference, won't be compiled away into a register. Add an inline qualifier and those exceptions may disappear when your program is compiled with -O3, because now result can also be a register. Optimization with regard to floating point operations can be a tricky subject.

Finally, let's look at one of those cases where things did go bump in the night when a program was compiled with -O3, GCC: program doesn't work with compilation option -O3. The problem only occurred with -O3 because the compiler probably inlined the distance function but kept one (but not both) of the results in an extended precision floating point register. With this optimization, certain points p1 and p2 can result in both p1<p2 and p2<p1 evaluating to true. This violates the strict weak ordering criterion for a comparison function.

You need to be very careful with regard to how optimization and floating point operations interact on machines with extended precision floating point registers (e.g., Intel and AMD).

Answered Dec 24 '22 by David Hammen

1) and 3) You are right. Some programs can benefit from the optimizations enabled by -O3 and some won't. For example, inlining more functions is usually better (because it bypasses the function call mechanism overhead) but sometimes it can make things slower (by impairing cache locality for example). That and the variety of platforms make general reasoning about this problem impossible.

So to make things short, the only valid answer is : try it with both and benchmark your code to see which is faster.

2) Under the hypothesis that you are not hitting a compiler/optimizer bug (they are rare, but they exist), it is reasonable to assume that an error in your program that reveals itself only at -O3 has probably been there all along; the -O3 option merely uncovered it.

Answered Dec 24 '22 by Louen