
g++ -O3 optimizes better than -O2 with all extra optimizations added [duplicate]

Here's the function I'm looking at:

template <uint8_t Size>
inline uint64_t parseUnsigned( const char (&buf)[Size] )
{
  uint64_t val = 0;
  for (uint8_t i = 0; i < Size; ++i)
    if (buf[i] != ' ')
      val = (val * 10) + (buf[i] - '0');
  return val;
}
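To make the input format concrete, here is a minimal sketch of how the function behaves on a fixed-width, space-padded buffer like the ones the test harness further down produces. The helper name checkPaddedExamples is hypothetical and only there for illustration; the buffers are deliberately built as plain char arrays without a terminator.

```cpp
#include <cassert>
#include <stdint.h>

// Same logic as parseUnsigned() above: spaces are skipped and every
// other character is treated as a decimal digit (no validation).
template <uint8_t Size>
inline uint64_t parseUnsigned(const char (&buf)[Size])
{
  uint64_t val = 0;
  for (uint8_t i = 0; i < Size; ++i)
    if (buf[i] != ' ')
      val = (val * 10) + (buf[i] - '0');
  return val;
}

// Hypothetical helper, not part of the original post.
inline void checkPaddedExamples()
{
  // Exactly 5 characters, left-padded with spaces, no '\0' at the end.
  // A string literal such as "  123" would be a char[6] whose trailing
  // '\0' the loop would wrongly treat as a digit.
  const char padded[5] = {' ', ' ', '1', '2', '3'};
  assert(parseUnsigned(padded) == 123);

  // An all-space buffer parses as zero, which is what the harness
  // starts from.
  const char blank[5] = {' ', ' ', ' ', ' ', ' '};
  assert(parseUnsigned(blank) == 0);
}
```

Because Size is a template parameter, the loop bound is a compile-time constant for each buffer width, which is part of why the optimization level matters so much here.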

I have a test harness which passes in all possible numbers with Size=5, left-padded with spaces. I'm using GCC 4.7.2. When I run the program under callgrind after compiling with -O3 I get:

I   refs:      7,154,919

When I compile with -O2 I get:

I   refs:      9,001,570

OK, so -O3 improves the performance (and I confirmed that some of the improvement comes from the above function, not just the test harness). But I don't want to completely switch from -O2 to -O3, I want to find out which specific option(s) to add. So I consult man g++ to get the list of options it says are added by -O3:

-fgcse-after-reload                         [enabled]
-finline-functions                          [enabled]
-fipa-cp-clone                              [enabled]
-fpredictive-commoning                      [enabled]
-ftree-loop-distribute-patterns             [enabled]
-ftree-vectorize                            [enabled]
-funswitch-loops                            [enabled]

So I compile again with -O2 followed by all of the above options. But this gives me even worse performance than plain -O2:

I   refs:      9,546,017

I discovered that adding -ftree-vectorize to -O2 is responsible for this performance degradation. But I can't figure out how to match the -O3 performance with any combination of options. How can I do this?

In case you want to try it yourself, here's the test harness (put the above parseUnsigned() definition under the #includes):

#include <cmath>
#include <stdint.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

template <uint8_t Size>
inline void increment( char (&buf)[Size] )
{
  for (uint8_t i = Size - 1; i < 255; --i)
  {
    if (buf[i] == ' ')
    {
      buf[i] = '1';
      break;
    }

    ++buf[i];
    if (buf[i] > '9')
      buf[i] -= 10;
    else
      break;
  }
}

int main()
{
  char str[5];
  memset(str, ' ', sizeof(str));

  unsigned max = std::pow(10, sizeof(str));
  for (unsigned ii = 0; ii < max; ++ii)
  {
    uint64_t result = parseUnsigned(str);
    if (result != ii)
    {
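      // str is not null-terminated, so bound the output with a precision;
      // the casts match the argument types the format specifiers expect.
      printf("parseUnsigned(%.*s) from %u: %llu\n", (int)sizeof(str), str, ii, (unsigned long long)result);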
      abort();
    }
    increment(str);
  }
}
asked Aug 21 '14 02:08 by John Zwinck


1 Answer

A very similar question was already answered here: https://stackoverflow.com/a/6454659/483486

I've copied the relevant text underneath.

UPDATE: The GCC wiki has questions about this:

  • "Is -O1 (-O2,-O3 or -Os) equivalent to individual -foptimization options?"

No. First, individual optimization options (-f*) do not enable optimization, an option -Os or -Ox with x > 0 is required. Second, the -Ox flags enable many optimizations that are not controlled by any individual -f* option. There are no plans to add individual options for controlling all these optimizations.

  • "What specific flags are enabled by -O1 (-O2, -O3 or -Os)?"

Varies by platform and GCC version. You can get GCC to tell you what flags it enables by doing this:

touch empty.c
gcc -O1 -S -fverbose-asm empty.c
cat empty.s
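Since -O3 cannot be rebuilt from individual -f options, one possible alternative is to raise the optimization level for a single function instead. The following is only a sketch, assuming GCC's function attribute optimize; the macro name OPTIMIZE_O3 and the wrapper parseField5 are hypothetical names chosen for illustration. GCC's documentation describes this attribute as intended for debugging rather than production code, and I have not verified that it reproduces the -O3 numbers from the question, so measure before relying on it.

```cpp
#include <cassert>
#include <stdint.h>

// GCC-only: ask for -O3 on one function while the rest of the file is
// built with whatever level was given on the command line.
// Expands to nothing on other compilers. (Hypothetical macro name.)
#if defined(__GNUC__) && !defined(__clang__)
#define OPTIMIZE_O3 __attribute__((optimize("O3")))
#else
#define OPTIMIZE_O3
#endif

// parseUnsigned() as posted in the question.
template <uint8_t Size>
inline uint64_t parseUnsigned(const char (&buf)[Size])
{
  uint64_t val = 0;
  for (uint8_t i = 0; i < Size; ++i)
    if (buf[i] != ' ')
      val = (val * 10) + (buf[i] - '0');
  return val;
}

// Hypothetical non-template wrapper for the Size=5 case that carries
// the attribute. Whether GCC still inlines it into callers built at a
// different level is up to the compiler.
OPTIMIZE_O3 uint64_t parseField5(const char (&buf)[5])
{
  return parseUnsigned(buf);
}

// Small sanity check so the sketch can be exercised on its own.
inline void checkParseField5()
{
  const char field[5] = {' ', '4', '2', '0', '7'};
  assert(parseField5(field) == 4207);
}
```

Compile the file with -O2 as usual and compare the callgrind instruction counts for the wrapper against the plain -O2 and -O3 builds.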
answered Oct 23 '22 07:10 by OmnipotentEntity