Here's the function I'm looking at:
template <uint8_t Size>
inline uint64_t parseUnsigned( const char (&buf)[Size] )
{
uint64_t val = 0;
for (uint8_t i = 0; i < Size; ++i)
if (buf[i] != ' ')
val = (val * 10) + (buf[i] - '0');
return val;
}
I have a test harness which passes in all possible numbers with Size=5, left-padded with spaces. I'm using GCC 4.7.2. When I run the program under callgrind after compiling with -O3 I get:
I refs: 7,154,919
When I compile with -O2 I get:
I refs: 9,001,570
OK, so -O3 improves the performance (and I confirmed that some of the improvement comes from the above function, not just the test harness). But I don't want to completely switch from -O2 to -O3, I want to find out which specific option(s) to add. So I consult man g++
to get the list of options it says are added by -O3:
-fgcse-after-reload [enabled]
-finline-functions [enabled]
-fipa-cp-clone [enabled]
-fpredictive-commoning [enabled]
-ftree-loop-distribute-patterns [enabled]
-ftree-vectorize [enabled]
-funswitch-loops [enabled]
So I compile again with -O2 followed by all of the above options. But this gives me even worse performance than plain -O2:
I refs: 9,546,017
I discovered that adding -ftree-vectorize to -O2 is responsible for this performance degradation. But I can't figure out how to match the -O3 performance with any combination of options. How can I do this?
In case you want to try it yourself, here's the test harness (put the above parseUnsigned()
definition under the #includes):
#include <cmath>
#include <stdint.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>
template <uint8_t Size>
inline void increment( char (&buf)[Size] )
{
for (uint8_t i = Size - 1; i < 255; --i)
{
if (buf[i] == ' ')
{
buf[i] = '1';
break;
}
++buf[i];
if (buf[i] > '9')
buf[i] -= 10;
else
break;
}
}
int main()
{
char str[5];
memset(str, ' ', sizeof(str));
unsigned max = std::pow(10, sizeof(str));
for (unsigned ii = 0; ii < max; ++ii)
{
uint64_t result = parseUnsigned(str);
if (result != ii)
{
printf("parseUnsigned(%*s) from %u: %lu\n", sizeof(str), str, ii, result);
abort();
}
increment(str);
}
}
A very similar question was already answered here: https://stackoverflow.com/a/6454659/483486
I've copied the relevant text underneath.
UPDATE: There are questions about it in GCC WIKI:
- "Is -O1 (-O2,-O3 or -Os) equivalent to individual -foptimization options?"
No. First, individual optimization options (-f*) do not enable optimization, an option -Os or -Ox with x > 0 is required. Second, the -Ox flags enable many optimizations that are not controlled by any individual -f* option. There are no plans to add individual options for controlling all these optimizations.
- "What specific flags are enabled by -O1 (-O2, -O3 or -Os)?"
Varies by platform and GCC version. You can get GCC to tell you what flags it enables by doing this:
touch empty.c gcc -O1 -S -fverbose-asm empty.c cat empty.s
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With