
Is it realistic to use -O3 or -Ofast to compile your benchmark code or will it remove code?

When compiling the benchmark code below with -O3, I was impressed by the difference it made in latency, so I began to wonder whether the compiler is "cheating" by removing code somehow. Is there a way to check for that? Am I safe to benchmark with -O3? Is it realistic to expect 15x gains in speed?

Results without -O3: Average: 239 nanos, Min: 230 nanos (9 million iterations)
Results with -O3: Average: 14 nanos, Min: 12 nanos (9 million iterations)

int iterations = stoi(argv[1]);
int load = stoi(argv[2]);

long long x = 0;

for(int i = 0; i < iterations; i++) {

    long start = get_nano_ts(); // START clock

    for(int j = 0; j < load; j++) {
        if (i % 4 == 0) {
            x += (i % 4) * (i % 8);
        } else {
            x -= (i % 16) * (i % 32);
        }
    }

    long end = get_nano_ts(); // STOP clock

    // (omitted for clarity)
}

cout << "My result: " << x << endl;

Note: I am using clock_gettime to measure:

long get_nano_ts() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000 + ts.tv_nsec;
}
LatencyGuy asked Oct 20 '22 08:10


1 Answer

The compiler will certainly be "cheating" and removing unnecessary code when compiling with optimization enabled. It actually goes to great lengths to speed up your code, which almost always leads to impressive speed-ups. If it were somehow able to derive a formula that calculates the result in constant time instead of using this loop, it would. A constant factor of 15 is nothing out of the ordinary.

But this does not mean that you should profile unoptimized builds! Indeed, in languages like C and C++, the performance of unoptimized builds is pretty much meaningless, so you need not worry about it at all.
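To illustrate the kind of rewrite an optimizer is permitted to make: the inner loop in the question adds the same value on every pass, because nothing in its body depends on j. A minimal sketch of the equivalent constant-time form follows. The function names inner_loop and inner_closed_form are made up for this example, and whether a given compiler at -O3 actually produces this form is an assumption that is best confirmed by reading the generated assembly (for example from g++ -O3 -S).

```cpp
// Hypothetical names; the arithmetic is copied from the question.

// The inner loop as written: 'load' passes, each adding the same amount.
long long inner_loop(int i, int load) {
    long long x = 0;
    for (int j = 0; j < load; j++) {
        if (i % 4 == 0) {
            x += (i % 4) * (i % 8);
        } else {
            x -= (i % 16) * (i % 32);
        }
    }
    return x;
}

// Equivalent closed form: the per-pass change does not depend on j,
// so the whole loop collapses into a single multiplication.
long long inner_closed_form(int i, int load) {
    if (load <= 0) {
        return 0;  // the loop body would never run
    }
    long long delta = (i % 4 == 0) ? (i % 4) * (i % 8)
                                   : -((i % 16) * (i % 32));
    return delta * load;
}
```

If a compiler does perform this rewrite, the time between the two clock readings no longer grows with load, which would be consistent with a large drop in measured latency without any "cheating" on the final result.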

Of course, this can interfere with micro-benchmarks like the one you showed above. Two points on that:

  1. More often than not, such micro-optimizations do not matter either. Prefer profiling your actual program and then removing bottlenecks.
  2. If you actually want such a micro-benchmark, make it depend on some runtime input and display the result. That way, the compiler cannot remove the functionality itself; it can only make it reasonably fast.

Since you seem to be doing that, the code you show has a good chance of being a reasonable micro-benchmark. One thing you should watch out for is whether your compiler moves both calls to get_nano_ts() to the same side of the loop. It is allowed to do this since "run time" does not count as an observable side effect. (The standard does not even mandate that your machine operates at finite speed.) It was argued here that this usually is not a problem, though I cannot really judge whether the answer given is valid or not.
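One commonly used guard against that kind of reordering, which is a different technique from anything suggested above, is an empty inline-assembly statement that the compiler has to treat as reading and modifying the value. A minimal sketch, assuming GCC or Clang (the asm volatile syntax is a compiler extension, not standard C++) and using the made-up names keep and timed_load:

```cpp
#include <chrono>

// Assumption: GCC/Clang extended inline asm. The empty statement emits no
// instructions, but the compiler must assume it reads and rewrites 'value'
// and may access memory. This is a practical idiom, not a guarantee made by
// the C++ standard.
inline void keep(long long& value) {
    asm volatile("" : "+r"(value) : : "memory");
}

// Hypothetical helper: times one outer iteration of the loop from the question.
long long timed_load(int i, int load, long long& x) {
    using Clock = std::chrono::steady_clock;

    auto start = Clock::now();  // START clock
    keep(x);  // the loop now depends on a value produced after 'start'
    for (int j = 0; j < load; j++) {
        if (i % 4 == 0) {
            x += (i % 4) * (i % 8);
        } else {
            x -= (i % 16) * (i % 32);
        }
    }
    keep(x);  // the final x has to exist before 'end' is taken
    auto end = Clock::now();  // STOP clock

    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
}
```

The barrier does not stop the compiler from simplifying the loop itself; it is only intended to keep the work between the two clock readings.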

If your program does not do anything expensive other than the thing you want to benchmark (and, if possible, it should not), you can also move the time measurement "outside", e.g. with the time command.

Baum mit Augen answered Nov 14 '22 22:11