Comparing execution time of two functions?

Tags:

c

I wrote two functions which do the same thing, but use different algorithms. I compare their execution times using clock() from time.h, but the results are inconsistent.

I have tried changing the order in which the functions are executed, but it seems that whichever function runs first always has the longer running time.

#include <stdio.h>
#include <time.h>

long exponent(int a, int b);
long exponentFast(int a, int b);


int main(void)
{
    int base;
    int power;
    clock_t begin;
    clock_t end;
    double time_spent;

    base = 2;
    power = 25;

    begin = clock();
    long result1 = exponentFast(base, power);
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Result1: %li, Time: %.9f\n", result1, time_spent);

    begin = clock();
    long result2 = exponent(base, power);
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("Result2: %li, Time: %.9f\n", result2, time_spent);
}

long exponent(int a, int b)
{
    ...
}

long exponentFast(int a, int b)
{
    ...
}

I expect time_spent for result1 to have a lower value than that for result2, but the output is

Result1: 33554432, Time: 0.000002000
Result2: 33554432, Time: 0.000001000

Executing exponent() before exponentFast() also yields the same pattern, suggesting that my benchmarking approach is wrong.

asked Jan 01 '23 by JF9199
1 Answer

It can be surprisingly difficult to time function calls like these accurately and meaningfully. Here's a modification of your program that illustrates the difficulties:

#include <stdio.h>
#include <time.h>
#include <math.h>

long exponent(int a, int b);
long exponentFast(int a, int b);

void tester(long (*)(int, int));

#define NTRIALS 1000000000

int main(void)
{
    clock_t begin;
    clock_t end;
    double time_spent;

    begin = clock();
    tester(exponentFast);
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("exponentFast: Time: %.9f = %.10f/call\n", time_spent, time_spent / NTRIALS);

    begin = clock();
    tester(exponent);
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("exponent: Time: %.9f = %.10f/call\n", time_spent, time_spent / NTRIALS);
}

void tester(long (*func)(int, int))
{
    int base = 2;
    int power = 25;
    int i;
    unsigned long accum = 0;
    for(i = 0; i < NTRIALS; i++) {
        accum += (*func)(base, power);
        base = (base + 1) % 5;
        power = (power + 7) % 16;
    }
    printf("(accum = %lu)\n", accum);
}

long exponent(int a, int b)
{
    return pow(a, b);
}

long exponentFast(int a, int b)
{
    long ret = 1;
    int i;
    for(i = 0; i < b; i++)
        ret *= a;
    return ret;
}

You will notice that:

  • I've arranged to perform multiple trials, which involved adding a new function tester() to do this. (tester() therefore takes a pointer to the function being tested, which is a technique you may not have been familiar with yet.)
  • I've arranged to vary the arguments to the function under test, between calls.
  • I've arranged to do something with the return values of the functions under test, namely add them all up.

(The second and third bullets follow suggestions by Jonathan Leffler, and are intended to ensure that a too-clever compiler doesn't optimize out some or all of the interesting work.)

When I ran this on my computer (an ordinary consumer-grade laptop), these were the results I got:

(accum = 18165558496053920)
exponentFast: Time: 20.954286000 = 0.0000000210/call
(accum = 18165558496053920)
exponent: Time: 23.409001000 = 0.0000000234/call

There are two huge things to notice here.

  1. I ran each function one billion times. That's right, a billion, a thousand million. And yet this only took about 20 seconds.
  2. Even with that many trials, there's still almost no visible difference between the "regular" and "fast" versions. On average, one took 21 nanoseconds (nanoseconds!); the other took 23 nanoseconds. Big whoop.

(Actually, however, this first trial was significantly misleading. More on that in a bit.)

Before continuing, it's worth asking a couple of questions.

  1. Do these results even make sense? Is it actually possible for these functions to be taking mere tens of nanoseconds to do their work?
  2. Presuming they're accurate, do these results tell us that we're wasting our time, that the difference between the "regular" and "fast" versions is so slight as to not be worth the effort to write and debug the "fast" one?

To the first point, let's do a quick back-of-the-envelope analysis. My machine claims to have a 2.2 GHz CPU. That means (roughly speaking) that it can do 2.2 billion things per second, or about 0.45 nanoseconds per thing. So a function that's taking 21 nanoseconds can be doing roughly 21 ÷ 0.45 = 46 things. And since my sample exponentFast function is doing roughly as many multiplications as the value of the exponent, it looks like we're probably in the right ballpark.

The other thing I did to confirm that I was getting at least quasi-reasonable results was to vary the number of trials. With NTRIALS reduced to 100000000, the overall program took just about exactly a tenth of the time to run, meaning that the time per call was consistent.

Now, to point 2, I still remember one of my formative experiences as a programmer, when I wrote a new and improved version of a standard function which I just knew was going to be gobs faster, and after several hours spent debugging it to get it to work at all, I discovered that it wasn't measurably faster, until I increased the number of trials up into the millions, and the penny (as they say) dropped.

But as I said, the results I've presented so far were, by a funny coincidence, misleading. When I first threw together some simple code to vary the arguments given to the function calls, as shown above, I had:

int base = 2;
int power = 25;

and then, within the loop

    base = (base + 1) % 5;
    power = (power + 7) % 16;

This was intended to allow base to run from 0 to 4, and power from 0 to 15, with the numbers chosen to ensure that the result wouldn't overflow even when base was 4. But this means that power was, on average, only 8, meaning that my simpleminded exponentFast call was only having to make 8 trips through its loop, on average, not 25 as in your original post.

When I changed the iteration step to

power = 25 + (power - 25 + 1) % 5;

-- that is, not varying base (and therefore allowing it to remain as the constant 2) and allowing power to vary between 25 and 30, now the time per call for exponentFast rose to about 63 nanoseconds. The good news is that this makes sense (roughly three times as many iterations, on average, made it about three times slower), but the bad news is that it looks like my "exponentFast" function isn't very fast! (Obviously, though, I didn't expect it to be, with its simpleminded, brute-force loop. If I wanted to make it faster, the first thing I'd do, at little additional cost in complexity, would be to apply "binary exponentiation".)

There's at least one more thing to worry about, though, which is that if we call these functions one billion times, we're not only counting one billion times the amount of time each function takes to do its work, but also one billion times the function call overhead. If the function call overhead is on a par with the amount of work the function is doing, we will (a) have a hard time measuring the actual work time, but also (b) have a hard time speeding things up! (We could get rid of the function call overhead by inlining the functions for our test, but that obviously wouldn't be meaningful if the actual use of the functions in the end program were going to involve real function calls.)

And then yet one more inaccuracy is that we're introducing a timing artifact by computing new and different values of base and/or power for each call, and adding up all the results, so that the amortized time to do that work goes into what we've been calling "time per call". (This problem, at least, since it affects either exponentiation function equally, won't detract from our ability to assess which one, if either, is faster.)


Addendum: Since my initial version of exponentFast really was pretty embarrassingly simpleminded, and since binary exponentiation is so easy and elegant, I performed one more test, rewriting exponentFast as

long exponentFast(int a, int b)
{
    long ret = 1;
    long fac = a;

    while(1) {
        if(b & 1) ret *= fac;
        b >>= 1;
        if(b == 0) break;
        fac *= fac;
    }
    return ret;
}

Now -- Hooray! -- the average call to exponentFast goes down to about 16 ns per call on my machine. But the "Hooray!" is qualified. It's evidently about 25% faster than calling pow(), and that's nicely significant, but not an order of magnitude or anything. If the program where I'm using this is spending all its time exponentiating, I'll have made that program 25% faster, too, but if not, the improvement will be less. And there are cases where the improvement (the time saved over all anticipated runs of the program) will be less than the time I spent writing and testing my own version. And I haven't yet spent any time doing proper regression tests on my improved exponentFast function, but if this were anything other than a Stack Overflow post, I'd have to. It's got several sets of edge cases, and might well contain lurking bugs.

answered Jan 13 '23 by Steve Summit