
How to optimize a decision in a CFD C code

I want to optimize how a CFD code selects between different functions. The user chooses which function to use at runtime via a config file that the program reads.

I came up with a minimal working example with two separate functions that each take one input: one squares the input and the other cubes it. The user chooses which function to use via a command-line option. The code then squares/cubes a bunch of numbers in a for loop (it calculates the integral from 0 to 1 of either x^2 or x^3, depending on which function was chosen) and outputs the result. The first variant is just a switch case inside the for loop (case1). The second thing I tried was a function pointer, which is set before the loop (case2). The third thing I did was selectively compiling only the function that the user intends to use, with the help of preprocessor commands (case3).

case1:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

double f_square(double x) {return x * x;}

double f_cube(double x) {return x * x * x;}

int main(int argc, char *argv[])
{
    double x;
    double sum = 0;
    double del_x = 4e-10;

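    /* Guard against a missing argument before argv[1] is read. */
    if (argc < 2) {
        fprintf(stderr, "Usage: %s 2|3\n", argv[0]);
        return 1;
    }

    printf("Speed test -- no optimisation\n");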

    clock_t startClock = clock();
    for (x = 0; x < 1; x += del_x) {
        switch (argv[1][0]) {
        case '2':
            sum += f_square(x) * del_x;
            break;
        case '3':
            sum += f_cube(x) * del_x;
            break;
        default:
            printf("Invalid choice! Abort\n");
            exit(1);
        }
    }
    clock_t endClock = clock();

    printf("Int_{0}^{1} x^%c: %.8g\n", argv[1][0], sum);
    printf("Execution time: %.6f\n", (endClock - startClock) / (double)CLOCKS_PER_SEC);
}

case2:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

double f_square(double x) {return x * x;}

double f_cube(double x) {return x * x * x;}

int main(int argc, char *argv[])
{
    double x;
    double sum = 0;
    double del_x = 4e-10;
    double (*f)(double);

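    /* Guard against a missing argument before argv[1] is read. */
    if (argc < 2) {
        fprintf(stderr, "Usage: %s 2|3\n", argv[0]);
        return 1;
    }

    printf("Speed test -- function pointers\n");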

    switch (argv[1][0]) {
    case '2':
        f = &f_square;
        break;
    case '3':
        f = &f_cube;
        break;
    default:
        printf("Invalid choice! Abort\n");
        exit(1);
    }

    clock_t startClock = clock();
    for (x = 0; x < 1; x += del_x) {
        sum += f(x) * del_x;
    }
    clock_t endClock = clock();

    printf("Int_{0}^{1} x^%c: %.8g\n", argv[1][0], sum);
    printf("Execution time: %.6f\n", (endClock - startClock) / (double)CLOCKS_PER_SEC);
}

case3:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#ifdef SQUARE
double f(double x) {return x * x;}
#endif

#ifdef CUBE
double f(double x) {return x * x * x;}
#endif

int main(void)
{
    double x;
    double sum = 0;
    double del_x = 4e-10;

    printf("Speed test -- selective compilation\n");

    clock_t startClock = clock();
    for (x = 0; x < 1; x += del_x) {
        sum += f(x) * del_x;
    }
    clock_t endClock = clock();

    #ifdef SQUARE
    printf("Int_{0}^{1} x^2: %.8g\n", sum);
    #endif
    #ifdef CUBE
    printf("Int_{0}^{1} x^3: %.8g\n", sum);
    #endif
    printf("Execution time: %.6f\n", (endClock - startClock) / (double)CLOCKS_PER_SEC);
}

When measuring the execution times I found something odd:

  • with -O0 I got the ordering I expected: case3 was the fastest, followed by case2 and then case1
  • with -O1 to -O3, case2 always performed significantly worse than case1 and case3

Here are some images comparing the execution times

  • Compiled with O0
  • Compiled with O1
  • Compiled with O3.

This confuses me and I would like to know what I can do in order to use function pointers without losing performance, because for flexibility reasons I really want to use function pointers.

=> Why are function pointers so slow?

I want to add that I am not a software engineer but rather an aerospace engineering student and sadly we do not get a lot of programming lessons, so every little detail might be helpful.

koipond asked May 15 '26 20:05

1 Answer

Here is a disassembly view of two implementations of similar functions: https://c.godbolt.org/z/l24Zhl

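Note that with -O2 the first method inlines the calls to f_cube and f_square (there are no calls to these functions in the assembly), whereas the second version does not: the call through the function pointer stays inside the loop as an indirect call.

Most likely, the first version is then sped up further by branch prediction on the processor, since the switch takes the same branch on every iteration.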
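One way to keep the flexibility of function pointers while still giving the compiler a chance to inline is to point at the whole loop instead of at the per-element function. The following is only a sketch under assumptions: the names DEFINE_KERNEL, kernel_fn, select_kernel, integrate_square and integrate_cube are hypothetical and not part of the question's code, and whether the integrand is actually inlined depends on the compiler and the optimisation level.

```c
#include <assert.h>
#include <stddef.h>

/* Integrands from the question; static inline so the compiler is free
   to inline them wherever they are called directly. */
static inline double f_square(double x) { return x * x; }
static inline double f_cube(double x)   { return x * x * x; }

/* Hypothetical helper macro: generates one integration loop per integrand.
   Inside each generated function the callee is a direct, compile-time-known
   call, so it is a candidate for inlining. */
#define DEFINE_KERNEL(name, fn)                          \
    static double name(double del_x)                     \
    {                                                    \
        double sum = 0.0;                                \
        assert(del_x > 0.0); /* otherwise no progress */ \
        for (double x = 0.0; x < 1.0; x += del_x)        \
            sum += fn(x) * del_x;                        \
        return sum;                                      \
    }

DEFINE_KERNEL(integrate_square, f_square)
DEFINE_KERNEL(integrate_cube, f_cube)

/* Pointer to a whole kernel (the loop), not to a single evaluation. */
typedef double (*kernel_fn)(double del_x);

/* Hypothetical selector: maps the user's choice to a kernel.
   Returns NULL for an invalid choice. */
kernel_fn select_kernel(char choice)
{
    switch (choice) {
    case '2': return integrate_square;
    case '3': return integrate_cube;
    default:  return NULL;
    }
}
```

In main, the selection would happen once, for example kernel_fn kernel = select_kernel(argv[1][0]); followed by a NULL check and sum = kernel(del_x);. The indirect call is then paid once per integration rather than once per loop iteration.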

Have you profiled your code and found that this area is a bottleneck? You make the greatest speed gains by optimizing the most-used code first. Remember: first make it work, then make it fast.

Paul Belanger answered May 18 '26 08:05
