Optimizing function call from for loop

I have some simple functions

int f_0(int);
int f_1(int);
...
int f_n(int);

and then I have some for loops in which I call f_i(), the condition in this loops doesnt have to be the same

for (int i = 0; i < n; i++) {
   ...
   if (condition) {
      int myInt = f_i(); // this is not real implementation but shows the result
                         // I want to achieve
      ... //edit
   }
...
}

Here are the ways I tried to implement this:

Breaking down the for loop and calling each function in correspondning part. This results in fastest code but this is higly unelegant and such code is hard to be further developed.
Pointers to functions

typedef int (*Foo) (int);

Foo fptr[] = { f_0, f_1, ... , f_n };

this is elegant method but in my case it is 4.4 slower than breaking down the loop. Constant pointers to functions yield simmilar results.

Encapsulating my functions into switch function. This was 2.6 slower than breaking down the loop.

Is there any better way how to implement this? Ideal solution would be the one with compact code but the compiler would break down the loop and let the calculations be the fastest.

I´m using MSVC 2012 and running on release mode with optimizations set to maximize speed.

Edit:

Here is my testing code:

head.h

namespace c {
const int w = 1024;
const int A = w * w;
}

inline int f_0(int pos)  { return (pos - c::w + c::A) % c::A;           }
inline int f_1(int pos)  { return (pos + 1 - c::w + c::A) % c::A;       }
inline int f_2(int pos)  { return (pos + 1) % c::A;                     }
inline int f_3(int pos)  { return (pos + c::w) % c::A;                  }
inline int f_4(int pos)  { return (pos - 1 + c::w) % c::A;              }
inline int f_5(int pos)  { return (pos - 1 + c::A) % c::A;              }

typedef int (*NEIGH_F) (int);
typedef int (* const CNEIGH_F) (int);

const NEIGH_F  fptr[]  = { f_0, f_1, f_2, f_3, f_4, f_5 };
const CNEIGH_F cfptr[] = { f_0, f_1, f_2, f_3, f_4, f_5 };

inline int fswitch(int i, int pos) {
    switch(i) {
    case 0 : return f_0(pos); break;
    case 1 : return f_1(pos); break;
    case 2 : return f_2(pos); break;
    case 3 : return f_3(pos); break;
    case 4 : return f_4(pos); break;
    case 5 : return f_5(pos); break;
    default : return -1; break;
    }
}

main.cpp

#include "head.h"
#include <iostream>
#include <time.h>

int main()
{
    int maxRepeat = 100;

    clock_t startTime = clock();
    double sum = 0;
    for (int repeat = 0; repeat < maxRepeat; repeat++)
        for (int i = 0; i < c::A; i++) {
            sum += f_0(i);
            sum += f_1(i);
            sum += f_2(i);
            sum += f_3(i);
            sum += f_4(i);
            sum += f_5(i);
        }
    std::cout << "normal time:        " << (clock() - startTime)/(double)CLOCKS_PER_SEC
                 << "  sum is: " << sum << std::endl;

    startTime = clock();
    sum = 0;
    for (int repeat = 0; repeat < maxRepeat; repeat++)
        for (int i = 0; i < c::A; i++) {
            for (int j = 0; j < 6; j++)
                sum += fptr[j](i);
        }
    std::cout << "pointer time:       " << (clock() - startTime)/(double)CLOCKS_PER_SEC
                 << "  sum is: " << sum << std::endl;

    startTime = clock();
    sum = 0;
    for (int repeat = 0; repeat < maxRepeat; repeat++)
        for (int i = 0; i < c::A; i++) {
            for (int j = 0; j < 6; j++)
                sum += cfptr[j](i);
        }
    std::cout << "const pointer time: " << (clock() - startTime)/(double)CLOCKS_PER_SEC
                 << "  sum is: " << sum << std::endl;

    startTime = clock();
    sum = 0;
    for (int repeat = 0; repeat < maxRepeat; repeat++)
        for (int i = 0; i < c::A; i++) {
            for (int j = 0; j < 6; j++)
                sum += fswitch(j, i);
        }
    std::cout << "switch time:        " << (clock() - startTime)/(double)CLOCKS_PER_SEC
                 << "  sum is: " << sum << std::endl;
    std::cin.ignore();

    return 0;
}

functions f_i are the functions I use in my real implementation, but the loops here are much simpler due to testing purposes in real implementation there are several different loops of form shown in second code snippet in the question.

Edit2:

The form of my loop should stay the same I just want to find the best way how to put f_i into my loops.

How do you optimize a loop in Python?

We can optimize loops by vectorizing operations. This is one/two orders of magnitude faster than their pure Python equivalents (especially in numerical computations). Vectorization is something we can get with NumPy. Numpy is a library with efficient data structures designed to hold matrix data.

you can use template function instead of f_0, f_1... nicer to maintain.

template <int N>
void f();

template <>
void f<0>()
{
    printf("f<0>");
}

template <>
void f<1>()
{
    printf("f<1>");
}

int main() {
    f<0>();
    f<1>();
    //f<2>(); // this is compile error
    return 0;
}

however, the template argument must be provided as compile-time constant, so you can't call function like int i = 0; f<i>()

to workaround this, you can use switch-case to call function, not very pretty, but works

void call_f(int i)
{
    switch(i)
    {
        case 0:
            f<0>();
            break;
        case 1:
            f<1>();
            break;
        default:
            // invalid i, report error
            break;
    }
}

however, there is no compile-time check to i

put all together

Optimizing function call from for loop

Tags:

c++

function

optimization

for-loop

Adam Sikora

People also ask

1 Answers

Bryan Chen

Recent Activity

Donate For Us

Optimizing function call from for loop

Tags:

c++

function

optimization

for-loop

Adam Sikora

People also ask

1 Answers

Bryan Chen

Related questions

Recent Activity

Donate For Us