I have a C function that computes the values of 4 sines based on time elapsed. Using gprof, I figured that this function uses 100% (100.07% to be exact lol) of the CPU time.
    void
    update_sines(void)
    {
        clock_gettime(CLOCK_MONOTONIC, &spec);
        s = spec.tv_sec;
        ms = spec.tv_nsec * 0.0000001;
        etime = concatenate((long)s, ms);
        int k;
        for (k = 0; k < 799; ++k)
        {
            double A1 = 145 * sin((RAND1 * k + etime) * 0.00333) + RAND5; // Amplitude
            double A2 = 100 * sin((RAND2 * k + etime) * 0.00333) + RAND4; // Amplitude
            double A3 = 168 * sin((RAND3 * k + etime) * 0.00333) + RAND3; // Amplitude
            double A4 = 136 * sin((RAND4 * k + etime) * 0.00333) + RAND2; // Amplitude
            double B1 = 3 + RAND1 + (sin((RAND5 * k) * etime) * 0.00216); // Period
            double B2 = 3 + RAND2 + (sin((RAND4 * k) * etime) * 0.002);   // Period
            double B3 = 3 + RAND3 + (sin((RAND3 * k) * etime) * 0.00245); // Period
            double B4 = 3 + RAND4 + (sin((RAND2 * k) * etime) * 0.002);   // Period
            double x = k;                 // Current x
            double C1 = 0.6 * etime;      // X axis move
            double C2 = 0.9 * etime;      // X axis move
            double C3 = 1.2 * etime;      // X axis move
            double C4 = 0.8 * etime + 200; // X axis move
            double D1 = RAND1 + sin(RAND1 * x * 0.00166) * 4; // Y axis move
            double D2 = RAND2 + sin(RAND2 * x * 0.002) * 4;   // Y axis move
            double D3 = RAND3 + cos(RAND3 * x * 0.0025) * 4;  // Y axis move
            double D4 = RAND4 + sin(RAND4 * x * 0.002) * 4;   // Y axis move
            sine1[k] = A1 * sin((B1 * x + C1) * 0.0025) + D1;
            sine2[k] = A2 * sin((B2 * x + C2) * 0.00333) + D2 + 100;
            sine3[k] = A3 * cos((B3 * x + C3) * 0.002) + D3 + 50;
            sine4[k] = A4 * sin((B4 * x + C4) * 0.00333) + D4 + 100;
        }
    }
And this is the output from gprof:
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self              self     total
     time   seconds   seconds    calls  Ts/call  Ts/call  name
    100.07      0.04     0.04
I'm currently getting a frame rate of roughly 30-31 fps using this. Now I figure there has to be a more efficient way to do this.
As you may have noticed, I already changed all the divisions to multiplications, but that had very little effect on performance.
How could I increase the performance of this math-heavy function?
Besides all the other advice given in other answers, here is a pure algorithmic optimization.
In most cases, you're computing something of the form sin(k * a + b), where a and b are constants, and k is a loop variable. If you were also to compute cos(k * a + b), then you could use a 2D rotation matrix to form a recurrence relationship (in matrix form):
    |cos(k*a + b)| = |cos(a)  -sin(a)| * |cos((k-1)*a + b)|
    |sin(k*a + b)|   |sin(a)   cos(a)|   |sin((k-1)*a + b)|
In other words, you can calculate the value for the current iteration in terms of the value from the previous iteration. Thus, you only need to do the full trig calculation for k == 0, but the rest can be calculated via this recurrence (once you have calculated cos(a) and sin(a), which are constants). So you eliminate 75% of the trig function calls (it's not clear the same trick can be pulled for the final set of trig calls).