Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why my C code is slowed down if I am using += operator

Edit I remove some part of the question because it might be confusing > and thanks of answers I should be able to found out by myself.

I am trying to write a program that resample a signal using a spectral function. I am writing the code in C but I discover the problem coding in C++.

Here is a portion of the code I am using. I have a function test() that call another function test3(tmp) passing an array as pointer. As long as I combine the += operator and I am trying to fill the array of the first function, my code is slowing down.

I initialize and calculate using zeros but all arrays must have double values.

#include <stdio.h>
#include <stdlib.h>

void test3(double *tmp);

void test() {
    double *tmp;
    int k;
    tmp = (double *) calloc(250 * 6, sizeof(double));
    for (k = 0; k < 1000; ++k)
        test3(tmp);
    free(tmp);
}

void test3(double *tmp) {
    int k;
    double *fx;
    double *srf;
    double *ptr;
    int p;
    int l;
    int c;
    double a;
    fx = (double *) calloc(2100 * 6, sizeof(double));
    srf = (double *) calloc(2100 * 250, sizeof(double));
    
    for (int k = 0; k < 2100 * 6; ++k)   fx[k] = 0;
    for (int k = 0; k < 2100 * 250; ++k) srf[k] = 0;
    
    ptr = (double *) calloc(250 * 6, sizeof(double));
    
    for (p = 0; p < 6; ++p) {
        for (l = 0; l < 250; ++l) {
            ptr[l + p*250] = 0.0;
            for (c = 0; c < 2100; ++c)
                ptr[l+p*250] += fx[c+p*2100] * srf[c+l*2100];
        }
    }
    for (k = 0; k < 250 * 6; ++k) tmp[k] = ptr[k];
    
    free(ptr);
    free(fx);
    free(srf);
}

int main(int argc, char *argv[]) {
    test();
    return 0;
}

I compile using:

gcc -O2 -o test test.c -lm

When I run the code time ./test I get

./test_lut  3.84s user 0.11s system 98% cpu 4.008 total
like image 976
Stéphane Guillaso Avatar asked Oct 16 '25 19:10

Stéphane Guillaso


1 Answers

The compiler does a good job at optimizing out useless code, but not always:

  • if you remove the line for (k = 0; k < 250*6; ++k) tp[k] = ptr[k]; you essentially remove all side effects of the function test3(): you initialize arrays pointed to by fx and srf and compute the entries in the one pointed to by ptr, then free these arrays. The compiler seems able to determine that no object is impacted that survives after returning from the function and removes the code altogether, resulting in almost no time spent in the function.

  • if you change += to =, the loop

              for (c = 0; c < 2100; ++c)
                  ptr[l+p*250] += fx[c+p*2100] * srf[c+l*2100];
    

    keep modifying the same location, hence only the last iteration is useful so the code is reduced to ptr[l+p*250] += fx[2099+p*2100] * srf[2099+l*2100]; resulting in much faster operation.

  • the third attempt using a temporary variable a has no impact as the compiler probably already optimizes the constant destination out of the loop.

You can investigate the compiler work using Godbolt's Compiler Explorer and play with the code and compiler flags.

As can be seen analyzing the code generated, neither gcc not clang generate any code for the initialization loops (setting memory allocated with calloc() to all bit zero value 0.0 has no effect) and convert the final for loop to a call to memcpy. Commenting this final for loop causes both compilers to generate no code and changing the += to = simplifies the inner loop code.

Interestingly, albeit both compilers correctly determine that fx and srf point to arrays with all elements equal to 0.0, they do not assume that the contents of these arrays stay null for the whole computation and thus all results in the destination array should be null too as ptr[l+p*250] += fx[c+p*2100] * srf[c+l*2100]; simplifies to ptr[l+p*250] += 0.0;

If you compile the code as C (not C++) and define fx, srf and ptr as

    double * restrict fx;
    double * restrict srf; 
    double * restrict ptr; 

Both compilers should determine that the arrays fx and srf cannot be modified as a side effect of writing via ptr, hence remain null. The code generated would be drastically simplified and should execute almost instantly, but this optimisation does not seem to be effective.

like image 136
chqrlie Avatar answered Oct 18 '25 10:10

chqrlie