Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I speed-up this loop (in C)?

I'm trying to parallelize a convolution function in C. Here's the original function which convolves two arrays of 64-bit floats:

void convolve(const Float64 *in1,
              UInt32 in1Len,
              const Float64 *in2,
              UInt32 in2Len,
              Float64 *results)
{
    UInt32 i, j;

    for (i = 0; i < in1Len; i++) {
        for (j = 0; j < in2Len; j++) {
            results[i+j] += in1[i] * in2[j];
        }
    }
}

In order to allow for concurrency (without semaphores), I created a function that computes the result for a particular position in the results array:

void convolveHelper(const Float64 *in1,
                    UInt32 in1Len,
                    const Float64 *in2,
                    UInt32 in2Len,
                    Float64 *result,
                    UInt32 outPosition)
{
    UInt32 i, j;

    for (i = 0; i < in1Len; i++) {
        if (i > outPosition)
            break;
        j = outPosition - i;
        if (j >= in2Len)
            continue;
        *result += in1[i] * in2[j];
    }
}

The problem is, using convolveHelper slows down the code about 3.5 times (when running on a single thread).

Any ideas on how I can speed-up convolveHelper, while maintaining thread safety?

like image 747
splicer Avatar asked Apr 18 '10 13:04

splicer


People also ask

Which loop is the fastest loop?

The traditional for loop is the fastest, so you should always use that right? Not so fast - performance is not the only thing that matters. Code Readability is usually more important, so default to the style that fits your application.

What is faster than a for loop?

List comprehensions are faster than for loops to create lists.

Which is faster for loop or while loop?

Using for: % Time elapsed: 0.0010001659 seconds. Using while: % Time elapsed: 0.026000023 seconds. The main reason that While is much slower is because the while loop checks the condition after each iteration, so if you are going to write this code, just use a for loop instead.


2 Answers

Convolutions in the time domain become multiplications in the Fourier domain. I suggest you grab a fast FFT library (like FFTW) and use that. You'll go from O(n^2) to O(n log n).

Algorithmic optimizations nearly always beat micro-optimizations.

like image 78
Thomas Avatar answered Oct 24 '22 13:10

Thomas


The most obvious thing that could help would be to pre-compute the starting and ending indices of the loop, and remove the extra tests on i and j (and their associated jumps). This:

for (i = 0; i < in1Len; i++) {
   if (i > outPosition)
     break;
   j = outPosition - i;
   if (j >= in2Len)
     continue;
   *result += in1[i] * in2[j];
}

could be rewritten as:

UInt32 start_i = (in2Len < outPosition) ? outPosition - in2Len + 1 : 0;
UInt32 end_i = (in1Len < outPosition) ? in1Len : outPosition + 1;

for (i = start_i; i < end_i; i++) {
   j = outPosition - i;
   *result += in1[i] * in2[j];
}

This way, the condition j >= in2Len is never true, and the loop test is essentially the combination of the tests i < in1Len and i < outPosition.

In theory you also could get rid of the assignment to j and turn i++ into ++i, but the compiler is probably doing those optimizations for you already.

like image 35
Tyler McHenry Avatar answered Oct 24 '22 13:10

Tyler McHenry