 

Optimization Headache - removing if's from Look Up Table

Tags: c++, arrays, lookup

I'm trying to optimize the following piece of code, which is a bottleneck in my application. What it does: it takes the double values value1 and value2 and computes the maximum plus a correction factor. If the difference between the two values is 5.0 or greater, I can just take the max of the two. If the difference is smaller than 5.0, I add the correction factor from the LUT (which is scaled by a factor of 10).

Does anyone have an idea what could be a better style for this piece of code? I don't know where I'm losing time - is it the large number of ifs or the multiplication by 10?

double value1, value2;
// Lookup Table scaled by 10 for (ln(1+exp(-abs(x)))), which is almost 0 for x > 5 and symmetrical around 0. LUT[0] is x=0.0, LUT[40] is x=4.0.
const logValue LUT[50] = { ... };

if (value1 > value2)
{
    if (value1 - value2 >= 5.0)
    {
        return value1;
    }
    else
    {
        return value1 + LUT[(uint8)((value1 - value2) * 10)];
    }
}
else
{
    if (value2 - value1 >= 5.0)
    {
        return value2;
    }
    else
    {
        return value2 + LUT[(uint8)((value2 - value1) * 10)];
    }
}
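
For reference, the table contents are elided above; here is a minimal sketch of how such a table could be generated once at startup (assuming logValue is simply an alias for double; the actual type isn't shown in the question, and the question's table is const, so this is illustrative only):

#include <cmath>

typedef double logValue; // assumption: the real type is not shown in the question

// Fills LUT[i] with ln(1 + exp(-x)) at x = i / 10.0, matching the
// scale-by-10 indexing used in the question.
logValue LUT[50];

void initLUT()
{
    for (int i = 0; i < 50; ++i)
        LUT[i] = std::log(1.0 + std::exp(-i / 10.0));
}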
asked Sep 12 '11 by vls

2 Answers

A couple of minutes of playing with Excel produces an approximation equation that looks like it might have the accuracy you need, so you can do away with the lookup table altogether. You still need one condition to make sure the parameters to the equation remain within the range that it was optimized for.

#include <cmath> // std::fabs; plain abs() would truncate to int

double diff = std::fabs(value1 - value2);
double dmax = (value1 + value2 + diff) * 0.5; // same as (min + max + (max - min)) / 2
if (diff > 5.0)
    return dmax;
return dmax + 4.473865638 / (2.611112371 + diff) + 0.088190879 * diff - 1.015046114;

P.S. I'm not guaranteeing this is faster, only that it's a different enough approach to be worth benchmarking.

P.P.S. It's possible to change the constraints to come up with slightly different constants; there are a lot of variations. Here's another set I did, where the difference between your table and the formula will always be less than 0.008, and each value will be less than the one preceding it.

return dmax + 3.441318133 / (2.296924445 + diff) + 0.065529678 * diff - 0.797081529;

Edit: I tested this code (second formula) with 100 passes over a million random numbers between 0 and 10, along with the original code from the question, MSalters' currently accepted answer, and a brute-force implementation max(value1,value2)+log(1.0+exp(-abs(value1-value2))). I tried it on a dual-core AMD Athlon and a quad-core Intel i7, and the results were roughly consistent. Here's a typical run (a sketch of a comparable harness follows the results):

  • Original: 1.32 seconds.
  • MSalters: 1.13 seconds.
  • Mine: 0.67 seconds.
  • Brute force: 4.50 seconds.
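
For anyone wanting to reproduce these numbers: the exact harness isn't published, so the following is a minimal sketch under the stated setup (100 passes over a million random pairs in [0, 10), timing only the brute-force version here; the pair count, seed, and timing method are my assumptions):

#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Brute-force reference from the text: max(a,b) + ln(1 + exp(-|a-b|)).
static double bruteForce(double a, double b)
{
    return std::max(a, b) + std::log(1.0 + std::exp(-std::fabs(a - b)));
}

int main()
{
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> dist(0.0, 10.0);
    std::vector<double> xs(1000000), ys(1000000);
    for (size_t i = 0; i < xs.size(); ++i) { xs[i] = dist(rng); ys[i] = dist(rng); }

    double sink = 0.0; // accumulate results so the compiler can't discard the work
    auto start = std::chrono::steady_clock::now();
    for (int pass = 0; pass < 100; ++pass)
        for (size_t i = 0; i < xs.size(); ++i)
            sink += bruteForce(xs[i], ys[i]);
    auto stop = std::chrono::steady_clock::now();
    std::printf("%.2f seconds (checksum %f)\n",
                std::chrono::duration<double>(stop - start).count(), sink);
}

Swap bruteForce for the LUT version or either formula to compare the four variants under identical conditions.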

Processors have gotten unbelievably fast over the years, so fast that they can now do a couple of floating-point multiplies and divides faster than they can look up a value in memory. Not only is this approach faster on a modern x86, it's also more accurate: the approximation error of the equation is much smaller than the step error caused by truncating the input to index the lookup table.
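
The accuracy claim is easy to verify; this quick sweep (my own sketch, not part of the original answer) compares the worst-case error of the truncating table lookup against the second fitted formula:

#include <algorithm>
#include <cmath>
#include <cstdio>

int main()
{
    double lut[50];
    for (int i = 0; i < 50; ++i)
        lut[i] = std::log(1.0 + std::exp(-i / 10.0));

    double maxLutErr = 0.0, maxFitErr = 0.0;
    for (double d = 0.0; d < 5.0; d += 0.001) {
        double exact = std::log(1.0 + std::exp(-d));
        double viaLut = lut[(int)(d * 10.0)];              // truncating index
        double viaFit = 3.441318133 / (2.296924445 + d)
                      + 0.065529678 * d - 0.797081529;     // second formula
        maxLutErr = std::max(maxLutErr, std::fabs(viaLut - exact));
        maxFitErr = std::max(maxFitErr, std::fabs(viaFit - exact));
    }
    std::printf("max LUT error %.6f, max formula error %.6f\n", maxLutErr, maxFitErr);
}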

Naturally results can still vary based on your processor and compiler; benchmarking is still required for your own particular target.

answered Oct 11 '22 by Mark Ransom

It probably takes both branches about equally often, which causes a lot of branch mispredictions and pipeline stalls on your processor.

Have you tried profiling?

I'd also suggest trying the standard library functions and seeing if they help (for example, whether the compiler is able to use any processor-specific instructions):

#include <algorithm> // std::max
#include <cmath>     // std::fabs

double diff = std::fabs(value1 - value2);
double maxv = std::max(value1, value2);
return (diff >= 5.0) ? maxv : maxv + LUT[(uint8)(diff * 10)];
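
If the branching itself is the bottleneck, one further option (an untested sketch; the function name correctedMax is mine, and it assumes the table can be padded so an entry for x >= 5 exists and that diff * 10 fits in an unsigned) is to clamp the index and skip the branch entirely:

#include <algorithm> // std::max, std::min
#include <cmath>     // std::fabs

// Assumes LUT is extended to 51 entries with LUT[50] == 0.0, since
// ln(1 + exp(-x)) is effectively 0 for x >= 5.
double correctedMax(double value1, double value2, const double* LUT)
{
    double diff = std::fabs(value1 - value2);
    double maxv = std::max(value1, value2);
    unsigned idx = std::min(50u, (unsigned)(diff * 10.0)); // clamp instead of branching
    return maxv + LUT[idx];
}

Whether this beats the ternary above depends on your data; the compiler may already emit a conditional move for the ternary, so measure both.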
answered Oct 11 '22 by Mark B