Bad optimization of std::fabs()?

Recently I was working on an application that had code similar to this:

for (auto x = 0; x < width - 1 - left; ++x)
{
    // store / reset points
    temp = hPoint = 0;
    for (int channel = 0; channel < audioData.size(); channel++)
    {
        if (peakmode) /* for rms of window size */
        {
            for (int z = 0; z < sizeFactor; z++)
            {
                temp += audioData[channel][x * sizeFactor + z + offset];
            }
            hPoint += temp / sizeFactor;
        }
        else /* highest sample in window */
        {
            for (int z = 0; z < sizeFactor; z++)
            {
                temp = audioData[channel][x * sizeFactor + z + offset];
                if (std::fabs(temp) > std::fabs(hPoint))
                    hPoint = temp;
            }
        }
        // ... some other code
    }
    // ... some more code
}

This is inside a graphical render loop that is called 50-100 times per second, with buffers of up to 192 kHz audio in multiple channels. So a lot of data runs through the innermost loops, and profiling showed this was a hotspot.

It occurred to me that one could reinterpret the float's bits as an integer, clear the sign bit, and cast the result back to float, using only temporaries. It looked something like this:

if ((const float &&)(*((int *)&temp) & ~0x80000000) > (const float &&)(*((int *)&hPoint) & ~0x80000000))
    hPoint = temp;
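For reference, here is a minimal sketch of the same sign-bit trick without the pointer cast, assuming 32-bit IEEE 754 floats as stated in the edit below. Reading a float through an `int *` violates C++'s strict-aliasing rule, so this version copies the bits with `std::memcpy` (C++20's `std::bit_cast` would also work). The helper names `abs_by_mask` and `greater_magnitude` are hypothetical. As far as I can tell, the `(const float &&)` cast in the snippet above converts the masked integer to float by value rather than reinterpreting its bits; that still orders magnitudes correctly (up to rounding of nearly equal values) because of the integer-ordering property described in the answer, but the sketch reinterprets the bits back explicitly.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// The bit trick is only meaningful for IEEE 754 binary32 floats.
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE 754");
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical helper: |x| computed by clearing the sign bit.
// std::memcpy copies the object representation without breaking
// strict aliasing, unlike *(int *)&x.
inline float abs_by_mask(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFu;  // clear the IEEE 754 sign bit (bit 31)
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// Hypothetical helper: same test as std::fabs(a) > std::fabs(b),
// valid only when neither argument is NaN (the input is sanitized).
inline bool greater_magnitude(float a, float b)
{
    assert(!std::isnan(a) && !std::isnan(b));
    return abs_by_mask(a) > abs_by_mask(b);
}
```

In the loop above, the condition would become `if (greater_magnitude(temp, hPoint)) hPoint = temp;`.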

This gave a 12x reduction in render time while still producing the same, valid output. Note that everything in audioData is sanitized beforehand so that it contains no NaNs, infinities, or denormals, and all samples lie in the range [-1, 1].

Are there any corner cases where this optimization will give wrong results? Or, put differently, why is the standard library function not implemented like this? I presume it has to do with the handling of non-normal values?
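To make the corner cases concrete, here is a small sketch (the helper names are hypothetical) that compares magnitudes both ways. Assuming IEEE 754 binary32 and default floating-point settings (no flush-to-zero or fast-math), the masked bit patterns agree with `std::fabs` for ±0, denormals, and infinities, because IEEE ordering carries over to the bit patterns of non-negative values. The case where they disagree is NaN: any comparison involving NaN is false, while a NaN's masked bit pattern (all-ones exponent, nonzero mantissa) is larger than that of any finite value or infinity.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE 754");
static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical helper: the bit pattern of x with the sign bit cleared.
inline std::uint32_t magnitude_bits(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits & 0x7FFFFFFFu;
}

// |a| > |b| as computed by the standard library (false if either is NaN).
inline bool fabs_greater(float a, float b)
{
    return std::fabs(a) > std::fabs(b);
}

// |a| > |b| as computed by comparing sign-cleared bit patterns.
inline bool mask_greater(float a, float b)
{
    return magnitude_bits(a) > magnitude_bits(b);
}
```

For example, with a quiet NaN `q`, `mask_greater(q, 1.0f)` is true while `fabs_greater(q, 1.0f)` is false, so an unsanitized NaN sample would win the peak search. With the input sanitized as described, that case cannot occur.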

Edit: the floating-point format conforms to IEEE 754, and sizeof(float) == sizeof(int) == 4.

asked May 06 '14 09:05 by Shaggi


1 Answer

Well, you set the floating-point mode to IEEE-conforming. With switches such as -ffast-math, the compiler is typically allowed to ignore IEEE corner cases like NaN, infinity, and denormals. If it also treats std::fabs as an intrinsic, it can probably emit the same code.

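BTW, if you're going to assume the IEEE format, there's no need to cast back to float before the comparison. The IEEE format is nifty: for all positive finite values, a < b if and only if the bit pattern of a, reinterpreted as a same-width integer, is less than that of b (loosely, reinterpret_cast<int_type>(a) < reinterpret_cast<int_type>(b), although in real C++ the bits must be copied, e.g. with std::memcpy). Since clearing the sign bit leaves both values non-negative, the masked integers can be compared directly.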
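A minimal sketch of this suggestion, assuming IEEE 754 binary32 floats: keep the sign-cleared bit pattern of the current peak as an unsigned integer key and compare keys directly, with no conversion back to float. `magnitude_key` and `peak_of` are hypothetical names, and `peak_of` mirrors the `else` branch of the question's loop over a single window.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// Hypothetical helper: integer key that orders non-NaN floats by magnitude.
inline std::uint32_t magnitude_key(float x)
{
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits & 0x7FFFFFFFu;  // clear the sign bit
}

// Hypothetical peak search: returns the sample with the largest magnitude,
// comparing integer keys only. NaNs must be filtered out beforehand.
inline float peak_of(const float* samples, std::size_t n)
{
    float peak = 0.0f;
    std::uint32_t peakKey = 0;  // key of +0.0f
    for (std::size_t i = 0; i < n; ++i)
    {
        assert(!std::isnan(samples[i]));
        const std::uint32_t key = magnitude_key(samples[i]);
        if (key > peakKey)  // strict >, as in the original loop
        {
            peakKey = key;
            peak = samples[i];
        }
    }
    return peak;
}
```

Caching the peak's key means each sample is masked once rather than masking both operands on every comparison. Ties keep the earlier sample, matching the strict `>` in the question's code.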

answered Oct 02 '22 01:10 by MSalters