Bad optimization of std::fabs()?

Question

Recently I was working with an application that had code similar to:

for (auto x = 0; x < width - 1 - left; ++x)
{
    // store / reset points
    temp = hPoint = 0;
    for(int channel = 0; channel < audioData.size(); channel++)
    {
        if (peakmode) /* fir rms of window size */
        {
            for (int z = 0; z < sizeFactor; z++)
            {
                temp += audioData[channel][x * sizeFactor + z + offset];
            }
            hPoint += temp / sizeFactor;
        }
        else /* highest sample in window */
        {
            for (int z = 0; z < sizeFactor; z++)
            {
                temp = audioData[channel][x * sizeFactor + z + offset];
                if (std::fabs(temp) > std::fabs(hPoint))
                    hPoint = temp;
            }
        }
        // ... some other code
    }
    // ... some more code
}

This is inside a graphical render loop, called some 50-100 times per second with buffers at sample rates up to 192 kHz in multiple channels. So a lot of data runs through the innermost loops, and profiling showed this was a hotspot.

It occurred to me that one could cast the float to an integer and erase the sign bit, and cast it back using only temporaries. It looked something like this:

if ((const float &&)(*((int *)&temp) & ~0x80000000) > (const float &&)(*((int *)&hPoint) & ~0x80000000))
    hPoint = temp;

This gave a 12x reduction in render time while still producing the same, valid output. Note that everything in audioData is sanitized beforehand so that it contains no NaNs, infinities or denormals, and all values lie in the range [-1, 1].

Are there any corner cases where this optimization will give wrong results? Or, why is the standard library function not implemented like this? I presume it has to do with handling of non-normal values?

Edit: the floating-point layout conforms to IEEE 754, and sizeof(float) == sizeof(int) == 4.
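The pointer casts in the snippet above read a float through an int pointer, which the C++ aliasing rules do not permit. As a minimal sketch of the same sign-bit trick without that problem, the bit pattern can be copied with std::memcpy. The helper names abs_via_bits and peak_of are hypothetical, not from the original code, and the sketch assumes a 32-bit IEEE 754 float as stated in the edit above.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns |x| by clearing the IEEE 754 sign bit.
// Assumes float is a 32-bit IEEE 754 value, as the question states.
inline float abs_via_bits(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // copy out the bit pattern (no aliasing issue)
    bits &= 0x7FFFFFFFu;                  // clear bit 31, the sign bit
    std::memcpy(&x, &bits, sizeof x);     // copy the modified pattern back
    return x;
}

// Hypothetical helper mirroring the "highest sample in window" branch:
// returns the sample with the largest magnitude, keeping its sign.
inline float peak_of(const float* samples, int count)
{
    float peak = 0.0f;
    for (int i = 0; i < count; ++i)
    {
        if (abs_via_bits(samples[i]) > abs_via_bits(peak))
            peak = samples[i];
    }
    return peak;
}
```

Whether this is faster than std::fabs depends on the compiler and its floating-point settings, so it is worth profiling both versions.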

MSalters · Accepted Answer

Well, you set the floating-point mode to IEEE conforming. Typically, with switches like -ffast-math (GCC and Clang) the compiler can ignore IEEE corner cases like NaN, INF and denormals. If the compiler also uses intrinsics, it can probably emit the same code.

BTW, if you're going to assume IEEE format, there's no need for the cast back to float prior to the comparison. The IEEE format is nifty: for positive finite values, a < b if and only if reinterpret_cast<int_type>(a) < reinterpret_cast<int_type>(b), where the cast is shorthand for reading the float's bit pattern as an integer.

Tags: c++, bit-manipulation

Shaggi
