 

Interchangeability of IEEE 754 floating-point addition and multiplication

Is the addition x + x interchangeable with the multiplication 2 * x in the IEEE 754 (IEC 559) floating-point standard? More generally, is there any guarantee that case_add and case_mul always give exactly the same result?

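#include <cstddef>
#include <limits>

// Sum x with itself n times: x + x + ... + x (assumes n >= 1; for n == 0
// this returns x, whereas case_mul returns 0).
template <typename T>
T case_add(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    T result(x);

    for (std::size_t i = 1; i < n; ++i)
    {
        result += x;  // each addition is rounded separately
    }

    return result;
}

// Convert n to T and multiply: a single rounded multiplication
// (assumes n is small enough to convert to T exactly).
template <typename T>
T case_mul(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    return x * static_cast<T>(n);
}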
plasmacel asked Oct 04 '16 15:10





1 Answer

Is the addition x + x interchangeable with the multiplication 2 * x in IEEE 754 (IEC 559) floating-point standard

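Yes. Both have the same exact mathematical value, 2x, and IEEE 754 requires each basic operation to return its exact result correctly rounded, so they always give the same result. (In fact 2x is exactly representable unless it overflows, in which case both give the same infinity.)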

or more generally speaking is there any guarantee that case_add and case_mul always give exactly the same result?

Not generally, no. From what I can tell, it seems to hold for n <= 5:

  • n=3: x+x is exact (i.e. involves no rounding), so (x+x)+x involves only one rounding, at the final step, and therefore equals the correctly rounded 3*x.
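  • n=4 (assuming the default rounding mode, round-to-nearest with ties to even; the bit patterns below are for the case where x+x+x lands in the binade just above x, and a similar analysis covers the case where it lands two binades up):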

    • if the last bit of x is 0, then x+x+x is exact, and so the results are equal by the same argument as n=3.
    • if the last 2 bits are 01, then the exact value of x+x+x will have last 2 bits of 1|1 (where | marks the end of the format's significand, so the bit after it is discarded). This is a tie, which rounds up (to even) to 0|0. The next addition gives an exact result ending in |01, which rounds down, cancelling out the previous error.
    • if the last 2 bits are 11, then the exact value of x+x+x will have last 2 bits of 0|1, a tie which rounds down (to even) to 0|0. The next addition gives an exact result ending in |11, which rounds up, again cancelling out the previous error.
  • n=5 (again assuming the default rounding mode): the n=4 case shows that the computed x+x+x+x equals 4x exactly, so, as for n=3, only the final addition rounds.

For n=6 it fails: e.g. take x to be 1.0000000000000002 (the next double after 1.0, i.e. 1 + 2^-52), in which case 6*x is 6.000000000000002 but x+x+x+x+x+x is 6.000000000000001.

Simon Byrne answered Sep 30 '22 07:09
