Is the addition x + x interchangeable with the multiplication 2 * x in the IEEE 754 (IEC 559) floating-point standard? More generally, is there any guarantee that case_add and case_mul always give exactly the same result?
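#include <cstddef>
#include <limits>

// Adds x to itself so that the result is the sum of n copies of x.
// Assumes n >= 1: for n == 0 this returns x, whereas case_mul returns 0.
template <typename T>
T case_add(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    T result(x);
    for (std::size_t i = 1; i < n; ++i)
    {
        result += x;
    }
    return result;
}

// Computes n * x with a single multiplication (one rounding at most,
// provided n itself converts to T exactly).
template <typename T>
T case_mul(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    return x * static_cast<T>(n);
}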
The IEEE 754 standard specifies several floating-point formats, of which single and double precision are the most widely used. Single-precision numbers have 32 bits: 1 for the sign, 8 for the exponent, and 23 for the significand. For normal numbers, the significand also includes an implied 1 to the left of its radix point.
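To make the 1/8/23 bit layout concrete, here is a minimal sketch that splits a float into its three fields. The names FloatFields, split_float and check_split_float are made up for this illustration, and the code assumes float is a 32-bit IEC 559 type (checked at compile time).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical struct for illustration: the three fields of a single-precision value.
struct FloatFields
{
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t fraction;  // 23 stored bits (the implied leading 1 is not stored)
};

// Hypothetical helper: copy the bit pattern out and mask off each field.
FloatFields split_float(float value)
{
    static_assert(std::numeric_limits<float>::is_iec559, "invalid type");
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}

// Small self-check: 1.0f is stored as sign 0, biased exponent 127, fraction 0.
void check_split_float()
{
    const FloatFields one = split_float(1.0f);
    assert(one.sign == 0u && one.exponent == 127u && one.fraction == 0u);
}
```

std::memcpy is used here rather than a pointer cast because it is a well-defined way to read the bit pattern of an object in C++.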
An integer multiplication always needs a "carry propagate add" step at the end. Consequently, addition is always faster because that's the final step of a multiplication. (Floating point is a little different, but not significantly so).
Is the addition x + x interchangeable with the multiplication 2 * x in the IEEE 754 (IEC 559) floating-point standard?
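Yes. The two are mathematically identical, and in binary floating point 2x is exactly representable whenever it does not overflow, so no rounding occurs and both expressions give the same result. (If 2x overflows, both overflow in the same way.)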
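As a quick sanity check of the x + x versus 2 * x claim, the sketch below compares the two for a handful of values, including edge cases. The helper names double_matches_add and check_samples are invented for this example, and it assumes the compiler evaluates double arithmetic in IEEE double precision without fast-math style rewrites.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: true if x + x and 2 * x are the same value
// (two NaN results are treated as matching).
template <typename T>
bool double_matches_add(T x)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    const T sum = x + x;
    const T product = T(2) * x;
    return (std::isnan(sum) && std::isnan(product)) || sum == product;
}

// Checks a few representative doubles: ordinary values, the smallest
// subnormal, the largest finite value (both sides overflow), and infinity.
void check_samples()
{
    using lim = std::numeric_limits<double>;
    const double samples[] = {0.1, 1.0 / 3.0, -0.0, lim::denorm_min(),
                              lim::max(), lim::infinity()};
    for (double x : samples)
    {
        assert(double_matches_add(x));
    }
}
```

A spot check like this is not a proof, of course; the argument above is what guarantees the equality.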
or more generally speaking is there any guarantee that case_add and case_mul always give exactly the same result?
Not generally, no. From what I can tell, it seems to hold for n <= 5:
- n=3: x+x is exact (i.e. involves no rounding), so (x+x)+x only involves one rounding, at the final step.
- n=4 (assuming you're using the default rounding mode):
  - if the last bit of x is 0, then x+x+x is exact, and so the results are equal by the same argument as n=3.
  - if the last 2 bits are 01, then the exact value of x+x+x will have last 2 bits of 1|1 (where | indicates the final bit in the format), which will be rounded up to 0|0. The next addition will give an exact result |01, so the result will be rounded down, cancelling out the previous error.
  - if the last 2 bits are 11, then the exact value of x+x+x will have last 2 bits of 0|1, which will be rounded down to 0|0. The next addition will give an exact result |11, so the result will be rounded up, again cancelling out the previous error.
- n=5 (again, assuming default rounding): since the computed x+x+x+x equals 4x exactly, it holds for the same reason as n=3.
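The n <= 5 claim can also be spot-checked empirically. The following is only a sketch: sum_n, product_n and count_mismatches are names made up here, the seed and the sampled range are arbitrary choices, and it assumes the default round-to-nearest mode with plain IEEE double evaluation.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Same idea as case_add, specialised to double: x added to itself, n terms in total.
double sum_n(double x, std::size_t n)
{
    assert(n >= 1);  // for n == 0 the loop form would return x, not 0
    double result = x;
    for (std::size_t i = 1; i < n; ++i)
    {
        result += x;
    }
    return result;
}

// Same idea as case_mul: a single multiplication.
double product_n(double x, std::size_t n)
{
    return x * static_cast<double>(n);
}

// Counts how many randomly drawn x give different results for the two approaches.
std::size_t count_mismatches(std::size_t n, std::size_t trials)
{
    std::mt19937_64 rng(12345);  // fixed, arbitrary seed for reproducibility
    std::uniform_real_distribution<double> dist(-1.0e6, 1.0e6);
    std::size_t mismatches = 0;
    for (std::size_t t = 0; t < trials; ++t)
    {
        const double x = dist(rng);
        if (sum_n(x, n) != product_n(x, n))
        {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling count_mismatches(n, trials) for n from 1 to 5 should report zero mismatches if the reasoning above is right; a random test cannot prove it, it can only fail to find a counterexample.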
For n=6 it fails, e.g. take x to be 1.0000000000000002 (the next double after 1.0), in which case 6x is 6.000000000000002 and x+x+x+x+x+x is 6.000000000000001.
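The n=6 counterexample is easy to reproduce. In this sketch, repeated_add, single_mul and six_terms_differ are illustrative names, std::nextafter is used to obtain the next double after 1.0, and the default rounding mode with IEEE double evaluation is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Sum of n copies of x, accumulated one addition at a time (n >= 1).
double repeated_add(double x, std::size_t n)
{
    assert(n >= 1);
    double result = x;
    for (std::size_t i = 1; i < n; ++i)
    {
        result += x;
    }
    return result;
}

// n * x with a single multiplication.
double single_mul(double x, std::size_t n)
{
    return x * static_cast<double>(n);
}

// True if the two approaches disagree for x = next double after 1.0 and n = 6.
bool six_terms_differ()
{
    const double x = std::nextafter(1.0, 2.0);  // 1 + 2^-52
    return repeated_add(x, 6) != single_mul(x, 6);
}
```

For this x the two results are adjacent doubles: the single multiplication rounds 6x once, while the repeated addition accumulates several roundings that no longer cancel out.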