Interchangeability of IEEE 754 floating-point addition and multiplication

Tags: c++, floating-point, ieee-754, numerical-stability

Is the addition x + x interchangeable with the multiplication 2 * x under the IEEE 754 (IEC 559) floating-point standard? More generally, is there any guarantee that case_add and case_mul always give exactly the same result?

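#include <cstddef>
#include <limits>

// Adds x to itself repeatedly: x + x + ... + x (n terms), rounding after every step.
template <typename T>
T case_add(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    T result(x);

    for (std::size_t i = 1; i < n; ++i)
    {
        result += x;
    }

    return result;
}

// Computes n * x with a single multiplication, so it rounds at most once
// (as long as n itself converts to T exactly).
template <typename T>
T case_mul(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    return x * static_cast<T>(n);
}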


asked Oct 04 '16 15:10

plasmacel

1 Answer

Is the addition x + x interchangeable with the multiplication 2 * x under the IEEE 754 (IEC 559) floating-point standard

Yes. The two are mathematically identical, and doubling a binary floating-point number is exact (no rounding is involved), so both give the same result. If the doubling overflows, both overflow in the same way.
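As a quick sanity check of the doubling claim, here is a minimal sketch. The helper name doubling_agrees is made up for illustration, and it assumes the compiler evaluates expressions in T's own precision (no fast-math flags, no x87 extended precision).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper (not from the question): returns true when x + x and
// 2 * x produce the same value for an IEC 559 type T.
template <typename T>
bool doubling_agrees(T x)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");

    // NaN never compares equal, even to itself, so it is excluded here.
    assert(!std::isnan(x));

    return x + x == static_cast<T>(2) * x;
}
```

Calling it with ordinary values, subnormals, or the largest finite value (where both sides overflow to infinity) should return true in each case.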

or more generally speaking is there any guarantee that case_add and case_mul always give exactly the same result?

Not generally, no. From what I can tell, it seems to hold for n <= 5:

n=3: x+x is exact (i.e. involves no rounding), so (x+x)+x involves only one rounding, at the final step, just like 3*x.
n=4 (assuming the default rounding mode, round-to-nearest with ties-to-even). Take first the case where the exact value of x+x+x needs one bit more than the format holds:
- if the last bit of x is 0, then x+x+x is exact, and so the results are equal by the same argument as n=3.
- if the last 2 bits are 01, then the exact value of x+x+x ends in 1|1 (where | marks the final bit in the format), which is rounded up to 0|0. The next addition has the exact result |01, which is rounded down, cancelling out the previous error.
- if the last 2 bits are 11, then the exact value of x+x+x ends in 0|1, which is rounded down to 0|0. The next addition has the exact result |11, which is rounded up, again cancelling out the previous error.
The case where x+x+x needs two extra bits works out the same way: the first rounding error is at most half a unit in the last place of 4*x, and a tie can only occur when the last bit of x is 0, in which case ties-to-even picks 4*x.
n=5 (again, assuming default rounding): since the computed x+x+x+x equals 4*x exactly, only the final addition rounds, so it holds for the same reason as n=3.
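The case analysis above can be checked empirically. The following is only a sketch: case_add and case_mul are copied from the question, while count_mismatches, its parameters, and the choice of sampling doubles from [1, 2) are assumptions made for illustration. It also assumes the default rounding mode and a compiler that evaluates double arithmetic in double precision.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>

// Same repeated-addition loop as case_add in the question.
template <typename T>
T case_add(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    T result(x);
    for (std::size_t i = 1; i < n; ++i)
    {
        result += x;
    }
    return result;
}

// Same single multiplication as case_mul in the question.
template <typename T>
T case_mul(T x, std::size_t n)
{
    static_assert(std::numeric_limits<T>::is_iec559, "invalid type");
    return x * static_cast<T>(n);
}

// Hypothetical helper: draws `samples` doubles from [1, 2) and counts how
// often case_add and case_mul disagree for the given n.
std::size_t count_mismatches(std::size_t n, std::size_t samples, unsigned seed)
{
    // For n == 0 the two functions differ by construction:
    // case_add returns x while case_mul returns 0.
    assert(n >= 1);

    std::mt19937_64 gen(seed);
    std::uniform_real_distribution<double> dist(1.0, 2.0);

    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < samples; ++i)
    {
        const double x = dist(gen);
        if (case_add(x, n) != case_mul(x, n))
        {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Under those assumptions, count_mismatches should return 0 for every n from 1 to 5. A random test like this is no proof, but a non-zero count would immediately contradict the argument.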

For n=6 it fails: e.g. take x to be 1.0000000000000002 (the next double after 1.0), in which case 6x is 6.000000000000002 and x+x+x+x+x+x is 6.000000000000001.
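To reproduce this counterexample, something along these lines should do. The function names repeated_sum and next_after_one are invented for this sketch, and it assumes double is an IEEE 754 binary64 type with the default rounding mode.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: computes x + x + ... + x (n terms), rounding after
// every addition, like case_add in the question.
double repeated_sum(double x, int n)
{
    assert(n >= 1);

    double result = x;
    for (int i = 1; i < n; ++i)
    {
        result += x;
    }
    return result;
}

// The value used in the counterexample: the next double after 1.0.
double next_after_one()
{
    return std::nextafter(1.0, 2.0);
}
```

With x = next_after_one(), comparing repeated_sum(x, 6) against 6.0 * x shows the two differ by one unit in the last place (2^-50 in this range), while repeated_sum(x, 5) and 5.0 * x still agree. Printing with at least 16 significant digits makes the difference visible.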


answered Sep 30 '22 07:09

Simon Byrne
