Reliable overflow detection of floating-point/integer type conversion

Question

Is there a safe way to reliably determine if an integral type T can store a floating-point integer value f (so f == floor(f)) without any overflow?

Keep in mind that there is no guarantee that the floating-point type F is IEC 559 (IEEE 754) compatible, and that signed integer overflow is undefined behavior in C++. I'm interested in a solution that is correct according to the current C++ standard (C++17 at the time of writing) and avoids undefined behavior.

The following naive approach is not reliable, since there is no guarantee that type F can represent std::numeric_limits<I>::max() due to floating-point rounding.

#include <cmath>
#include <limits>
#include <type_traits>

template <typename I, typename F>
bool is_safe_conversion(F x)
{
    static_assert(std::is_floating_point_v<F>);
    static_assert(std::is_integral_v<I>);

    // 'fmax' may have a different value than expected
    static constexpr F fmax = static_cast<F>(std::numeric_limits<I>::max());

    return std::abs(x) <= fmax; // this test may give incorrect results
}
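To make the failure concrete, here is a minimal sketch assuming a 32-bit `std::int32_t` and an IEEE 754 binary32 `float` whose int-to-float conversion rounds to nearest (the common case). `fmax32` and `naive_is_safe` are hypothetical names used only for this illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// INT32_MAX (2^31 - 1) needs 31 significant bits; binary32 float has 24,
// so under round-to-nearest the conversion yields 2^31 = 2147483648.0f.
constexpr float fmax32 = static_cast<float>(std::numeric_limits<std::int32_t>::max());

// The question's naive test, specialized to int32_t/float.
bool naive_is_safe(float x) { return std::abs(x) <= fmax32; }

// 2^31 passes the naive test, yet it is one past INT32_MAX, so
// static_cast<std::int32_t>(2147483648.0f) would be undefined behavior.
```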

Any idea?

chux - Reinstate Monica · Accepted Answer

Is there a safe way to reliably determine if an integral type T can store a floating-point integer value f?

Yes. The key is to test whether f lies in the range T::MIN - 0.999... to T::MAX + 0.999..., i.e. strictly between T::MIN - 1 and T::MAX + 1, using floating-point math with no rounding issues. Bonus: the rounding mode does not matter.

There are 3 failure paths: too big, too small, not-a-number.

The code below assumes int/double. I'll leave forming the C++ template to the OP.

Forming T::MAX + 1 exactly using floating-point math is easy, as INT_MAX is a Mersenne number (2^n - 1). (We are not talking about Mersenne primes here.)

Code takes advantage of:
- A Mersenne number divided by 2 with integer math is also a Mersenne number, so INT_MAX/2 + 1 is a power of 2.
- The conversion of an integer-type power-of-2 constant to a floating-point type can be certain to be exact.

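#include <limits.h>

// Exactly INT_MAX + 1: INT_MAX/2 + 1 is a power of 2, and doubling it is exact
#define DBL_INT_MAXP1 (2.0*(INT_MAX/2+1))
// Below needed when -INT_MAX == INT_MIN: exactly INT_MIN - 1
#define DBL_INT_MINM1 (2.0*(INT_MIN/2-1))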
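As a cross-check of the Mersenne-number argument, a C++ sketch can compare the macro's construction with `std::ldexp(1.0, n)`, where `n = std::numeric_limits<I>::digits` is the number of value bits (so `max() == 2^n - 1`). The helper names are hypothetical, and a binary (`FLT_RADIX == 2`) `double` is assumed.

```cpp
#include <cmath>
#include <limits>

// Same construction as DBL_INT_MAXP1, generalized to any integer type I:
// max()/2 + 1 is a power of 2, so its conversion and the doubling are exact.
template <typename I>
double max_plus_one_via_halving() {
    return 2.0 * static_cast<double>(std::numeric_limits<I>::max() / 2 + 1);
}

// Reference value: 2^digits, where digits counts I's value bits.
template <typename I>
double max_plus_one_via_ldexp() {
    return std::ldexp(1.0, std::numeric_limits<I>::digits);
}
```

Both forms agree for every standard integer type on a binary platform; the halving form has the advantage of also working in a preprocessor-style constant, as in the macro.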

Forming T::MIN - 1 exactly is hard, as its absolute value is usually a power of 2 plus 1, and the relative precision of the integer type and the FP type is not certain. Instead, the code can subtract the exact power of 2 (INT_MIN itself) and compare the result to -1.
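The difficulty is easy to see with a 64-bit type. This sketch assumes `std::int64_t` and an IEEE 754 binary64 `double` (53-bit significand); the function names are hypothetical. `INT64_MIN - 1` needs 64 significant bits, so `-2^63 - 1.0` rounds back to `-2^63`, and a direct comparison against it rejects `INT64_MIN` itself. Subtracting the exactly representable minimum first avoids that.

```cpp
#include <cstdint>
#include <limits>

constexpr double kMin = static_cast<double>(std::numeric_limits<std::int64_t>::min()); // -2^63, exact

// Naive: tries to form INT64_MIN - 1, which rounds to -2^63 itself.
bool above_min_minus_one_naive(double x) { return x > kMin - 1.0; }

// As in the answer: x - INT64_MIN is exact near the boundary (Sterbenz lemma),
// so comparing the difference with -1.0 gives the intended result.
bool above_min_minus_one(double x) { return x - kMin > -1.0; }
```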

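/* Placeholder error handlers, left to the caller; assumed here to
   return an int (e.g. a saturated value or an error code). */
int Handle_Underflow(void);
int Handle_Overflow(void);
int Handle_NaN(void);

int double_to_int(double x) {
  if (x < DBL_INT_MAXP1) {            // false for NaN
    #if -INT_MAX == INT_MIN
    // rare non-2's complement machine
    if (x > DBL_INT_MINM1) {
      return (int) x;
    }
    #else
    // INT_MIN is a power of 2, so (double)INT_MIN is exact
    if (x - INT_MIN > -1.0) {
      return (int) x;
    }
    #endif
    return Handle_Underflow();         // too small
  } else if (x > 0) {
    return Handle_Overflow();          // too big
  } else {
    return Handle_NaN();               // not-a-number
  }
}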

Regarding floating-point types with non-binary radix (FLT_RADIX != 2)

With FLT_RADIX = 4, 8, 16, ... the conversion would be exact too. With FLT_RADIX == 10, the code is exact at least up to a 34-bit int, since a double must encode +/-10^10 exactly (C requires DBL_DIG >= 10). So a problem would arise only on, say, a FLT_RADIX == 10 machine with a 64-bit int, which is a low risk. From memory, the last FLT_RADIX == 10 machine in production was over a decade ago.
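In C++ the relevant properties are available at compile time through `std::numeric_limits<F>::radix`, `digits` and `max_exponent`. A conservative guard might look like the following; `max_plus_one_is_exact` is a hypothetical name. For non-binary radices it deliberately accepts only types where `2^n` fits in the significand as an integer, which is stricter than the 34-bit figure above, so it may reject formats that would in fact work.

```cpp
#include <limits>

// Conservative compile-time check: is 2^n exactly representable in F, where
// n = numeric_limits<I>::digits (so I's max() is 2^n - 1)? It may return
// false for formats where 2^n is in fact exact, but should not return true wrongly.
template <typename I, typename F>
constexpr bool max_plus_one_is_exact() {
    using FL = std::numeric_limits<F>;
    constexpr int n = std::numeric_limits<I>::digits;
    if (FL::radix == 2) {
        // Every finite power of 2 is exact; 2^n is finite iff n < max_exponent.
        return n < FL::max_exponent;
    }
    // Other radices: with b = floor(log2(radix)), n < b * digits implies
    // 2^n < radix^digits, so 2^n fits in the significand as an integer.
    int b = 0;
    for (int r = FL::radix; r > 1; r /= 2) {
        ++b;
    }
    return n < b * FL::digits;
}

static_assert(max_plus_one_is_exact<int, double>());
```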

The integer type is always encoded as 2's complement (most common), 1s' complement, or sign-magnitude. INT_MAX is always a power of 2 minus 1. INT_MIN is always the negative of a power of 2, or 1 more than that. Effectively, always base 2.
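Since the answer leaves the C++ template to the OP, here is one possible C++17 sketch of the same range test. It is not the answerer's code; it reuses the question's name `is_safe_conversion`, adds a hypothetical helper `max_plus_one`, and assumes that `F` represents `2^n` exactly (true for binary floating point, per the radix discussion above). The three branches mirror the three integer encodings listed above.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Exactly max() + 1 as an F. Assumes max() == 2^n - 1 (true for every
// integer representation C++17 allows) and that F represents powers of 2
// up to 2^n exactly (true for binary floating point).
template <typename I, typename F>
constexpr F max_plus_one() {
    // max()/2 + 1 is a power of 2, so converting it and doubling it are exact.
    return F(2) * static_cast<F>(std::numeric_limits<I>::max() / 2 + 1);
}

// True if static_cast<I>(x) is well-defined, i.e. x lies strictly between
// min() - 1 and max() + 1. NaN fails the first comparison, so it yields false.
template <typename I, typename F>
bool is_safe_conversion(F x) {
    static_assert(std::is_integral_v<I> && std::is_floating_point_v<F>);
    constexpr F upper = max_plus_one<I, F>();

    if (!(x < upper)) {
        return false;                 // too big, +infinity, or NaN
    }
    if constexpr (std::is_unsigned_v<I>) {
        return x > F(-1);             // min() - 1 == -1
    } else if constexpr (std::numeric_limits<I>::min() == -std::numeric_limits<I>::max()) {
        return x > -upper;            // 1s' complement / sign-magnitude: min() - 1 == -(max() + 1)
    } else {
        // 2's complement: min() == -upper exactly, so x - min() == x + upper;
        // compare that difference with -1, as in the answer.
        return x + upper > F(-1);
    }
}
```

For example, `is_safe_conversion<std::int32_t, float>(2147483648.0f)` is false, while `is_safe_conversion<std::int32_t, double>(-2147483648.0)` is true.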

Aconcagua · Answer

Any idea?

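#include <limits>

template <typename I, typename F>
constexpr F maxConvertible()
{
    I i = std::numeric_limits<I>::max();
    F f = F(i);          // may have been rounded up past max()
    while(F(i) == f)     // step down until the conversion result changes
    {
        --i;
    }
    return F(i);         // next representable F below F(max())
}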

Due to rounding, we might have gotten a maximum that is too large, so we count down until the conversion yields the next smaller representable floating-point value, which should fit into the integral type.
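A quick check of this function, assuming 32-bit `std::int32_t`, IEEE 754 binary32 `float` and round-to-nearest conversion: `INT32_MAX` converts to `2^31`, and the loop stops at the first value that converts to the float just below it, `2^31 - 128`. The number of iterations grows with the gap between representable values near the maximum (about 64 here, about 512 for `std::int64_t`/`double`, roughly `2^38` for `std::int64_t`/`float`), so the last combination is impractical, especially in a constant expression.

```cpp
#include <cstdint>
#include <limits>

// The answer's function, unchanged apart from formatting.
template <typename I, typename F>
constexpr F maxConvertible()
{
    I i = std::numeric_limits<I>::max();
    F f = F(i);
    while (F(i) == f)
    {
        --i;
    }
    return F(i);
}

// 2^31 - 128: the largest float below 2^31 (assumes binary32, round-to-nearest).
static_assert(maxConvertible<std::int32_t, float>() == 2147483520.0f);
```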

Problem left open: this works fine if the conversion to double involves up-rounding; however, even IEEE 754 allows different rounding modes. (If rounding to nearest is applied, which should be the most common rounding mode on current hardware, up-rounding will always occur when the maximum is not exactly representable.)

I have not yet found a way to safely detect down-rounding (I might add one later; for detecting "rounding to nearest", at least, there is already a solution here). If down-rounding occurs, we get some false negatives near the maxima and minima of the integral values; you might consider this acceptable for the few exotic architectures that actually round down.
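One possible way around the open rounding problem, offered as an assumption rather than as part of the answer: for a binary `F`, `max()` is `2^n - 1`, so an inexact conversion can only land on `2^n` (rounded up) or on the next representable value below it (rounded down). Comparing the result with an exactly computed `2^n` therefore tells which case occurred. `max_convertible_any_rounding` is a hypothetical name; `std::ldexp` and `std::nextafter` are standard `<cmath>` functions.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Largest F not exceeding numeric_limits<I>::max(), whichever way the
// int-to-float conversion rounds. Assumes binary F (radix 2), so that
// max() + 1 == 2^digits is exactly representable.
template <typename I, typename F>
F max_convertible_any_rounding() {
    const F max_plus_one = std::ldexp(F(1), std::numeric_limits<I>::digits); // exact 2^n
    const F converted = static_cast<F>(std::numeric_limits<I>::max());
    if (converted == max_plus_one) {
        // Rounded up past max(): the next value toward zero is 2^n - ulp <= max().
        return std::nextafter(converted, F(0));
    }
    // Exact, or rounded down: either way converted <= max().
    return converted;
}
```

No loop is needed, so the cost does not depend on the type pair.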

Independent of up- or down-rounding, there is a special case for signed integral types anyway: provided the integral type uses two's complement and has more bits than the significand of the floating-point type, the type's minimum value will be representable as a floating-point value, whereas some greater values will not. Catching this case requires special treatment.
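The asymmetry can be shown with `std::int64_t` and `double` (assumed IEEE 754 binary64, round-to-nearest). The minimum `-2^63` converts exactly, while the maximum rounds up to `+2^63`, so both bounds have the same magnitude in `double` but only one is a valid value. A symmetric test such as the question's `std::abs(x) <= fmax` cannot get both right; a separate, exact lower-bound comparison can. `above_lower_limit` is a hypothetical helper, intended for integer-valued `x` as in the question.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// INT64_MIN == -2^63 is a power of 2 and converts exactly ...
constexpr double lowest = static_cast<double>(std::numeric_limits<std::int64_t>::min());
// ... while INT64_MAX == 2^63 - 1 rounds up to +2^63, one past the valid range.
constexpr double highest = static_cast<double>(std::numeric_limits<std::int64_t>::max());

// Lower-bound check for integer-valued x: the bound is exact, so >= is safe.
bool above_lower_limit(double x) { return x >= lowest; }
```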

Asked by plasmacel

