Is there a safe way to reliably determine if an integral type T can store a floating-point integer value f (so f == floor(f)) without any overflow?
Keep in mind that there is no guarantee that the floating-point type F is IEC 559 (IEEE 754) compatible, and that signed integer overflow is undefined behavior in C++. I'm interested in a solution that is correct according to the current C++ standard (C++17 at the time of writing) and avoids undefined behavior.
The following naive approach is not reliable, since there is no guarantee that type F can represent std::numeric_limits<I>::max() exactly; the converted value may be rounded.
#include <cmath>
#include <limits>
#include <type_traits>
template <typename I, typename F>
bool is_safe_conversion(F x)
{
    static_assert(std::is_floating_point_v<F>);
    static_assert(std::is_integral_v<I>);
    // 'fmax' may have a different value than expected
    static constexpr F fmax = static_cast<F>(std::numeric_limits<I>::max());
    return std::abs(x) <= fmax; // this test may give incorrect results
}
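To illustrate how the comparison above can misfire, here is a minimal sketch. It assumes that double is an IEEE 754 binary64 type and that integer-to-double conversions round to nearest (the standard guarantees neither); the name rounded_max is hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Assumption: double is IEEE 754 binary64 (53-bit significand) and the
// integer-to-double conversion rounds to nearest.
constexpr double rounded_max =
    static_cast<double>(std::numeric_limits<std::int64_t>::max());

// INT64_MAX is 2^63 - 1, which needs 63 significant bits, so it is rounded
// up to 2^63, a value that does NOT fit into std::int64_t.
static_assert(rounded_max == 9223372036854775808.0,
              "holds only under the assumptions stated above");
```

Under these assumptions the naive test accepts x == 2^63, and converting that value to std::int64_t is undefined behavior.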
Any idea?
Is there a safe way to reliably determine if an integral type T can store a floating-point integer value f?
Yes. The key is to test whether f is in the range T::MIN - 0.999... to T::MAX + 0.999... using floating-point math, with no rounding issues. Bonus: the rounding mode does not come into play.
There are 3 failure paths: too big, too small, not-a-number.
The code below assumes int/double. I'll leave turning it into a C++ template to the OP.
Forming T::MAX + 1 exactly using floating-point math is easy, as INT_MAX is a Mersenne number. (We are not talking about Mersenne primes here.)
The code takes advantage of:
A Mersenne number divided by 2 with integer math is also a Mersenne number.
The conversion of an integer-type power-of-2 constant to a floating-point type can be certain to be exact.
#define DBL_INT_MAXP1 (2.0*(INT_MAX/2+1))
// Below needed when -INT_MAX == INT_MIN
#define DBL_INT_MINM1 (2.0*(INT_MIN/2-1))
Forming T::MIN - 1 exactly is hard, as its absolute value is usually a power of 2 plus 1, and the relative precision of the integer type and the FP type is not certain. Instead, code can subtract the exact power of 2 and compare to -1.
int double_to_int(double x) {
  if (x < DBL_INT_MAXP1) {
    #if -INT_MAX == INT_MIN
    // rare non-2's complement machine
    if (x > DBL_INT_MINM1) {
      return (int) x;
    }
    #else
    if (x - INT_MIN > -1.0) {
      return (int) x;
    }
    #endif
    Handle_Underflow();
  } else if (x > 0) {
    Handle_Overflow();
  } else {
    Handle_NaN();
  }
}
Regarding floating-point types with non-binary radix (FLT_RADIX != 2)
With FLT_RADIX = 4, 8, 16 ..., the conversion would be exact too. With FLT_RADIX == 10, the code is at least exact up to a 34-bit int, as a double must encode +/-10^10 exactly. So a problem could arise only on, say, a FLT_RADIX == 10 machine with a 64-bit int, which is a low risk. As far as I recall, the last FLT_RADIX == 10 machine in production was over a decade ago.
The integer type is always encoded as 2's complement (most common), ones' complement, or sign-magnitude. INT_MAX is always a power of 2 minus 1. INT_MIN is always a negative power of 2, or 1 more than that. Effectively, always base 2.
Any idea?
template <typename I, typename F>
constexpr F maxConvertible()
{
    I i = std::numeric_limits<I>::max();
    F f = F(i);
    while(F(i) == f)
    {
        --i;
    }
    return F(i);
}
Due to rounding, the converted maximum might be too large, so we count down until we reach the next smaller representable value of F, which should fit into the integral type...
Problem left open: this works fine if the conversion to double rounds up; however, even IEEE 754 allows different rounding modes (if round-to-nearest is applied, which should be the most common rounding mode across current hardware, rounding up will always occur...).
I have not found a way to safely detect rounding down yet (I might add one later; detecting round-to-nearest, at least, already has a solution here). If it occurs, we get some false negatives near the maxima and minima of the integral values; you might consider this "acceptable" for the few exotic architectures that actually round down.
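As a quick check of maxConvertible, the sketch below evaluates it at compile time. It assumes an IEEE 754 double and round-to-nearest conversions; the function is repeated only to keep the example self-contained. The second assertion shows a case worth knowing about: when the conversion is already exact, the loop still steps down once, so the largest integral value itself is reported as not convertible.

```cpp
#include <cstdint>
#include <limits>

template <typename I, typename F>
constexpr F maxConvertible()
{
    I i = std::numeric_limits<I>::max();
    F f = F(i);
    while (F(i) == f)   // step down until the converted value changes
    {
        --i;
    }
    return F(i);
}

// Assumption: IEEE 754 binary64 double, conversions round to nearest.
// 2^63 - 1 rounds up to 2^63; the next smaller double is 2^63 - 1024.
static_assert(maxConvertible<std::int64_t, double>() == 9223372036854774784.0);

// INT32_MAX converts exactly, yet the result is one below it.
static_assert(maxConvertible<std::int32_t, double>() == 2147483646.0);
```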
Independently of rounding up or down, there is a special case for signed integrals anyway: provided the integral type is represented in two's complement and has more bits than the mantissa of the floating-point type, the type's minimum value will be representable as a floating-point value, whereas some greater values will not. Catching this case requires special treatment.
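The special case can be seen in a small sketch, again assuming a two's complement std::int64_t, an IEEE 754 double, and round-to-nearest conversions; the name int64_min is hypothetical.

```cpp
#include <cstdint>
#include <limits>

constexpr std::int64_t int64_min = std::numeric_limits<std::int64_t>::min();

// -2^63 is a power of two, so it converts to double exactly...
static_assert(static_cast<double>(int64_min) == -9223372036854775808.0);

// ...but int64_min + 1 needs 63 significant bits and is rounded back
// to -2^63 (assuming round-to-nearest).
static_assert(static_cast<double>(int64_min + 1) == static_cast<double>(int64_min));
```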