What float values could not be converted to int without undefined behavior [c++]?

Tags:

c++

type-conversion

implicit-conversion

c++14

I just read this from the C++14 standard (my emphasis):

4.9 Floating-integral conversions [conv.fpint]

1 A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [...]

Which got me thinking

1. Which, if any, float values could not be represented as int after truncation? (Does that depend on the implementation?)
2. If there are any, does this mean that auto x = static_cast<int>(float) is unsafe?
3. What is the proper/safe way of converting float to int then (assuming you want truncation)?


asked Jan 31 '18 17:01

ricab


2 Answers

We hit this a while back, and I manually made some tables with the exact bit patterns of the floats at the edges of various conversions to various sizes of integers. Note that this assumes IEEE 754 4-byte floats and 8-byte doubles, and two's complement signed integers (a 4-byte int32_t and an 8-byte int64_t).

If you need to convert the bit patterns to floats or doubles you'll need to either type pun them (technically UB) or memcpy them.

To answer your question: any value too large in magnitude to fit in the target integer is UB on conversion. The only case where truncation toward zero matters is double -> int32_t, because only there can a value just outside the integer range still carry a fractional part. Using the following values, you can compare the float against the relevant min/max and cast only if it is in range.

Note that converting INT_MIN/INT_MAX (or their std::numeric_limits counterparts) to floating point and then comparing doesn't always work, because floats have very low precision at values of that size.
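As a small illustration of that pitfall, here is a sketch assuming a 32-bit int and IEEE 754 float with the default round-to-nearest mode. The function names are made up for this example. Converting INT_MAX to float rounds it up to 2^31, so the naive check accepts a value that does not fit:

```cpp
#include <climits>

// Assumption: 32-bit int, IEEE 754 binary32 float, round-to-nearest.
// The two floats nearest to INT_MAX (2147483647) are 2147483520 and
// 2147483648; the conversion picks the closer one, which is 2^31.
bool int_max_rounds_up_as_float()
{
    const float as_float = static_cast<float>(INT_MAX);
    return as_float == 2147483648.0f;
}

// Naive range check (hypothetical helper). It looks right, but the upper
// bound has silently become 2^31, so f == 2147483648.0f is accepted even
// though casting that value to int is undefined behavior.
bool naive_in_range(float f)
{
    return f >= static_cast<float>(INT_MIN) && f <= static_cast<float>(INT_MAX);
}
```

Using a strict comparison against 2^31 itself (f < 2147483648.0f) avoids the problem, since 2^31 is exactly representable as a float.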

Inf/NaN are also UB on conversion.

// float->int64 edge cases
static const uint32_t FloatbitsMaxFitInt64 = 0x5effffff; // [9223371487098961920] Largest float which still fits into a signed int64
static const uint32_t FloatbitsMinNofitInt64 = 0x5f000000; // [9223372036854775808] Smallest float which is too big for a signed int64
static const uint32_t FloatbitsMinFitInt64 = 0xdf000000; // [-9223372036854775808] Smallest float which still fits into a signed int64
static const uint32_t FloatbitsMaxNotfitInt64 = 0xdf000001; // [-9223373136366403584] Largest float which is too small for a signed int64

// float->int32 edge cases
static const uint32_t FloatbitsMaxFitInt32 = 0x4effffff; // [2147483520] Largest float which still fits into a signed int32
static const uint32_t FloatbitsMinNofitInt32 = 0x4f000000; // [2147483648] Smallest float which is too big for a signed int32
static const uint32_t FloatbitsMinFitInt32 = 0xcf000000; // [-2147483648] Smallest float which still fits into a signed int32
static const uint32_t FloatbitsMaxNotfitInt32 = 0xcf000001; // [-2147483904] Largest float which is too small for a signed int32

// double->int64 edgecases
static const uint64_t DoubleBitsMaxFitInt64 = 0x43dfffffffffffff; // [9223372036854774784] Largest double which fits into an int64
static const uint64_t DoubleBitsMinNofitInt64 = 0x43e0000000000000; // [9223372036854775808] Smallest double which is too big for an int64
static const uint64_t DoubleBitsMinFitInt64 = 0xc3e0000000000000; // [-9223372036854775808] Smallest double which fits into an int64
static const uint64_t DoubleBitsMaxNotfitInt64 = 0xc3e0000000000001; // [-9223372036854777856] largest double which is too small to fit into an int64

// double->int32 edge cases [when truncating (round towards zero)]
static const uint64_t DoubleBitsMaxTruncFitInt32 = 0x41dfffffffffffff; // [~2147483647.9999998] Largest double that when truncated will fit into an int32
static const uint64_t DoubleBitsMinTruncNofitInt32 = 0x41e0000000000000; // [2147483648.0000000] Smallest double that when truncated won't fit into an int32
static const uint64_t DoubleBitsMinTruncFitInt32 = 0xc1e00000001fffff; // [~-2147483648.9999995] Smallest double that when truncated will fit into an int32
static const uint64_t DoubleBitsMaxTruncNofitInt32 = 0xc1e0000000200000; // [-2147483649.0000000] Largest double that when truncated won't fit into an int32

// double->int32 edge cases [when rounding via banker's method (round to nearest, ties to even)]
static const uint64_t DoubleBitsMaxRoundFitInt32 = 0x41dfffffffdfffff; // [~2147483647.4999998] Largest double that when rounded will fit into an int32
static const uint64_t DoubleBitsMinRoundNofitInt32 = 0x41dfffffffe00000; // [2147483647.5000000] Smallest double that when rounded won't fit into an int32 (the tie rounds to even, 2147483648)
static const uint64_t DoubleBitsMinRoundFitInt32 = 0xc1e0000000100000; // [-2147483648.5000000] Smallest double that when rounded will fit into an int32 (the tie rounds to even, -2147483648)
static const uint64_t DoubleBitsMaxRoundNofitInt32 = 0xc1e0000000100001; // [~-2147483648.5000005] Largest double that when rounded won't fit into an int32
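The float->int32 rows of the table can be cross-checked without hand-typed bit patterns. The sketch below assumes a 32-bit IEEE 754 float; float_bits, max_fit_int32 and max_notfit_int32 are names invented for this example. It uses std::nextafter to step one representable value away from the exact powers of two:

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: returns the raw bit pattern of a float.
// Assumes float is 32 bits wide (IEEE 754 binary32).
std::uint32_t float_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Largest float strictly below 2^31, i.e. the largest float that fits an int32.
float max_fit_int32()
{
    return std::nextafter(2147483648.0f, 0.0f);
}

// First float below -2^31 (one step further from zero), i.e. the float
// closest to zero that is too small for an int32.
float max_notfit_int32()
{
    return std::nextafter(-2147483648.0f, -std::numeric_limits<float>::infinity());
}
```

Under those assumptions, float_bits(max_fit_int32()) should come out as 0x4effffff and float_bits(max_notfit_int32()) as 0xcf000001, matching FloatbitsMaxFitInt32 and FloatbitsMaxNotfitInt32.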

So for your example you want:

if( f >= B2F(FloatbitsMinFitInt32) && f <= B2F(FloatbitsMaxFitInt32))
    // cast is valid.

Where B2F is something like:

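#include <cstdint>
#include <cstring>

// Reinterprets a 32-bit pattern as a float; memcpy avoids the type-punning UB.
float B2F(uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "Weird arch");
    float f;
    std::memcpy(&f, &bits, sizeof(float));
    return f;
}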

Note that this check also rejects NaNs and infinities correctly (any comparison with a NaN is false, and infinities fall outside the range), unless you're using a non-IEEE 754 mode of your compiler (e.g. -ffast-math on GCC or /fp:fast on MSVC).
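For the float -> int32 case, the same range check can also be written directly against the powers of two, since -2^31 and 2^31 are exactly representable as floats. This is only a sketch under the same assumptions (IEEE 754 float, two's complement int32_t, no fast-math mode); checked_trunc_to_int32 is a hypothetical name, not something from a library:

```cpp
#include <cstdint>

// Hypothetical helper. Stores the truncated value in `out` and returns true
// only when the conversion is well defined. NaN fails both comparisons and
// the infinities fail one of them, so all three are rejected.
bool checked_trunc_to_int32(float f, std::int32_t& out)
{
    // Valid inputs lie in the half-open range [-2^31, 2^31).
    if (f >= -2147483648.0f && f < 2147483648.0f) {
        out = static_cast<std::int32_t>(f);  // truncates toward zero
        return true;
    }
    return false;
}
```

The strict upper comparison is equivalent to f <= B2F(FloatbitsMaxFitInt32), because no float lies between 2147483520 and 2147483648.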

answered Sep 19 '22 03:09

Mike Vine

It shouldn't be surprising at all that float has values outside of int range. Floating-point values were invented to represent very large (and also very small) values adequately.

1. INT_MAX + 1 (usually equal to 2147483648) cannot be represented by int, but can be represented by float.
2. Yes, static_cast<int>(float) is as unsafe as undefined behavior can be. However, something as simple as x + y for sufficiently large integers x and y is also UB, so no big surprise here either.
3. The proper way to do this depends on the application, as always in C++. Boost has numeric_cast, which throws an exception on overflow; this might be good for you. To do saturation (clamp out-of-range values to INT_MIN or INT_MAX), write some code like this:
```
float f;
int i;
...
if (static_cast<double>(INT_MIN) <= f && f < static_cast<double>(INT_MAX))
 i = static_cast<int>(f);
else if (f < 0)
 i = INT_MIN;
else
 i = INT_MAX;
```
However, this is not ideal. It works only if double on your system can represent the maximal value of int exactly (true for a 32-bit int and an IEEE 754 double). You also have to decide how exactly values close to the minimum or maximum of int should be rounded, and note that a NaN falls through to the last branch and becomes INT_MAX. If you don't want to consider such questions, use boost::numeric_cast.
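One way to settle those questions explicitly is sketched below. It assumes a 32-bit two's complement int and an IEEE 754 float, so that -2^31 and 2^31 are exact float constants. The name saturating_trunc_to_int and the choice to map NaN to 0 are arbitrary decisions for this example, not an established convention:

```cpp
#include <climits>
#include <cmath>

static_assert(INT_MAX == 2147483647, "sketch assumes a 32-bit int");

// Hypothetical helper: truncates toward zero and clamps out-of-range input.
int saturating_trunc_to_int(float f)
{
    if (std::isnan(f)) return 0;                // policy choice: NaN becomes 0
    if (f >= 2147483648.0f) return INT_MAX;     // too large, includes +infinity
    if (f < -2147483648.0f) return INT_MIN;     // too small, includes -infinity
    return static_cast<int>(f);                 // in range, conversion is defined
}
```

Comparing against the float constants directly avoids the detour through double, and handling NaN first keeps it from silently landing in one of the clamping branches.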

answered Sep 19 '22 03:09

anatolyg
