 

What float values could not be converted to int without undefined behavior [c++]?

I just read this from the C++14 standard (my emphasis):

4.9 Floating-integral conversions [conv.fpint]

1 A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [...]
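"Truncates" means rounding toward zero, so for negative values the result differs from flooring. Here is a minimal illustration, assuming a 32-bit int. `truncate_to_int` is a hypothetical name used only for this sketch.

```cpp
#include <cmath>

// Hypothetical helper, for illustration only (assumes 32-bit int).
// static_cast<int> discards the fractional part (rounds toward zero),
// so for in-range x the result equals std::trunc(x).
// The behavior is undefined if std::trunc(x) does not fit in int.
int truncate_to_int(float x) {
    return static_cast<int>(x);
}
```

For example, truncate_to_int(-2.9f) gives -2, whereas std::floor(-2.9f) gives -3.0f.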

Which got me thinking

  1. Which, if any, float values cannot be represented as int after truncation? (Does that depend on the implementation?)
  2. If there are any, does this mean that auto x = static_cast<int>(float) is unsafe?
  3. What is the proper/safe way of converting float to int, then (assuming you want truncation)?
asked Jan 31 '18 at 17:01 by ricab




2 Answers

We hit this a while back, and I manually made some tables with the exact bit patterns of the floats at the edges of conversions to various integer sizes. Note that this assumes IEEE 754 4-byte floats, 8-byte doubles, and two's complement signed integers (int32_t of 4 bytes and int64_t of 8 bytes).

If you need to convert the bit patterns to floats or doubles you'll need to either type pun them (technically UB) or memcpy them.

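To answer your question: any value that is too big (or too small) to fit in the target integer type after truncation is UB on conversion. Truncation toward zero only matters at the edges for double -> int32_t, because that is the only one of these combinations where values near the limits have a fractional part (near 2^31 a float is always a whole number, and so is a double near 2^63). So, using the values in the tables below, you can compare the float against the relevant min/max and only cast if it is in range.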

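Note that converting INT_MIN/INT_MAX (or their std::numeric_limits counterparts) to float and then comparing doesn't always work, because floats have very little precision at that magnitude. For example, with a 32-bit int, static_cast<float>(INT_MAX) rounds to 2147483648.0f, which is already outside the int range.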
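A small sketch of that pitfall, assuming a 32-bit int and an IEEE 754 float. `naive_fits_int` is a hypothetical, deliberately flawed check:

```cpp
#include <climits>

// Assumes 32-bit int and IEEE 754 float.
// INT_MAX (2^31 - 1) is not representable as a float; the conversion
// rounds it to the nearest float, 2^31, which is already out of int range.
const float kIntMaxAsFloat = static_cast<float>(INT_MAX);

// Hypothetical, deliberately flawed range check built on that constant:
// it accepts 2147483648.0f, so a subsequent static_cast<int> would be UB.
bool naive_fits_int(float f) {
    return f >= static_cast<float>(INT_MIN) && f <= kIntMaxAsFloat;
}
```

For float, the bounds can be fixed by using the exact values from the tables in this answer, or by using a strict comparison against 2^31: f >= -2147483648.0f && f < 2147483648.0f.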

Inf/NaN are also UB on conversion.
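For the double -> int32_t case, where truncation matters, the range test can also be written with two plain double literals instead of bit patterns. A double truncates into the int32_t range exactly when -2147483649.0 < d < 2147483648.0, and both bounds are exactly representable as doubles. The sketch below relies on the same IEEE 754 and two's complement assumptions. `trunc_to_int32` is a hypothetical name, and std::optional requires C++17:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: truncating double -> int32_t conversion without UB.
// Assumes IEEE 754 double and two's complement int32_t.
// trunc(d) lies in [-2^31, 2^31 - 1] exactly when -2^31 - 1 < d < 2^31.
std::optional<std::int32_t> trunc_to_int32(double d) {
    // NaN fails both comparisons and +/-Inf fails one of them,
    // so non-finite values are rejected as well.
    if (d > -2147483649.0 && d < 2147483648.0)
        return static_cast<std::int32_t>(d);
    return std::nullopt;
}
```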

#include <cstdint>

// float->int64 edge cases
static const uint32_t FloatbitsMaxFitInt64 = 0x5effffff; // [9223371487098961920] Largest float that still fits in a signed int64
static const uint32_t FloatbitsMinNofitInt64 = 0x5f000000; // [9223372036854775808] The bit pattern of the smallest float that is too big for a signed int64
static const uint32_t FloatbitsMinFitInt64 = 0xdf000000; // [-9223372036854775808] Smallest float that still fits in a signed int64
static const uint32_t FloatbitsMaxNotfitInt64 = 0xdf000001; // [-9223373136366403584] Largest float that is too small for a signed int64

// float->int32 edge cases
static const uint32_t FloatbitsMaxFitInt32 = 0x4effffff; // [2147483520] The bit pattern of the largest float that still fits in a signed int32
static const uint32_t FloatbitsMinNofitInt32 = 0x4f000000; // [2147483648] The bit pattern of the smallest float that is too big for a signed int32
static const uint32_t FloatbitsMinFitInt32 = 0xcf000000; // [-2147483648] The bit pattern of the smallest float that still fits in a signed int32
static const uint32_t FloatbitsMaxNotfitInt32 = 0xcf000001; // [-2147483904] The bit pattern of the largest float that is too small for a signed int32

// double->int64 edge cases
static const uint64_t DoubleBitsMaxFitInt64 = 0x43dfffffffffffff; // [9223372036854774784] Largest double that fits into an int64
static const uint64_t DoubleBitsMinNofitInt64 = 0x43e0000000000000; // [9223372036854775808] Smallest double that is too big for an int64
static const uint64_t DoubleBitsMinFitInt64 = 0xc3e0000000000000; // [-9223372036854775808] Smallest double that fits into an int64
static const uint64_t DoubleBitsMaxNotfitInt64 = 0xc3e0000000000001; // [-9223372036854777856] Largest double that is too small to fit into an int64

// double->int32 edge cases (when truncating, i.e. rounding toward zero)
static const uint64_t DoubleBitsMaxTruncFitInt32 = 0x41dfffffffffffff; // [~2147483647.9999998] Largest double that when truncated will fit into an int32
static const uint64_t DoubleBitsMinTruncNofitInt32 = 0x41e0000000000000; // [2147483648.0000000] Smallest double that when truncated won't fit into an int32
static const uint64_t DoubleBitsMinTruncFitInt32 = 0xc1e00000001fffff; // [~-2147483648.9999995] Smallest double that when truncated will fit into an int32
static const uint64_t DoubleBitsMaxTruncNofitInt32 = 0xc1e0000000200000; // [-2147483649.0000000] Largest double that when truncated won't fit into an int32

// double->int32 edge cases (when rounding via banker's method, i.e. round to nearest, ties to even)
static const uint64_t DoubleBitsMaxRoundFitInt32 = 0x41dfffffffdfffff; // [~2147483647.4999998] Largest double that when rounded will fit into an int32
static const uint64_t DoubleBitsMinRoundNofitInt32 = 0x41dfffffffe00000; // [2147483647.5000000] Smallest double that when rounded won't fit into an int32 (the tie rounds to the even 2147483648)
static const uint64_t DoubleBitsMinRoundFitInt32 = 0xc1e0000000100000; // [-2147483648.5000000] Smallest double that when rounded will fit into an int32
static const uint64_t DoubleBitsMaxRoundNofitInt32 = 0xc1e0000000100001; // [~-2147483648.5000005] Largest double that when rounded won't fit into an int32
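Some of these edge values can also be derived instead of hard-coded, which is a useful cross-check on the decimal comments. This sketch is an alternative to the bit-pattern approach, not the answer's method. It assumes IEEE 754 binary32/binary64 and the default round-to-nearest mode, and all names are hypothetical. It builds exact powers of two with std::ldexp and steps to the adjacent representable value with std::nextafter:

```cpp
#include <cmath>
#include <limits>

// Hypothetical names; assumes IEEE 754 binary32 float and binary64 double.

// float -> int32: 2^31 is the first float that is too big, and its
// predecessor is the largest float that still fits.
const float kFloatMinNofitInt32 = std::ldexp(1.0f, 31);                     // 0x4f000000
const float kFloatMaxFitInt32 = std::nextafter(kFloatMinNofitInt32, 0.0f);  // 0x4effffff
const float kFloatMinFitInt32 = -kFloatMinNofitInt32;                       // 0xcf000000
const float kFloatMaxNotfitInt32 =
    std::nextafter(kFloatMinFitInt32, -std::numeric_limits<float>::infinity());  // 0xcf000001

// double -> int32 with round-half-to-even: 2147483647.5 is exact in a double and
// rounds to the even value 2^31 (too big), so it is the first double that no
// longer fits; its predecessor is the largest double that still fits.
const double kDoubleMinRoundNofitInt32 = 2147483647.5;                                   // 0x41dfffffffe00000
const double kDoubleMaxRoundFitInt32 = std::nextafter(kDoubleMinRoundNofitInt32, 0.0);  // 0x41dfffffffdfffff
```

std::nearbyint uses the current rounding mode (round to nearest, ties to even, by default). It can therefore confirm that 2147483647.5 rounds to 2147483648 while its predecessor rounds to 2147483647.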

So for your example you want:

if (f >= B2F(FloatbitsMinFitInt32) && f <= B2F(FloatbitsMaxFitInt32))
{
    // cast is valid (assuming 32-bit int)
    int i = static_cast<int>(f);
    // ... use i
}

Where B2F is something like:

#include <cstdint>
#include <cstring>

float B2F(uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "Weird arch");
    float f;
    std::memcpy(&f, &bits, sizeof(float)); // well-defined, unlike pointer/union type punning
    return f;
}

Note that this check also rejects NaN and Inf correctly (every comparison with NaN is false, and ±Inf fails one of the two bounds), unless you're using a non-IEEE 754 mode of your compiler (e.g. -ffast-math on GCC or /fp:fast on MSVC).
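The following self-contained sketch combines B2F and the two float->int32 constants from above into a checked conversion. It keeps the answer's assumptions (IEEE 754 binary32 float, 32-bit int32_t). `checked_float_to_int32` is a hypothetical wrapper name, and returning std::optional (C++17) is just one way to report failure:

```cpp
#include <cstdint>
#include <cstring>
#include <optional>

// Constants and B2F as in the answer; assumes IEEE 754 float, 32-bit int32_t.
static const std::uint32_t FloatbitsMaxFitInt32 = 0x4effffff;  // [2147483520]
static const std::uint32_t FloatbitsMinFitInt32 = 0xcf000000;  // [-2147483648]

float B2F(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "Weird arch");
    float f;
    std::memcpy(&f, &bits, sizeof(float));
    return f;
}

// Hypothetical wrapper: the truncated value, or std::nullopt when the
// conversion would be UB (out of range, +/-Inf, or NaN).
std::optional<std::int32_t> checked_float_to_int32(float f)
{
    if (f >= B2F(FloatbitsMinFitInt32) && f <= B2F(FloatbitsMaxFitInt32))
        return static_cast<std::int32_t>(f);
    return std::nullopt;  // NaN ends up here: every comparison with it is false
}
```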

answered Sep 19 '22 at 03:09 by Mike Vine


It shouldn't be surprising at all that float has values outside of int range. Floating-point values were invented to represent very large (and also very small) values adequately.

  1. INT_MAX + 1 (usually equal to 2147483648) cannot be represented by int, but can be represented by float.
  2. Yes, static_cast<int>(float) is as unsafe as undefined behavior can be. However, something as simple as x + y for signed integers x and y whose sum does not fit in the type is also UB (unsigned arithmetic wraps around instead), so no big surprise here either.
  3. The proper way to do this depends on the application, as always in C++. Boost has numeric_cast, which throws an exception on overflow; this might be good for you. To saturate instead (clamp out-of-range values to INT_MIN or INT_MAX), write something like this:

    #include <climits>

    float f;
    int i;
    // ...
    if (static_cast<double>(INT_MIN) <= f && f < static_cast<double>(INT_MAX))
        i = static_cast<int>(f);
    else if (f < 0)
        i = INT_MIN;
    else
        i = INT_MAX;  // also reached for NaN, since every comparison with NaN is false
    

    However, this is not ideal. Does your system have a double type that can represent the maximum value of int exactly? If so, it will work. Also, how exactly do you want to round values that are close to the minimum or maximum of int? If you don't want to consider such questions, use boost::numeric_cast, as described here.
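One way to sidestep the question about double is to compare against exact powers of two, which are representable in any binary floating-point type with enough exponent range. Below is a sketch of a generic saturating, truncating conversion. It assumes radix-2 floating point and two's complement integers. `saturating_cast` is a hypothetical name, and mapping NaN to 0 is an arbitrary choice (it is what Rust's `as` casts do), not something the answer prescribes:

```cpp
#include <cmath>
#include <limits>
#include <type_traits>

// Hypothetical helper: truncating float -> signed integer conversion that
// clamps out-of-range values instead of invoking UB.
// Assumes radix-2 floating point and two's complement Int.
template <typename Int, typename F>
Int saturating_cast(F f) {
    static_assert(std::is_integral<Int>::value && std::is_signed<Int>::value,
                  "Int must be a signed integer type");
    static_assert(std::is_floating_point<F>::value,
                  "F must be a floating-point type");

    if (std::isnan(f))
        return 0;  // arbitrary choice for NaN

    // 2^digits is the first value too big for Int (2^31 for a 32-bit int).
    // A power of two is exact in F, unlike a converted INT_MAX.
    const F upper = std::ldexp(F(1), std::numeric_limits<Int>::digits);
    const F lower = -upper;  // -2^digits == min() still fits

    if (f >= upper)
        return std::numeric_limits<Int>::max();
    if (f < lower)  // values in (lower - 1, lower) would truncate to min() anyway
        return std::numeric_limits<Int>::min();
    return static_cast<Int>(f);  // lower <= f < upper, so trunc(f) fits
}
```

With F = float and Int = int, upper is 2147483648.0f. Unlike the INT_MAX-based version, this does not depend on double being able to hold INT_MAX.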

answered Sep 19 '22 at 03:09 by anatolyg