
Detect overflow when converting integral to floating types

The C standard, which as far as I know C++ also relies on for these matters, has the following section:

When a value of integer type is converted to a real floating type, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. If the value being converted is outside the range of values that can be represented, the behavior is undefined.

Is there any way I can check for the last case? It seems to me that this last undefined behaviour is unavoidable. If I have an integral value i and naively check something like

i <= FLT_MAX

I will (apart from other precision-related problems) already trigger it, because the comparison first converts i to float (or, in general, to whichever floating type is involved). So if i is out of range, we get undefined behaviour.

Or is there some guarantee about the relative sizes of integral and floating types that would imply something like "float can always represent all values of int (not necessarily exactly of course)" or at least "long double can always hold everything" so that we could do comparisons in that type? I couldn't find anything like that, though.

This is mainly a theoretical exercise, so I'm not interested in answers along the lines of "on most architectures these conversions always work". Let's try to find a way to detect this kind of overflow without assuming anything beyond the C(++) standard! :)

Julian Kniephoff asked Aug 28 '17 20:08


1 Answer


FLT_MAX and DBL_MAX are at least 1E+37 per the C spec, so every integer whose magnitude fits in 122 bits or fewer will convert to float without overflow on all compliant platforms. The same holds for double.
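The 122-bit figure follows from log2(1e37) being about 122.9, so 2^122 is below the smallest FLT_MAX the standard allows. A minimal sketch of a check against that guaranteed range, using only integer shifts so that no floating conversion takes place. The name fits_minimal_float is hypothetical and not part of the answer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when x < 2^122, which is below 1e37 and
   therefore within the range of float and double on any conforming
   implementation. The shift is split in two so that each shift count
   stays below 64, the minimum width of uintmax_t. */
bool fits_minimal_float(uintmax_t x) {
    return ((x >> 61) >> 61) == 0;
}
```

On an implementation where uintmax_t is 64 bits wide this always returns true, which matches the claim that no overflow is possible there.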


To solve this in the general case for integers of 128/256/etc. bits, both FLT_MAX and some_big_integer_MAX need to be reduced to a scale where they can be compared safely.

Perhaps by taking the log of both (bit_count() is user code, still to be written):

if(bit_count(unsigned_big_integer_MAX) > logbf(FLT_MAX)) problem();

Or, if the integer type has no padding bits:

if(sizeof(unsigned_big_integer_MAX)*CHAR_BIT > logbf(FLT_MAX)) problem();

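Note: because logbf() is a floating-point function, comparing its result against exact integer math may give an incorrect result at the edge cases. logbf() also reports the exponent in base FLT_RADIX, so the tests above assume FLT_RADIX == 2.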
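One way to sidestep the floating-point edge case is to do the whole comparison in integers. The sketch below is an assumption of mine rather than the answer's code: it supplies a hypothetical bit_count() and compares against FLT_MAX_EXP from <float.h> instead of calling logbf(). It is deliberately conservative and may reject a few values that would in fact convert.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of bits needed to represent x (0 for x == 0). */
int bit_count(uintmax_t x) {
    int n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* True only when x is certainly within float's finite range. */
bool umax_in_float_range(uintmax_t x) {
#if FLT_RADIX == 2
    /* x < 2^bit_count(x), and 2^(FLT_MAX_EXP - 1) is a finite float. */
    return bit_count(x) < FLT_MAX_EXP;
#else
    /* Fall back to the guaranteed minimum: 2^122 < 1e37 <= FLT_MAX. */
    return bit_count(x) <= 122;
#endif
}
```

No value is converted to a floating type here, so the check itself cannot trigger the undefined behaviour it is guarding against.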


Macro magic can use obtuse tests like the following, which takes advantage of two facts: BIGINT_MAX is certainly a power of 2 minus 1, and dividing FLT_MAX by a power of 2 is certainly exact (unless FLT_RADIX == 10).

This preprocessor code will complain if converting some value of a big integer type to float would be out of range.

#define POW2_61 0x2000000000000000u  
#if BIGINT_MAX/POW2_61 > POW2_61
  // BIGINT is at least a 122 bit integer 
  #define BIGINT_MAX_PLUS1_div_POW2_61  ((BIGINT_MAX/2 + 1)/(POW2_61/2))
  #if BIGINT_MAX_PLUS1_div_POW2_61 > POW2_61
    #warning TBD code for an integer wider than 183 bits
  #else
    _Static_assert(BIGINT_MAX_PLUS1_div_POW2_61 <= FLT_MAX/POW2_61, 
        "bigint too big for float");
  #endif
#endif

[Edit 2]

Is there any way I can check for the last case?

The code below tests whether converting a particular big integer value to float stays in range.

Of course the test needs to occur before the conversion is attempted.

Given various rounding modes or a rare FLT_RADIX == 10, the best that can readily be had is a test that aims a bit low. When it is true, the conversion will work. Yet a very small range of big integers that report false on the test below do convert OK.

Below is a more refined idea that I need to mull over for a bit, yet I hope it provides some coding idea for the test OP is looking for.

#include <float.h>
#include <stdbool.h>
#include <stdint.h>

#define POW2_60 0x1000000000000000u
#define POW2_62 0x4000000000000000u
// Smallest FLT_MAX the C standard permits, and its base-2 log (rounded down)
#define MAX_FLT_MIN 1e37
#define MAX_FLT_MIN_LOG2 (122 /* 122.911.. */)

// Tests only the upper bound; large negative x is not range-checked.
bool intmax_to_float_OK(intmax_t x) {
#if INTMAX_MAX/POW2_60 < POW2_62
  (void) x;
  return true; // All big integer values work
#elif INTMAX_MAX/POW2_60/POW2_60 < POW2_62
  return x/POW2_60 < (FLT_MAX/POW2_60);
#elif INTMAX_MAX/POW2_60/POW2_60/POW2_60 < POW2_62
  return x/POW2_60/POW2_60 < (FLT_MAX/POW2_60/POW2_60);
#else
  #error TBD code
#endif
}
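Since the test has to happen before the conversion, it can be convenient to bundle the two. The following is only a sketch under my own assumptions, not the answer's method: intmax_to_float_checked is a hypothetical name, it covers negative values by measuring the magnitude in unsigned arithmetic, and like the test above it aims a bit low.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: stores (float)x in *out and returns true only when x
   is certainly inside float's finite range; otherwise leaves *out untouched. */
bool intmax_to_float_checked(intmax_t x, float *out) {
    /* Magnitude in unsigned arithmetic; well defined even for INTMAX_MIN. */
    uintmax_t mag = x < 0 ? (uintmax_t)0 - (uintmax_t)x : (uintmax_t)x;
    int bits = 0;
    while (mag != 0) {
        bits++;
        mag >>= 1;
    }
#if FLT_RADIX == 2
    /* |x| < 2^bits, and 2^(FLT_MAX_EXP - 1) is a finite float. */
    const int limit = FLT_MAX_EXP - 1;
#else
    /* Guaranteed minimum range: 2^122 < 1e37 <= FLT_MAX. */
    const int limit = 122;
#endif
    if (bits > limit) {
        return false;
    }
    *out = (float)x; /* in range, so the conversion is defined */
    return true;
}
```

A caller would check the return value and use *out only when it is true. The result may still be rounded, since this guards against range errors and not against inexact conversion.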
chux - Reinstate Monica answered Sep 30 '22 15:09