What rule governs rounding behavior in applying static_cast<float> to a double?

If we have a double-precision value in C++ and apply static_cast<float> to it, will the result always be smaller in absolute value? My intuition says yes, for the following reasons:

  • The set of possible single-precision exponents is a strict subset of the double-precision exponents.
  • In converting the double-precision mantissa to single precision, bits are probably truncated off the end to fit the double's mantissa into the float's mantissa. However, it's not impossible that the value is sometimes rounded up to the next higher float if that is more accurate. Perhaps this is system-dependent, or defined in some standard.

I have experimented numerically with this in the program below. It appears that the conversion sometimes rounds up and sometimes rounds down.

Where can I find more info about how I can expect this rounding to behave? Does it always round to the nearest float?

#include <cmath>
#include <iostream>
#include <limits>  // std::numeric_limits, used with std::nextafter below

int main() {
  // Start testing double precision values starting at x, going up to max
  double x = 0.98;
  constexpr double max = 1e10;

  // Loop over many possible double-precision values, print out
  // if casting to float ever produced a larger number.
  int output_counter = 0; // output every n steps
  constexpr int output_interval = 100000000;

  std::cout.precision(17);
  while (x < max) {
    // volatile to ensure compiler doesn't optimize this out
    volatile float xprime = static_cast<float>(x);
    double xprimeprime = static_cast<double>(xprime);

    if (xprimeprime > x)
      std::cout << "Found a round up! x=" << x << ", xprime = "<< xprime << std::endl;

    // Go to the next higher double precision value
    x = std::nextafter(x, std::numeric_limits<double>::infinity());

    output_counter++;
    if (output_counter == output_interval) {
      std::cout << x << std::endl;
      output_counter = 0;
    }
  }
}

Gavin Ridley asked Jun 04 '21 21:06


2 Answers

The standard says in [conv.double]:

A prvalue of floating-point type can be converted to a prvalue of another floating-point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.

Note that with the <limits> header you can check the round style by std::numeric_limits<T>::round_style. See [round.style] for the possible values. (At least I assume that floating-point conversion falls under floating-point arithmetic.)
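
As a rough illustration of the round_style check mentioned above, the sketch below maps each std::float_round_style enumerator to a readable name. The helper names round_style_name and float_round_style_name are made up for this example. Note that round_style is a compile-time constant, so it reports what the implementation declares, not a rounding mode changed at run time.

```cpp
#include <limits>
#include <string>

// Hypothetical helper: turn a std::float_round_style value into text.
std::string round_style_name(std::float_round_style s) {
  switch (s) {
    case std::round_indeterminate:       return "indeterminate";
    case std::round_toward_zero:         return "toward zero";
    case std::round_to_nearest:          return "to nearest";
    case std::round_toward_infinity:     return "toward +infinity";
    case std::round_toward_neg_infinity: return "toward -infinity";
  }
  return "unknown";
}

// Hypothetical helper: the rounding style the implementation reports for float.
std::string float_round_style_name() {
  return round_style_name(std::numeric_limits<float>::round_style);
}
```

Printing float_round_style_name() from main shows what a given compiler and target report.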

BlameTheBits answered Nov 14 '22 21:11


I can't find a definitive answer in the Draft C++17 Standard I normally use for answers to questions such as this (see note 1); however, cppreference (which is generally reliable) strongly suggests that the rounding mode for floating-point conversions is implementation-defined.

However, it also states that, if IEEE-754 rules are followed, rounding takes place to the nearest representable value (see note 2):

Floating-point conversions
A prvalue of a floating-point type can be converted to a prvalue of any other floating-point type. If the conversion is listed under floating-point promotions, it is a promotion and not a conversion.

  • If the source value can be represented exactly in the destination type, it does not change. If the source value is between two representable values of the destination type, the result is one of those two values (it is implementation-defined which one, although if IEEE arithmetic is supported, rounding defaults to nearest).
  • Otherwise, the behavior is undefined.
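
To make the "between two representable values" case concrete, here is a minimal sketch that finds the two floats bracketing a double and checks whether the cast picked the closer one. The names FloatNeighbors, float_neighbors and cast_is_nearest are invented for this example, and it assumes x is finite and within the range of float (outside that range the conversion is undefined, as quoted above).

```cpp
#include <cmath>
#include <limits>

// Hypothetical record of the floats surrounding a double value.
struct FloatNeighbors {
  float below;   // largest float <= x
  float above;   // smallest float >= x
  float chosen;  // what static_cast<float>(x) produced
};

// Assumes x is finite and within the range of float.
FloatNeighbors float_neighbors(double x) {
  const float chosen = static_cast<float>(x);
  // Converting float back to double is exact, so this comparison is reliable.
  const double back = static_cast<double>(chosen);
  FloatNeighbors n{chosen, chosen, chosen};
  if (back > x) {
    // The cast rounded up: the lower neighbor is one float step down.
    n.below = std::nextafter(chosen, -std::numeric_limits<float>::infinity());
  } else if (back < x) {
    // The cast rounded down: the upper neighbor is one float step up.
    n.above = std::nextafter(chosen, std::numeric_limits<float>::infinity());
  }
  return n;
}

// True if the cast picked the neighbor closest to x (a tie counts as nearest).
bool cast_is_nearest(double x) {
  const FloatNeighbors n = float_neighbors(x);
  const double d_chosen = std::fabs(static_cast<double>(n.chosen) - x);
  const double d_below = std::fabs(x - static_cast<double>(n.below));
  const double d_above = std::fabs(static_cast<double>(n.above) - x);
  return d_chosen <= d_below && d_chosen <= d_above;
}
```

If cast_is_nearest holds for every value tested, that is consistent with round-to-nearest, which would explain seeing both "round up" and "round down" results in the question's experiment.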

Further, the IEEE-754 default behaviour referred to above can be changed using the std::fesetround(int round) function, with one of the following rounding modes, defined in the <cfenv> header:

#define FE_DOWNWARD     /*implementation defined*/ // (since C++11)
#define FE_TONEAREST    /*implementation defined*/ // (since C++11)
#define FE_TOWARDZERO   /*implementation defined*/ // (since C++11)
#define FE_UPWARD       /*implementation defined*/ // (since C++11)
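
A hedged sketch of how one of these modes might be applied to a double-to-float conversion follows. The function name narrow_with_mode is made up for this example. It assumes a typical IEEE-754 target where conversions honour the dynamic rounding mode. Strictly speaking, code that changes the mode should be compiled with floating-point environment access enabled (#pragma STDC FENV_ACCESS ON), which not every compiler supports, so the volatile variables here are only a workaround to keep the conversion at run time.

```cpp
#include <cfenv>

// Hypothetical helper: convert x to float while the given <cfenv> rounding
// mode is active, then restore the previous mode.
float narrow_with_mode(double x, int mode) {
  const int saved = std::fegetround();
  if (std::fesetround(mode) != 0) {
    // The requested mode is not supported; convert with the current mode.
    return static_cast<float>(x);
  }
  // volatile keeps the compiler from folding the conversion at compile time
  // or moving it past the calls that change the rounding mode.
  volatile double in = x;
  volatile float out = static_cast<float>(in);
  std::fesetround(saved);
  return out;
}
```

On such a target, narrow_with_mode(x, FE_DOWNWARD) and narrow_with_mode(x, FE_UPWARD) should return the two floats adjacent to any x that is not exactly representable as a float.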

1. BlameTheBits found the relevant section in the Standard. In the C++17 Draft I referred to, this is actually §7.9.1 but otherwise similar.

2. IEEE-754 actually defines 5 different rules for floating-point rounding.

Adrian Mole answered Nov 14 '22 22:11