How do I convert an arbitrary double to an integer while avoiding undefined behavior?

Tags: c++, type-conversion, language-lawyer, undefined-behavior

Let's say I've got a function that accepts a 64-bit integer, and I want to call it with a double with arbitrary numeric value (i.e. it may be very large in magnitude, or even infinite):

void DoSomething(int64_t x);

double d = [...];
DoSomething(d);  // implicit double -> int64_t conversion

Paragraph 1 of [conv.fpint] in the C++11 standard says this:

A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

Therefore many values of d above will cause undefined behavior. I would like the conversion to saturate, so that values greater than std::numeric_limits<int64_t>::max() (called kint64max below), including infinity, become that value, and similarly values less than std::numeric_limits<int64_t>::min() (called kint64min below) become kint64min. This seems like the natural approach:

double clamped = std::min(d, static_cast<double>(kint64max));
clamped = std::max(clamped, static_cast<double>(kint64min));
DoSomething(clamped);

But, the next paragraph in the standard says this:

A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating point type. The result is exact if possible. If the value being converted is in the range of values that can be represented but the value cannot be represented exactly, it is an implementation-defined choice of either the next lower or higher representable value.

So static_cast<double>(kint64max) may itself round up to kint64max + 1 (that is, 2^63), in which case clamped can still be out of range and the behavior still undefined.
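To see the problem concretely, here is a small check, assuming IEEE-754 binary64 doubles and the default round-to-nearest mode (the standard requires neither). Under those assumptions static_cast<double>(kint64max) lands exactly on 2^63, one past the largest int64_t, so the std::min clamp above can produce an out-of-range value. CheckMaxRoundsUp is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Assumption: IEEE-754 binary64 doubles (53-bit significand) and the default
// round-to-nearest mode. The C++ standard guarantees neither.
static_assert(std::numeric_limits<double>::digits == 53, "expects binary64");

// Hypothetical spot check: converting INT64_MAX to double rounds up to 2^63.
void CheckMaxRoundsUp() {
  constexpr std::int64_t kint64max = std::numeric_limits<std::int64_t>::max();

  // 2^63 - 1 needs 63 significant bits, so it cannot be stored exactly. The
  // neighbouring doubles are 2^63 - 1024 and 2^63; round-to-nearest picks
  // 2^63, which is one past the largest int64_t.
  const double as_double = static_cast<double>(kint64max);

  // std::ldexp(1.0, 63) computes 1.0 * 2^63, which is exactly representable.
  assert(as_double == std::ldexp(1.0, 63));

  // Converting as_double back to int64_t would therefore be undefined behavior.
}
```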

What is the simplest portable way to do what I'm looking for? Bonus points if it also gracefully handles NaNs.

Update: To be more precise, I would like all of the following to be true of an int64_t SafeCast(double) function that solves this problem:

For any double d, calling SafeCast(d) does not invoke undefined behavior according to the standard, nor does it throw an exception or otherwise abort.
For any double d in the range [-2^63, 2^63), SafeCast(d) == static_cast<int64_t>(d). That is, SafeCast agrees with C++'s conversion rules wherever the latter is defined.
For any double d >= 2^63, SafeCast(d) == kint64max.
For any double d < -2^63, SafeCast(d) == kint64min.

I suspect the true difficulty here is in figuring out whether d is in the range [-2^63, 2^63). As discussed in the question and in comments to other answers, I think using a cast of kint64max to double to test for the upper bound is a non-starter due to undefined behavior. It may be more promising to use std::pow(2, 63), but I don't know whether this is guaranteed to be exactly 2^63.
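One way to get a bound that is exactly 2^63 without relying on std::pow is to convert an integer that is exactly 2^63, held in a uint64_t where it fits. By the paragraph of the standard quoted above, "the result is exact if possible", and 2^63 is exactly representable whenever double has radix 2 and enough exponent range (true for IEEE-754 binary64, which this sketch assumes). Comparing against that bound before casting gives a comparison-based SafeCast. The name SafeCastByCompare is hypothetical; this is an alternative technique, not the std::frexp approach used in the answer.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Assumption: double is binary (radix 2) with enough exponent range for 2^63
// to be a finite value, as with IEEE-754 binary64.
static_assert(std::numeric_limits<double>::radix == 2, "expects binary double");
static_assert(std::numeric_limits<double>::max_exponent >= 64, "2^63 must fit");

// Exactly 2^63: the integer fits in uint64_t and is exactly representable as a
// double, so the integer-to-floating conversion is required to be exact.
constexpr double kTwoPow63 = static_cast<double>(std::uint64_t{1} << 63);

// Hypothetical comparison-based implementation of the SafeCast spec above.
std::int64_t SafeCastByCompare(double d) {
  if (std::isnan(d)) return 0;  // NaN has no meaningful value; pick 0.
  if (d >= kTwoPow63) return std::numeric_limits<std::int64_t>::max();
  if (d < -kTwoPow63) return std::numeric_limits<std::int64_t>::min();
  // Here d is in [-2^63, 2^63), so trunc(d) is in [-2^63, 2^63 - 1] and the
  // conversion is well defined. Infinities were caught by the comparisons.
  return static_cast<std::int64_t>(d);
}
```

Because the bound is an exact power of two, neither comparison depends on the implementation-defined rounding that makes the std::min/std::max clamp unsafe. In C++17 the hexadecimal floating literal 0x1p63 is another way to spell the same exact constant.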

asked Sep 15 '14 22:09

jacobsa

1 Answer

It turns out this is simpler to do than I thought. Thanks to Michael O'Reilly for the basic idea of this solution.

The heart of the matter is figuring out whether the truncated double will be representable as an int64_t. You can do this easily using std::frexp:

#include <cmath>
#include <cstdint>
#include <limits>

using std::int64_t;

static constexpr int64_t kint64min = std::numeric_limits<int64_t>::min();
static constexpr int64_t kint64max = std::numeric_limits<int64_t>::max();

int64_t SafeCast(double d) {
  // We must special-case NaN, for which the logic below doesn't work.
  if (std::isnan(d)) {
    return 0;
  }

  // Find the exponent exp such that
  //     d == x * 2^exp
  // for some x with abs(x) in [0.5, 1.0) (or x == 0 when d == 0). Note that
  // this implies that the magnitude of d is strictly less than 2^exp.
  //
  // If d is infinite, the call to std::frexp is legal but the contents of exp
  // are unspecified, so exp is only read below once d is known to be finite.
  int exp = 0;
  std::frexp(d, &exp);

  // If the magnitude of d is strictly less than 2^63, the truncated version
  // of d is guaranteed to be representable. The only representable integer
  // for which this is not the case is kint64min (-2^63), but it is covered
  // by the logic below.
  if (std::isfinite(d) && exp <= 63) {
    return static_cast<int64_t>(d);
  }

  // Handle infinities and finite numbers with magnitude >= 2^63.
  return std::signbit(d) ? kint64min : kint64max;
}
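The exponent test above hinges on where std::frexp puts the boundary. The checks below, a sketch assuming IEEE-754 binary64 doubles, show that the largest double below 2^63 reports exp == 63 (so it takes the cast path), while 2^63 and -2^63 both report exp == 64 (so they take the saturating branch, which for -2^63 still yields kint64min, the exact value). CheckFrexpBoundaries is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical spot checks of std::frexp at the 2^63 boundary.
// Assumes IEEE-754 binary64 doubles.
void CheckFrexpBoundaries() {
  const double two_pow_63 = std::ldexp(1.0, 63);              // exactly 2^63
  const double just_below = std::nextafter(two_pow_63, 0.0);  // 2^63 - 1024

  int exp = 0;
  double frac = std::frexp(just_below, &exp);
  // just_below == frac * 2^63 with frac in [0.5, 1.0): safe to truncate.
  assert(exp == 63 && frac >= 0.5 && frac < 1.0);

  frac = std::frexp(two_pow_63, &exp);
  assert(exp == 64 && frac == 0.5);   // 2^63 == 0.5 * 2^64: saturates

  frac = std::frexp(-two_pow_63, &exp);
  assert(exp == 64 && frac == -0.5);  // -2^63 == -0.5 * 2^64: signbit branch,
                                      // which returns kint64min (exact anyway)

  frac = std::frexp(0.0, &exp);
  assert(exp == 0 && frac == 0.0);    // zero: exp is 0, so the cast path runs
}
```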

answered Nov 02 '22 05:11

jacobsa
