How is the precision loss from integer to float defined in C++?

Tags:

c++

floating-point

rounding

static-cast

I have a question about the code snippet below:

long l = 9223372036854775807L;
float f = static_cast<float>(l);

The long value cannot be represented exactly as an IEEE 754 float.

My question is how the lossy conversion is handled:

1. Is the nearest floating-point representation taken?
2. Is the next smaller/bigger representation taken?
3. Or is another approach taken?

I'm aware of the question "what happens at background when convert int to float", but it does not answer my question.

741

asked Sep 10 '19 12:09

user1235183

2 Answers

C++ defines the conversion like this (quoting the latest standard draft):

[conv.fpint]

A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating-point type. The result is exact if possible. If the value being converted is in the range of values that can be represented but the value cannot be represented exactly, it is an implementation-defined choice of either the next lower or higher representable value. [ Note: Loss of precision occurs if the integral value cannot be represented exactly as a value of the floating-point type. — end note ] If the value being converted is outside the range of values that can be represented, the behavior is undefined. If the source type is bool, the value false is converted to zero and the value true is converted to one.
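To see which of the two neighbors a given implementation picks, the converted value can be compared against the original integer without a second lossy conversion. The following is only a sketch under a few assumptions: the helper names `compare_exact`, `rounding_direction` and `is_adjacent_or_exact` are made up for this illustration, `long long` is used instead of `long` so the type is 64 bits wide on common platforms, and `float` is assumed to be IEEE 754 binary32.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<long long>::digits == 63,
              "sketch assumes a 64-bit long long");

// Exact three-way comparison of a float against a long long.
// Precondition: f is integer-valued (true for any float obtained by
// converting an integer). Returns -1 if f < v, 0 if equal, +1 if f > v.
int compare_exact(float f, long long v) {
    assert(f == std::trunc(f));
    constexpr float two63 = 9223372036854775808.0f;  // 2^63, exactly representable
    if (f >= two63) return 1;    // above every long long
    if (f < -two63) return -1;   // below every long long
    const long long back = static_cast<long long>(f);  // in range, so well defined
    return (back > v) - (back < v);
}

// Which way did the implementation round? -1: lower, 0: exact, +1: higher.
int rounding_direction(long long v) {
    return compare_exact(static_cast<float>(v), v);
}

// Checks the [conv.fpint] guarantee: the result is v itself or one of the
// two representable values that bracket v.
bool is_adjacent_or_exact(long long v) {
    const float f = static_cast<float>(v);
    const int dir = compare_exact(f, v);
    if (dir == 0) return true;
    const float inf = std::numeric_limits<float>::infinity();
    // Step one representable value back toward v; it must land on the other side.
    const float other = std::nextafter(f, dir > 0 ? -inf : inf);
    return compare_exact(other, v) == -dir;
}
```

For the value in the question, `rounding_direction(9223372036854775807LL)` is nonzero on any implementation with a 24-bit significand, and its sign shows whether the lower or the higher neighbor was chosen.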

The IEEE 754 standard defines conversion like this:

5.4.1 Arithmetic operations

It shall be possible to convert from all supported signed and unsigned integer formats to all supported arithmetic formats. Integral values are converted exactly from integer formats to floating-point formats whenever the value is representable in both formats. If the converted value is not exactly representable in the destination format, the result is determined according to the applicable rounding-direction attribute, and an inexact or floating-point overflow exception arises as specified in Clause 7, just as with arithmetic operations. The signs of integer zeros are preserved. Integer zeros without signs are converted to +0. The preferred exponent is 0.

Rounding modes are specified as:

4.3.1 Rounding-direction attributes to nearest

roundTiesToEven, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit shall be delivered.

roundTiesToAway, the floating-point number nearest to the infinitely precise result shall be delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with larger magnitude shall be delivered.

4.3.2 Directed rounding attributes

roundTowardPositive, the result shall be the format’s floating-point number (possibly +∞) closest to and no less than the infinitely precise result

roundTowardNegative, the result shall be the format’s floating-point number (possibly −∞) closest to and no greater than the infinitely precise result

roundTowardZero, the result shall be the format’s floating-point number closest to and no greater in magnitude than the infinitely precise result.

4.3.3 Rounding attribute requirements

The roundTiesToEven rounding-direction attribute shall be the default rounding-direction attribute for results in binary formats.

So by default, your suggestion 1 would apply, but only if another mode hasn't been selected.
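As a worked illustration of roundTiesToEven: a binary32 float has a 24-bit significand, so above 2^24 only every second integer is representable, and near 2^63 the representable values are 2^39 apart. The sketch below assumes IEEE 754 binary32 `float`, a 64-bit `long long`, and hardware that performs the conversion under the current rounding mode; the names `to_float_at_runtime`, `kCases` and `matches_ties_to_even` are invented for this example.

```cpp
#include <cassert>
#include <cfenv>
#include <limits>

static_assert(std::numeric_limits<float>::digits == 24,
              "sketch assumes a 24-bit significand (binary32)");

// Reads the input through a volatile so the conversion is done at run time
// instead of being folded by the compiler.
float to_float_at_runtime(long long v) {
    volatile long long src = v;
    return static_cast<float>(src);
}

struct Case {
    long long input;
    float expected;
};

// Results expected under roundTiesToEven.
constexpr Case kCases[] = {
    {16777217LL, 16777216.0f},  // 2^24 + 1: tie, 16777216 has the even significand
    {16777219LL, 16777220.0f},  // 2^24 + 3: tie, 16777220 has the even significand
    {9223372036854775807LL, 9223372036854775808.0f},  // 2^63 - 1: nearest is 2^63
};

// True if every case converts to the value predicted by roundTiesToEven.
bool matches_ties_to_even() {
    assert(std::fegetround() == FE_TONEAREST);  // default mode must be active
    for (const Case& c : kCases) {
        if (to_float_at_runtime(c.input) != c.expected) return false;
    }
    return true;
}
```

So for the question's value, the default mode yields 2^63, which is one larger than the original integer and no longer fits in a 64-bit signed type.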

The C++ standard library inherits <cfenv> from the C standard. This header offers macros, functions and types for interacting with the floating-point environment, including the rounding modes.
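A minimal sketch of selecting another mode with `std::fesetround` from <cfenv> could look like the following. `ScopedRounding` and `convert_with_mode` are hypothetical helpers written for this answer, not library facilities. The sketch assumes the hardware conversion honors the dynamic rounding mode (as is typical on x86-64 and AArch64) and uses `volatile` to keep the compiler from folding or moving the conversion; some compilers additionally want an option such as GCC's `-frounding-math` before they promise to respect mode changes.

```cpp
#include <cassert>
#include <cfenv>

// Switches the rounding mode and restores the previous one on scope exit.
class ScopedRounding {
public:
    explicit ScopedRounding(int mode)
        : saved_(std::fegetround()), ok_(std::fesetround(mode) == 0) {}
    ~ScopedRounding() { std::fesetround(saved_); }
    ScopedRounding(const ScopedRounding&) = delete;
    ScopedRounding& operator=(const ScopedRounding&) = delete;
    bool ok() const { return ok_; }  // false if the mode is not supported

private:
    int saved_;  // declared first, so the old mode is read before it is changed
    bool ok_;
};

// Converts v to float while the given rounding mode (FE_TONEAREST,
// FE_TOWARDZERO, FE_UPWARD or FE_DOWNWARD) is active.
float convert_with_mode(long long v, int mode) {
    volatile long long src = v;  // forces a run-time conversion
    volatile float dst = 0.0f;   // forces the store before the mode is restored
    ScopedRounding guard(mode);
    assert(guard.ok());
    dst = static_cast<float>(src);
    return dst;
}
```

With the question's value, `convert_with_mode(9223372036854775807LL, FE_TOWARDZERO)` should give 2^63 - 2^39 = 9223371487098961920 on such hardware, while `FE_TONEAREST` and `FE_UPWARD` give 2^63.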

186

answered Sep 20 '22 00:09

eerorika

See here:

A prvalue of integer or unscoped enumeration type can be converted to a prvalue of any floating-point type. If the value cannot be represented correctly, it is implementation defined whether the closest higher or the closest lower representable value will be selected, although if IEEE arithmetic is supported, rounding defaults to nearest. If the value cannot fit into the destination type, the behavior is undefined. If the source type is bool, the value false is converted to zero, and the value true is converted to one.

As for the rounding rules of IEEE 754, there seem to be five of them. I couldn't find any information on which one is used in which situation, though. It looks like it's up to the implementation; however, you can set the rounding mode in a C++ program as described here.
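Which mode is active can at least be queried at run time with `std::fegetround` from <cfenv>. Note that <cfenv> names only four modes (`FE_TONEAREST`, `FE_TOWARDZERO`, `FE_UPWARD`, `FE_DOWNWARD`), so not every one of the five IEEE 754 attributes has a macro there. The function name `current_rounding_mode` below is made up for this sketch, and each macro is guarded because an implementation only defines the ones it supports.

```cpp
#include <cfenv>
#include <string>

// Returns the name of the <cfenv> macro matching the current rounding mode,
// or "unknown" if the implementation reports a mode without a standard macro.
std::string current_rounding_mode() {
    switch (std::fegetround()) {
#ifdef FE_TONEAREST
        case FE_TONEAREST:  return "FE_TONEAREST";
#endif
#ifdef FE_TOWARDZERO
        case FE_TOWARDZERO: return "FE_TOWARDZERO";
#endif
#ifdef FE_UPWARD
        case FE_UPWARD:     return "FE_UPWARD";
#endif
#ifdef FE_DOWNWARD
        case FE_DOWNWARD:   return "FE_DOWNWARD";
#endif
        default:            return "unknown";
    }
}
```

In a program that never changes the mode, this is expected to report `FE_TONEAREST`, which on IEEE 754 implementations usually corresponds to roundTiesToEven.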

answered Sep 19 '22 00:09

Blaze
