How is floating point conversion actually done in C++? (double to float or float to double)

Tags: c++, type-conversion, floating-point, assembly

I've searched for this topic and found nothing really relevant.

I tried looking at the assembly generated for this simple code:

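    #include <cstdlib>  // for std::system

    int main(int argc, char *argv[])
    {
        double d = 1.0;
        float f = static_cast<float>(d);  // the conversion under study

        std::system("PAUSE");  // Windows-only: keeps the console window open
        return 0;
    }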

which gives (with Visual Studio 2012):

    15:     double d = 1.0;
000000013FD7C16D  movsd       xmm0,mmword ptr [__real@3ff0000000000000 (013FD91AB0h)]  
000000013FD7C175  movsd       mmword ptr [d],xmm0  
    16:     float f = static_cast<float>(d);
000000013FD7C17B  cvtsd2ss    xmm0,mmword ptr [d]  
000000013FD7C181  movss       dword ptr [f],xmm0

I'm not that comfortable with assembly, but I tried to analyze it anyway. The first two lines seem to load the double-precision value 3ff0000000000000 into a register and then store the register's contents at the memory address of d.

After that, I don't know exactly what the next lines do. cvtsd2ss is apparently an instruction that converts a double-precision floating-point value to a single-precision floating-point value, but I couldn't find what this instruction actually does. (The converted value is then moved to the memory location of f.)

So my question is: how is this conversion actually done by this instruction? I know that the C++ cast will yield the closest value in the other type, but apart from that, I have no idea about the actual operations performed.

asked May 24 '13 14:05

JBL

1 Answer

The cvtsd2ss instruction performs the conversion according to the current SSE rounding mode, which is held in the MXCSR control register. The default rounding mode is round-to-nearest-even.
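
As a rough illustration of this dependence on the rounding mode, the sketch below converts the same double under different modes using std::fesetround from <cfenv>. The helper name convert_with_mode is made up for this example, and the volatile variables are only there to discourage the compiler from performing the conversion at compile time. How strictly a compiler honours run-time rounding-mode changes is implementation-dependent, so treat this as an experiment under those assumptions rather than a guarantee.

```cpp
#include <cfenv>

// Hypothetical helper (not from the original post): converts d to float
// while the given rounding mode (FE_TONEAREST, FE_UPWARD, FE_DOWNWARD or
// FE_TOWARDZERO) is active, then restores the caller's mode.
float convert_with_mode(double d, int mode) {
    const int saved = std::fegetround();          // remember the current mode
    std::fesetround(mode);
    volatile double in = d;                       // force a run-time load
    volatile float out = static_cast<float>(in);  // conversion happens here
    std::fesetround(saved);                       // put the old mode back
    return out;
}
```

With d = 1 + 2^-25, which lies strictly between 1.0f and the next float above it, one would expect convert_with_mode(d, FE_TONEAREST) to give 1.0f and convert_with_mode(d, FE_UPWARD) to give the next float up.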

In order to follow the algorithm, it helps to keep in mind the information on the IEEE 754-1985 Wikipedia page, especially the diagrams of the bit layout.

First, the exponent of the target float is computed: the double type has a wider range than float, so the result may be 0.0f (or a denormal) for a very small double, or an infinite value for a very large double.
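
The range effects described above can be observed with a small sketch. It assumes IEEE 754 types (checked with a static_assert) and default floating-point settings (no flush-to-zero); the helper name classify_converted is invented for illustration. Note that the C++ standard itself leaves the result undefined when the value is outside the range of float, so the infinity seen here is what IEEE 754 hardware such as cvtsd2ss produces, not a language guarantee.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: converts d to float at run time and reports the
// category of the result (FP_INFINITE, FP_ZERO, FP_SUBNORMAL, FP_NORMAL
// or FP_NAN).
int classify_converted(double d) {
    static_assert(std::numeric_limits<float>::is_iec559,
                  "this sketch assumes IEEE 754 floats");
    volatile double in = d;                  // keep the conversion at run time
    const float f = static_cast<float>(in);
    return std::fpclassify(f);
}
```

For example, classify_converted(1e300) should report FP_INFINITE, classify_converted(1e-300) should report FP_ZERO, and classify_converted(1e-40) should report FP_SUBNORMAL, since 1e-40 lies below the smallest normal float but above the smallest denormal.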

For the usual case of a normal double being converted to a normal float (roughly, when the unbiased exponent of the double can be represented in the 8 bits of a single-precision exponent field), the 23 bits of the destination significand start out as the 23 most significant bits of the original number's 52-bit significand. The remaining 29 bits are left over.
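
To make the layout concrete, here is a minimal sketch that splits a double into the pieces relevant to this step. The struct and function names are hypothetical, and the code assumes a 64-bit IEEE 754 double that is normal and whose exponent fits in a float; it does no rounding.

```cpp
#include <cstdint>
#include <cstring>

// Illustrative container for the pieces of a double that matter when
// building a float from it.
struct SplitDouble {
    std::uint32_t sign;        // 1 bit
    std::uint32_t exponent8;   // double's exponent re-biased for float
    std::uint32_t top23;       // 23 most significant significand bits
    std::uint32_t leftover29;  // 29 significand bits that do not fit
};

// Hypothetical helper: valid only for normal doubles in float's normal range.
SplitDouble split_for_float(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // well-defined way to read the bits

    const std::uint64_t frac52 = bits & 0x000FFFFFFFFFFFFFull;
    SplitDouble s;
    s.sign = static_cast<std::uint32_t>(bits >> 63);
    // Double bias is 1023, float bias is 127.
    s.exponent8 = static_cast<std::uint32_t>((bits >> 52) & 0x7FF) - 1023 + 127;
    s.top23 = static_cast<std::uint32_t>(frac52 >> 29);
    s.leftover29 = static_cast<std::uint32_t>(frac52 & 0x1FFFFFFFull);
    return s;
}
```

For 1.5 this yields exponent8 = 127, top23 = 0x400000 and leftover29 = 0, which are exactly the fields of 1.5f, so no rounding is needed in that case.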

Then there is the problem of rounding:

If the left-over bits are below 10..0 (a 1 followed only by zeros, i.e. exactly half of the last kept bit), then the target significand is left as-is.
If the left-over bits are above 10..0, then the target significand is incremented. If incrementing it makes it overflow (because it is already 1..1), then the carry is propagated into the exponent bits. This produces the correct result because the exponent field sits directly above the significand in the IEEE 754 layout, so the carry increases the exponent by one and leaves an all-zero significand.
If the left-over bits are exactly 10..0, then the double is exactly midway between two floats. Of these two choices, the one with the last bit 0 (“even”) is chosen.

After this step, the target significand corresponds to the float nearest to the original double.
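
The rounding rules above can be sketched in software as follows. This is only an illustration of the normal-to-normal case under stated assumptions (IEEE 754 layouts, input is a normal double whose exponent fits in a float); it is not a description of how the hardware implements cvtsd2ss, and both function names are made up for the example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: bit pattern of the float nearest to d, ties to even.
// Zeros, denormals, infinities and NaNs are deliberately not handled.
std::uint32_t to_float_bits_nearest_even(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);

    const std::uint32_t sign = static_cast<std::uint32_t>(bits >> 63) << 31;
    const std::uint32_t exp =
        static_cast<std::uint32_t>((bits >> 52) & 0x7FF) - 1023 + 127;
    const std::uint64_t frac = bits & 0x000FFFFFFFFFFFFFull;

    // Start from the truncated value: sign, re-biased exponent, top 23 bits.
    std::uint32_t result =
        sign | (exp << 23) | static_cast<std::uint32_t>(frac >> 29);

    const std::uint64_t leftover = frac & 0x1FFFFFFFull;  // 29 dropped bits
    const std::uint64_t half = 0x10000000ull;             // 1 then 28 zeros

    // Above half: round up. Exactly half: round up only if the last kept
    // bit is 1, so that the result ends in 0. A carry out of the
    // significand flows into the exponent field automatically.
    if (leftover > half || (leftover == half && (result & 1u) != 0)) {
        ++result;
    }
    return result;
}

// Bit pattern produced by the built-in conversion, for comparison.
std::uint32_t builtin_float_bits(double d) {
    volatile double in = d;  // keep the conversion at run time
    const float f = static_cast<float>(in);
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Under the default rounding mode, the two functions should agree for inputs in the supported range. For instance, 1.9999999999 has an all-ones truncated significand, and the increment carries into the exponent to give the bits of 2.0f.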

The directed rounding modes are, if anything, simpler. The case where the target float is a denormal is slightly more complicated (one must be careful to avoid “double-rounding”).
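
To show why the directed modes need less work, here is a hedged sketch for the same normal-to-normal case. The Direction enum and the function name are illustrative inventions, and denormal results (where the double-rounding concern arises) are not handled. The only question in each mode is whether any non-zero bits were dropped and which way the sign points.

```cpp
#include <cstdint>
#include <cstring>

enum class Direction { TowardZero, Upward, Downward };  // illustrative names

// Hypothetical helper: bit pattern of d converted to float with a directed
// rounding mode. Assumes a normal double whose exponent fits in a float.
std::uint32_t to_float_bits_directed(double d, Direction dir) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);

    const bool negative = (bits >> 63) != 0;
    const std::uint32_t exp =
        static_cast<std::uint32_t>((bits >> 52) & 0x7FF) - 1023 + 127;
    const std::uint64_t frac = bits & 0x000FFFFFFFFFFFFFull;
    const bool inexact = (frac & 0x1FFFFFFFull) != 0;  // any dropped bit set?

    // Truncation alone already rounds toward zero.
    std::uint32_t result = (negative ? 0x80000000u : 0u) | (exp << 23) |
                           static_cast<std::uint32_t>(frac >> 29);

    // The magnitude grows only when rounding away from zero is requested
    // for this sign and something was actually dropped.
    const bool away_from_zero =
        (dir == Direction::Upward && !negative) ||
        (dir == Direction::Downward && negative);
    if (inexact && away_from_zero) {
        ++result;
    }
    return result;
}
```

For d = 1 + 2^-25, TowardZero and Downward give the bits of 1.0f (0x3F800000), while Upward gives the next float up (0x3F800001).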

answered Sep 29 '22 12:09

Pascal Cuoq
