How to convert float to double (both stored in IEEE-754 representation) without losing precision?

Question

I mean, for example, I have the following number encoded in IEEE-754 single precision:

"0100 0001 1011 1110 1100 1100 1100 1100"  (approximately 23.85 in decimal)

The binary number above is stored as a literal string.

The question is, how can I convert this string into an IEEE-754 double precision representation (somewhat like the following one, but the value is not the same), WITHOUT losing precision?

"0100 0000 0011 0111 1101 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010"

which is ~~the same number~~ encoded in IEEE-754 double precision.

I have tried using the following formula to convert the first string back to a decimal number first, but it loses precision.

num in decimal = (sign) * (1 + frac * 2^(-23)) * 2^(exp - 127)

I'm using Qt C++ Framework on Windows platform.

EDIT: I must apologize, maybe I didn't express the question clearly. What I mean is that I don't know the true value 23.85; I only have the first string, and I want to convert it to a double precision representation without precision loss.

Kerrek SB · Accepted Answer

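Well: keep the sign bit, rewrite the exponent (minus the old bias of 127, plus the new bias of 1023), and pad the 23-bit mantissa with 29 zeros on the right to fill the 52-bit field...

(As @Mark says, you have to treat some special cases separately, namely when the biased exponent is either zero, which covers zeros and subnormals, or max, which covers infinities and NaNs.)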
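The steps above can be sketched on raw bit patterns as follows. This is only an illustration, not code from the answer: the names `widenBits` and `widenBitString` are made up here, and the subnormal branch is one possible way to handle a zero biased exponent (it renormalizes the mantissa, since every single-precision subnormal is a normal number in double precision).

```cpp
#include <bitset>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: widens an IEEE-754 single-precision bit pattern
// to the double-precision pattern that encodes exactly the same value.
std::uint64_t widenBits(std::uint32_t f) {
    const std::uint64_t sign = static_cast<std::uint64_t>(f >> 31) << 63;
    const std::uint32_t exp = (f >> 23) & 0xFFu;   // biased exponent, 8 bits
    std::uint64_t frac = f & 0x7FFFFFu;            // mantissa field, 23 bits

    if (exp == 0xFFu)                // infinity or NaN: max exponent stays max
        return sign | (0x7FFull << 52) | (frac << 29);
    if (exp == 0 && frac == 0)       // signed zero
        return sign;

    int e;                           // unbiased exponent
    if (exp == 0) {                  // subnormal: shift until the leading 1 is implicit
        e = -126;
        while ((frac & 0x800000u) == 0) { frac <<= 1; --e; }
        frac &= 0x7FFFFFu;           // drop the now-implicit leading 1
    } else {
        e = static_cast<int>(exp) - 127;           // minus old bias
    }
    // Plus new bias, and pad the mantissa with 29 zeros on the right.
    return sign | (static_cast<std::uint64_t>(e + 1023) << 52) | (frac << 29);
}

// Hypothetical wrapper for the string form used in the question
// (32 binary digits, optionally separated by spaces).
std::string widenBitString(const std::string& text) {
    std::string digits;
    for (char c : text)
        if (c != ' ') digits += c;
    if (digits.size() != 32)
        throw std::invalid_argument("expected 32 binary digits");
    // std::bitset throws std::invalid_argument on anything but '0' and '1'.
    const auto bits32 = static_cast<std::uint32_t>(std::bitset<32>(digits).to_ulong());
    return std::bitset<64>(widenBits(bits32)).to_string();
}
```

For the pattern in the question, `widenBitString("0100 0001 1011 1110 1100 1100 1100 1100")` should give 0100 0000 0011 0111 1101 1001 1001 1001 1000 followed by 28 zeros (0x4037D99980000000), that is, the original 23 mantissa bits followed by 29 zeros.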

Analog File · Answer

IEEE-754 (and binary floating point in general) cannot represent repeating binary fractions with full precision, not even when they are in fact rational numbers with a relatively small integer numerator and denominator. Some languages provide a rational type that can do it (they are the languages that also support unbounded-precision integers).

As a consequence those two numbers you posted are NOT the same number.

They in fact are:

10111.11011001100110011000000000000000000000000000000000000000 ...

10111.11011001100110011001100110011001100110011001101000000000 ...

where ... represent an infinite sequence of 0s.

Stephen Canon in a comment above gives you the corresponding decimal values (did not check them, but I have no reason to doubt he got them right).

Therefore the conversion you seem to want (getting the second bit pattern out of the first) cannot be done, as the single precision number does not have the information you would need: you have NO WAY to know whether the number is in fact periodic or simply looks periodic because there happens to be a repetition.
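To check that the two patterns from the question really are different numbers, both can be decoded and compared. A small sketch, assuming `double` is IEEE-754 binary64 on the platform; the function names are invented for this example, and `valueOfSingle` applies the formula from the question, so it only covers normal numbers.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes IEEE-754 binary64 doubles");

// Hypothetical helper: value of a *normal* single-precision pattern,
// computed as (sign) * (1 + frac * 2^-23) * 2^(exp - 127).
// Every step is exact in double arithmetic, because a double has
// 53 significand bits and a wider exponent range than a float.
double valueOfSingle(std::uint32_t bits) {
    const double sign = (bits >> 31) ? -1.0 : 1.0;
    const int exp = static_cast<int>((bits >> 23) & 0xFFu);
    const double frac = static_cast<double>(bits & 0x7FFFFFu);
    return sign * std::ldexp(1.0 + std::ldexp(frac, -23), exp - 127);
}

// Hypothetical helper: reinterprets a 64-bit pattern as a double.
double valueOfDouble(std::uint64_t bits) {
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

With the patterns written in hex, `valueOfSingle(0x41BECCCC)` is exactly 23.84999847412109375, while `valueOfDouble(0x4037D9999999999A)` is the double nearest to 23.85, so the first value is strictly smaller than the second.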

Jirka Hanika · Answer

First of all, +1 for identifying the input in binary.

Second, that number does not represent 23.85, but slightly less. If you flip its last binary digit from 0 to 1, the number will still not accurately represent 23.85, but slightly more. Those differences cannot be adequately captured in a float, but they can be approximately captured in a double.

Third, what you think you are losing is called accuracy, not precision. The precision of the number always grows by conversion from single precision to double precision, while the accuracy can never improve by a conversion (your inaccurate number remains inaccurate, but the additional precision makes it more obvious).

I recommend converting to a float, rounding, or adding a very small value just before displaying (or logging) the number, because visual appearance is what you really lost by increasing the precision.

Resist the temptation to round right after the cast and to use the rounded value in subsequent computation - this is especially risky in loops. While this might appear to correct the issue in the debugger, the accumulated additional inaccuracies could distort the end result even more.
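One way to follow this advice is to keep the widened double untouched in computations and limit the number of significant digits only when formatting. A minimal sketch using standard streams; `displayLikeFloat` and `displayFull` are names invented for this example, and the choice of `digits10` (6 for IEEE single precision) is just one reasonable display precision.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: formats a double with only as many significant
// digits as a float reliably carries. The value itself is not modified.
std::string displayLikeFloat(double value) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<float>::digits10) << value;
    return out.str();
}

// Hypothetical helper: formats with enough digits (17) to identify the
// double uniquely, which exposes the inaccuracy inherited from the float.
std::string displayFull(double value) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return out.str();
}
```

For the value from the question, 23.84999847412109375, `displayLikeFloat` gives "23.85" while `displayFull` gives "23.849998474121094". Both strings describe the same stored double; only the presentation differs.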

Dan · Answer

It might be easiest to convert the string into an actual float, convert that to a double, and convert it back to a string.
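A rough sketch of that round trip, assuming the platform's `float` and `double` are IEEE-754 binary32 and binary64 (checked with `static_assert` below). The function name `widenViaNativeTypes` is made up for this example. The compiler's own float-to-double conversion does the widening, and that conversion is exact for every finite value; NaN payload bits may not survive unchanged on every platform.

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <string>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes IEEE-754 binary32 floats");
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes IEEE-754 binary64 doubles");

// Hypothetical helper: 32 binary digits (spaces allowed) in,
// 64 binary digits out.
std::string widenViaNativeTypes(const std::string& text) {
    std::string digits;
    for (char c : text)
        if (c != ' ') digits += c;
    if (digits.size() != 32)
        throw std::invalid_argument("expected 32 binary digits");

    // String -> 32-bit integer -> float with the same bit pattern.
    const auto bits32 = static_cast<std::uint32_t>(std::bitset<32>(digits).to_ulong());
    float f;
    std::memcpy(&f, &bits32, sizeof f);

    // The widening itself: an ordinary float-to-double conversion.
    const double d = f;

    // double -> 64-bit integer with the same bit pattern -> string.
    std::uint64_t bits64;
    std::memcpy(&bits64, &d, sizeof d);
    return std::bitset<64>(bits64).to_string();
}
```

`std::memcpy` is used instead of pointer casts to copy the bits without running into aliasing problems.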

Tags: c++, floating-point, double, qt, ieee-754

Richard
