
Convert Extended (80-bit) to string

How can I convert an Extended precision floating point value to a string?

Background

The Intel CPU supports three floating point formats:

  • 32-bit Single precision
  • 64-bit Double precision
  • 80-bit Extended precision

Delphi has native support for the Extended precision floating point format.

Extended precision is broken down into:

  • 1 sign bit
  • 15 exponent bits
  • 1 integer portion bit (i.e. number starts with 0. or 1.)
  • 63 mantissa bits

You can compare the mantissa size of Extended to that of the other float types:

| Type     | Sign  | Exponent | Integer | Mantissa | 
|----------|-------|----------|---------|----------|
| Single   | 1 bit |  8 bits  |  n/a    | 23 bits  |
| Double   | 1 bit | 11 bits  |  n/a    | 52 bits  |
| Extended | 1 bit | 15 bits  | 1 bit   | 63 bits  |
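
The Extended row of this table can be checked against the raw bytes of a value. The following is a minimal sketch, assuming a target where SizeOf(Extended) = 10 (for example 32-bit Windows; on 64-bit Windows Delphi maps Extended to the 8-byte Double) and a little-endian layout. The record TExtendedBits and its field names are made up for illustration.

```pascal
program ExtendedFields;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical overlay of the 10-byte x87 format (little-endian):
  // bytes 0..7 hold the integer bit plus the 63 mantissa bits,
  // bytes 8..9 hold the sign bit and the 15-bit biased exponent.
  TExtendedBits = packed record
    Significand: UInt64;
    SignExp: Word;
  end;

var
  v: Extended;
  b: TExtendedBits;
begin
  if SizeOf(Extended) <> SizeOf(TExtendedBits) then
  begin
    WriteLn('Extended is not 80 bits wide on this target');
    Halt(1);
  end;

  v := 0.5;
  Move(v, b, SizeOf(b));

  WriteLn('Sign:         ', b.SignExp shr 15);                 // 0
  WriteLn('Exponent:     ', b.SignExp and $7FFF);              // 16382 (bias is 16383)
  WriteLn('Integer bit:  ', b.Significand shr 63);             // 1
  WriteLn('Mantissa:     $',
    IntToHex(Int64(b.Significand and UInt64($7FFFFFFFFFFFFFFF)), 16)); // all zeros
end.
```

0.5 is stored as 1.0 × 2^-1, so the biased exponent is 16383 - 1 = 16382, the integer bit is set, and the 63 mantissa bits are all zero.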

Extended is capable of higher precision than Single and Double.

For example, take the real number .49999999999999999, and its representation in binary:

Single:   0.1000000000000000000000000
Double:   0.10000000000000000000000000000000000000000000000000000
Extended: 0.01111111111111111111111111111111111111111111111111111111010001111

You can see that while Single and Double have been forced to round to 0.1 binary (0.5 decimal), Extended still has some precision left.

But how to convert binary fractions to a string?

If I attempt to convert the Extended value 0.49999999999999998 to a string:

FloatToStr(v);

the function returns 0.5, even though I can look inside the Extended and see that it is not 0.5:

0x3FFDFFFFFFFFFFFFFD1E

The same is true for other Extended values; all the functions in Delphi (that I can find) return 0.5:

Value                   Hex representation      FloatToStr
0.499999999999999980    0x3FFDFFFFFFFFFFFFFD1E  '0.5'
0.499999999999999981    0x3FFDFFFFFFFFFFFFFD43  '0.5'
0.499999999999999982    0x3FFDFFFFFFFFFFFFFD68  '0.5'
0.499999999999999983    0x3FFDFFFFFFFFFFFFFD8D  '0.5'
0.499999999999999984    0x3FFDFFFFFFFFFFFFFDB2  '0.5'
0.499999999999999985    0x3FFDFFFFFFFFFFFFFDD7  '0.5'
0.499999999999999986    0x3FFDFFFFFFFFFFFFFDFB  '0.5'
0.499999999999999987    0x3FFDFFFFFFFFFFFFFE20  '0.5'
0.499999999999999988    0x3FFDFFFFFFFFFFFFFE45  '0.5'
0.499999999999999989    0x3FFDFFFFFFFFFFFFFE6A  '0.5'
0.499999999999999990    0x3FFDFFFFFFFFFFFFFE8F  '0.5'
...                     ...
0.49999999999999999995  0x3FFDFFFFFFFFFFFFFFFF  '0.5'
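
The hex column is simply the ten bytes of the value written most significant byte first. A minimal sketch of such a dump follows; the helper name ExtendedToHex is made up for illustration, and the result is only an 80-bit pattern on targets where SizeOf(Extended) = 10.

```pascal
program ExtendedHexDump;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: writes the bytes of v from the highest address down,
// which on a little-endian CPU gives sign/exponent first, mantissa last.
function ExtendedToHex(const v: Extended): string;
var
  i: Integer;
begin
  Result := '0x';
  for i := SizeOf(v) - 1 downto 0 do
    Result := Result + IntToHex(PByteArray(@v)^[i], 2);
end;

var
  v: Extended;
begin
  v := 0.5;
  WriteLn(ExtendedToHex(v)); // 0x3FFE8000000000000000 with a 10-byte Extended
end.
```

Assigning one of the values from the table to v should reproduce the corresponding hex pattern, provided the compiler evaluates the literal at Extended precision.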

What function?

FloatToStr and FloatToStrF are both wrappers around FloatToText.

FloatToText ultimately uses FloatToDecimal to extract, from an extended, a record that contains the pieces of the float:

TFloatRec = packed record
   Exponent: Smallint;
   Negative: Boolean;
   Digits: array[0..20] of Byte;
end;

In my case:

var
   v: Extended;
   fr: TFloatRec;
begin
   v := 0.499999999999999980;

   FloatToDecimal({var}fr, v, fvExtended, 18, 9999);
end;

the decoded float comes back as:

  • Exponent: 0 (SmallInt)
  • Negative: False (Boolean)
  • Digits: [53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1] (array[0..20] of Byte)

The Digits field is an array of ASCII character codes (53 is '5'), so the record reads as:

  • Exponent: 0
  • Negative: False
  • Digits: '5'
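
To make the record easier to read while experimenting, it can be rendered as text. The sketch below assumes the usual interpretation of TFloatRec, where the value is 0.DIGITS × 10^Exponent; the helper name FloatRecToSci is made up for illustration.

```pascal
program FloatRecDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: renders a TFloatRec as '[-]0.<digits>E<exponent>'.
function FloatRecToSci(const fr: TFloatRec): string;
var
  i: Integer;
begin
  Result := '';
  // Digits holds ASCII codes and ends at the first zero entry.
  // Ord/Chr keep this working whether Digits is declared as Byte or as Char.
  i := Low(fr.Digits);
  while (i <= High(fr.Digits)) and (Ord(fr.Digits[i]) <> 0) do
  begin
    Result := Result + Chr(Ord(fr.Digits[i]));
    Inc(i);
  end;
  if Result = '' then
    Result := '0'; // no digits at all is treated as zero in this sketch
  Result := '0.' + Result + 'E' + IntToStr(fr.Exponent);
  if fr.Negative then
    Result := '-' + Result;
end;

var
  v: Extended;
  fr: TFloatRec;
begin
  v := 0.499999999999999980;
  FloatToDecimal(fr, v, fvExtended, 18, 9999);
  WriteLn(FloatRecToSci(fr));
end.
```

With the record contents shown above (Digits '5', Exponent 0, Negative False) this prints 0.5E0, which makes the lost precision visible at a glance.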

FloatToDecimal is limited to 18 digits

The precision of the 63-bit mantissa of an extended precision float can go down to:

1 / (2^63)  
= 1.08420217248550443400745280086994171142578125 × 10^-19   
= 0.000000000000000000108420217248550443400745280086994171142578125
    \_________________/ 
            |
        19 digits
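
The 19-digit figure can be cross-checked with a logarithm: a significand of n bits carries roughly n × log10(2) decimal digits. The sketch below is only a rough estimate; the bit counts include the leading integer bit (implicit for Single and Double, explicit for Extended), and the function name DecimalDigitsFor is made up for illustration.

```pascal
program DecimalDigits;

{$APPTYPE CONSOLE}

uses
  SysUtils, Math;

// Approximate number of decimal digits carried by a significand of the
// given width: Bits * log10(2).
function DecimalDigitsFor(Bits: Integer): Double;
begin
  Result := Bits * Log10(2.0);
end;

begin
  WriteLn('Single   (24 bits): ', DecimalDigitsFor(24):0:2); // about 7.22
  WriteLn('Double   (53 bits): ', DecimalDigitsFor(53):0:2); // about 15.95
  WriteLn('Extended (64 bits): ', DecimalDigitsFor(64):0:2); // about 19.27
end.
```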

The issue is that:

  • Extended can give you meaningful values up to the 19th digit
  • FloatToDecimal, while returning up to 20 digits, only accepts and generates a maximum request of 18 digits for extended values (19 digits for currency)

From the documentation:

For values of type Extended, the Precision parameter specifies the requested number of significant digits in the result--the allowed range is 1..18.
The Decimals parameter specifies the requested maximum number of digits to the left of the decimal point in the result.
Precision and Decimals together control how the result is rounded. To produce a result that always has a given number of significant digits regardless of the magnitude of the number, specify 9999 for the Decimals parameter.
The result of the conversion is stored in the specified TFloatRec record as follows:

Digits - Contains up to 18 (for type Extended) or 19 (for type Currency) significant digits followed by a null terminator. The implied decimal point (if any) is not stored in Digits.

So I've hit a fundamental limitation of the built-in float formatting functions.

How to format an 80-bit IEEE extended precision float?

If Delphi cannot do it itself, the question becomes: how do I do it?

I know that Extended is 10 bytes (SizeOf(Extended) = 10). The question now delves into the dark art of converting an IEEE float to a string.

Some parts are easy:

function ExtendedToDecimal(v: Extended): TFloatRec;
var
    n: UInt64;
const
    BIAS = 16383;
begin
    Result := Default(TFloatRec);

    Result.Negative := v.Sign;
    Result.Exponent := v.Exponent;
    n := v.Mantissa;
//  Result.Digits :=
end;

But the hard part is left as an exercise for the answer.
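
One way to approach the hard part is to expand the value exactly. Every finite Extended equals Significand × 2^e for some integer e, and dividing by 2^k is the same as multiplying by 5^k and shifting the decimal point k places to the left, so schoolbook arithmetic on a string of decimal digits is enough. The following is a minimal sketch of that idea, not a tuned or complete implementation: MulSmall and ExactDecimal are names made up for illustration, infinities and NaNs are rejected, and the function takes the three raw fields rather than an Extended so that it does not depend on the width of Extended on the target.

```pascal
program ExactExtendedDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Multiplies a non-negative decimal number, held as a string of ASCII
// digits, by a small factor (2 or 5 here) using schoolbook arithmetic.
function MulSmall(const Num: string; Factor: Integer): string;
var
  i, t, carry: Integer;
begin
  Result := Num;
  carry := 0;
  for i := Length(Result) downto 1 do
  begin
    t := (Ord(Result[i]) - Ord('0')) * Factor + carry;
    Result[i] := Chr(Ord('0') + t mod 10);
    carry := t div 10;
  end;
  if carry > 0 then // carry is a single digit for factors up to 9
    Result := Chr(Ord('0') + carry) + Result;
end;

// Hypothetical helper: exact decimal expansion of an 80-bit value given as
// its sign, 15-bit biased exponent and 64-bit significand.
function ExactDecimal(Negative: Boolean; BiasedExp: Word;
  Significand: UInt64): string;
const
  BIAS = 16383;
var
  digits: string;
  e2, i, k: Integer;
begin
  if BiasedExp = $7FFF then
    raise EConvertError.Create('Infinity and NaN are not handled');

  // Significand as a decimal integer, built bit by bit. After doubling,
  // the last digit is even, so adding a set bit never carries.
  digits := '0';
  for i := 63 downto 0 do
  begin
    digits := MulSmall(digits, 2);
    if ((Significand shr i) and 1) = 1 then
      digits[Length(digits)] := Succ(digits[Length(digits)]);
  end;

  // value = Significand * 2^e2; an exponent field of 0 (denormal) acts as 1.
  if BiasedExp = 0 then
    e2 := 1 - BIAS - 63
  else
    e2 := BiasedExp - BIAS - 63;

  if e2 >= 0 then
  begin
    for i := 1 to e2 do
      digits := MulSmall(digits, 2);
    Result := digits;
  end
  else
  begin
    // Multiply by 5^k, then place the decimal point k digits from the right.
    k := -e2;
    for i := 1 to k do
      digits := MulSmall(digits, 5);
    if Length(digits) <= k then
      digits := StringOfChar('0', k + 1 - Length(digits)) + digits;
    Result := Copy(digits, 1, Length(digits) - k) + '.' +
      Copy(digits, Length(digits) - k + 1, k);
    // Drop trailing zeros, then a dangling decimal point.
    i := Length(Result);
    while Result[i] = '0' do
      Dec(i);
    if Result[i] = '.' then
      Dec(i);
    SetLength(Result, i);
  end;

  if Negative then
    Result := '-' + Result;
end;

begin
  // Bit pattern 0x3FFD FFFFFFFFFFFFFE8F, taken from the table in the question.
  WriteLn(ExactDecimal(False, $3FFD, UInt64($FFFFFFFFFFFFFE8F)));
  // prints 0.49999999999999998999823495882122159628124791197478771209716796875
end.
```

On a target where SizeOf(Extended) = 10, the three arguments can be read from the raw bytes of the value: the low eight bytes are the significand, and the top two bytes hold the sign bit followed by the biased exponent. Rounding the exact expansion to a requested number of digits, which is what a TFloatRec would need, is left out of this sketch.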

asked Jun 21 '18 18:06 by Ian Boyd


1 Answer

How can I convert an Extended precision floating point value to a string?

Since the Delphi RTL does not have a correct and complete implementation of FloatToStr() for Extended (or for Double, for that matter), you need an external library. One is available on GitHub and was originally published on EDN CodeCentral.

The library was created by John Herbster, a long-time contributor to the Delphi RTL libraries, especially regarding floating point handling. The GitHub source code has been updated to use Unicode string handling and a TFormatSettings structure for formatting. The library contains an ExactFloatToStr() function that handles floats of type Extended, Double and Single.

Program TestExactFloatToStr; 

{$APPTYPE CONSOLE}

Uses
  SysUtils,ExactFloatToStr_JH0;

begin
  WriteLn(ExactFloatToStr(Extended(0.49999999999999999)));
  WriteLn(ExactFloatToStr(Double(0.49999999999999999)));
  WriteLn(ExactFloatToStr(Single(0.49999999999999999)));
  ReadLn;
end.

Outputs:

0.49999999999999998999823495882122159628124791197478771209716796875
0.5
0.5
answered Sep 28 '22 13:09 by LU RD