What's the largest denormalized and normalized number? (64-bit, IEEE 754-1985)

1 Answer

As you know, the double-precision format looks like this:

[Figure: bit layout of the double-precision format: 1 sign bit, 11 exponent bits, 52 fraction bits]

The key to understanding denormalized numbers is that they are not actually floating-point numbers but instead use a fixed-point micro-format using the representations that are not used in the 'normal' format.

Normal floating-point numbers are of the form m*2^e, where e is found by subtracting the bias (1023) from the exponent field above, and m is a number with 1 <= m < 2 whose bits after the binary point are given by the fraction field above. The 1 in front of the binary point is not stored, because it is always 1. The exponent field takes values from 1 to 2046. The values 0 (all zeroes) and 2047 (all ones) are reserved for special uses.
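
As a rough illustration of the decoding just described, here is a minimal Python sketch that splits a normal double into its sign, unbiased exponent e and significand m. The function name `decode_normal` is made up for this example. It assumes Python's `float` is an IEEE 754 double, which holds on all common platforms.

```python
import struct

def decode_normal(x: float):
    """Split a normal double into (sign, unbiased exponent e, significand m).

    Hypothetical helper for illustration; assumes float is an IEEE 754 double.
    """
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent_field = (bits >> 52) & 0x7FF      # 11 bits
    fraction = bits & ((1 << 52) - 1)          # 52 bits
    if not 1 <= exponent_field <= 2046:
        raise ValueError("not a normal number")
    e = exponent_field - 1023                  # subtract the bias
    m = 1 + fraction / 2**52                   # implicit leading 1
    return sign, e, m

sign, e, m = decode_normal(6.0)
print(sign, e, m)        # 6.0 = 1.5 * 2^2
```

Multiplying back, `(-1)**sign * m * 2.0**e` reproduces the original value for any normal input whose e is at most 1023.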

All ones in the exponent field means we have either an infinity or a NaN (Not-a-Number).
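
To see the all-ones case in practice, the following Python sketch inspects the exponent field of a value. The helper names `fields` and `classify_special` are invented here. The rule that a zero fraction means infinity and a non-zero fraction means NaN is not spelled out above but is part of the IEEE 754 encoding.

```python
import struct

def fields(x: float):
    """Return (sign, exponent_field, fraction) of a double (illustrative helper)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def classify_special(x: float) -> str:
    """Distinguish infinity and NaN from finite values by their bit pattern."""
    _, exponent_field, fraction = fields(x)
    if exponent_field != 2047:        # not all ones: an ordinary finite value
        return "finite"
    # All ones in the exponent field: the fraction decides which special it is.
    return "infinity" if fraction == 0 else "nan"

for value in (1.0, float("inf"), float("-inf"), float("nan")):
    print(value, classify_special(value))
```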

All zeroes means we're dealing with denormal floating-point numbers. These are still of the same form, m*2^e, but the values of m and e are derived differently. m is now a number between 0 and 1, so there is a 0 in front of the binary point instead of the 1 used for normal numbers. e always has the same value: -1022. So the exponent is a constant, which is why I called it a fixed-point format earlier.
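
The denormal case can be sketched the same way. In this Python example the name `decode_denormal` is hypothetical, and `float` is again assumed to be an IEEE 754 double. Note that zero also has an all-zero exponent field (with a zero fraction), so the helper accepts it too.

```python
import struct

def decode_denormal(x: float):
    """Split a denormal double (or zero) into (sign, e, m). Illustrative only."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent_field = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent_field != 0:
        raise ValueError("not a denormal number (or zero)")
    m = fraction / 2**52      # leading bit is 0, so 0 <= m < 1
    e = -1022                 # fixed exponent for every denormal
    return sign, e, m

# 5e-324 is the smallest positive denormal: only the lowest fraction bit is set.
sign, e, m = decode_denormal(5e-324)
print(sign, e, m, m * 2.0**e == 5e-324)
```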

So, the largest possible values for each are:

Normal: (1+1/2 + 1/2^2 + ... + 1/2^52)*2^1023 = (2-2^-52)*2^1023 = 1.797...e+308
Denormal: (0+1/2 + 1/2^2 + ... + 1/2^52)*2^-1022 = (1-2^-52)*2^-1022 = 2.225...e-308
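
These two values can be checked numerically. The Python sketch below evaluates both closed forms and compares them with the corresponding bit patterns (largest exponent field 2046 with an all-ones fraction, and exponent field 0 with an all-ones fraction). The variable and function names are chosen for this example, and `float` is assumed to be an IEEE 754 double.

```python
import struct
import sys

def from_bits(bits: int) -> float:
    """Reinterpret a 64-bit integer as a double (illustrative helper)."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

# Closed forms from the formulas above; every step is exact in double precision.
largest_normal = (2 - 2.0**-52) * 2.0**1023
largest_denormal = (1 - 2.0**-52) * 2.0**-1022

# Same values built directly from their bit patterns.
largest_normal_bits = from_bits(0x7FEFFFFFFFFFFFFF)    # exponent 2046, fraction all ones
largest_denormal_bits = from_bits(0x000FFFFFFFFFFFFF)  # exponent 0, fraction all ones

print(largest_normal, largest_normal == largest_normal_bits)
print(largest_denormal, largest_denormal == largest_denormal_bits)

# sys.float_info.max is the largest normal; sys.float_info.min is the
# smallest positive normal, which sits just above the largest denormal.
print(largest_normal == sys.float_info.max)
print(largest_denormal < sys.float_info.min)
```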

169

answered Oct 13 '22 15:10

Jeffrey Sax


Tags:

floating-point

floating-accuracy

binary

denormalization

ieee-754

Carol.Kar
