I would like a broad overview of "denormal data" and what it is about. The only thing I think I have understood is that it relates to floating-point values from the programmer's viewpoint, and to how the CPU handles computation in general.
Can someone decipher these two words for me?
EDIT
Please remember that I'm interested in C++ applications and only the C++ language.
Conversely, a denormalized floating point value has a significand with a leading digit of zero. Of these, the subnormal numbers represent values which if normalized would have exponents below the smallest representable exponent (the exponent having a limited range).
The smallest denormalized positive number occurs when f has 51 0's followed by a single 1. This corresponds to 2^−52 × 2^−1022 = 2^−1074 ≈ 4.9 × 10^−324. Attempts to represent any smaller number must underflow to zero.
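As a quick check of that figure in C++, here is a minimal sketch. It assumes `double` is the 64-bit IEEE 754 binary format, that subnormals are not being flushed to zero, and that the default round-to-nearest mode is active. The function names are made up for illustration.

```cpp
#include <cmath>
#include <limits>

// Smallest positive subnormal double: fraction field is 51 zeros then a 1.
inline double smallest_subnormal() {
    return std::numeric_limits<double>::denorm_min();
}

// The same value built independently as 1.0 * 2^-1074.
inline double two_to_minus_1074() {
    return std::ldexp(1.0, -1074);
}

// Half of the smallest subnormal is not representable. Under the default
// round-to-nearest-even mode (an assumption here) the result is zero.
inline double half_of_smallest() {
    volatile double tiny = smallest_subnormal(); // volatile: keep the division at run time
    return tiny / 2.0;
}
```

With those assumptions, `smallest_subnormal()` and `two_to_minus_1074()` compare equal, and `half_of_smallest()` underflows to zero.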
You ask about C++, but the specifics of floating-point values and encodings are determined by a floating-point specification, notably IEEE 754, and not by C++. IEEE 754 is by far the most widely used floating-point specification, and I will answer using it.
In IEEE 754, binary floating-point values are encoded with three parts: A sign bit s (0 for positive, 1 for negative), a biased exponent e (the represented exponent plus a fixed offset), and a significand field f (the fraction portion). For normal numbers, these represent exactly the number (−1)^s • 2^(e−bias) • 1.f, where 1.f is the binary numeral formed by writing the significand bits after “1.”. (For example, if the significand field has the ten bits 0010111011, it represents the binary significand 1.0010111011, which is 1.1826171875 or 1211/1024.)
The bias depends on the floating-point format. For 64-bit IEEE 754 binary, the exponent field has 11 bits, and the bias is 1023. When the actual exponent is 0, the encoded exponent field is 1023. Actual exponents of −2, −1, 0, 1, and 2 have encoded exponents of 1021, 1022, 1023, 1024, and 1025. When somebody speaks of the exponent of a subnormal number being zero, they mean the encoded exponent is zero. If the value were written in normalized form, its actual exponent would be less than −1022. For 64-bit, the normal exponent interval is −1022 to 1023 (encoded values 1 to 2046). When the exponent moves outside this interval, special things happen.
Above this interval, floating-point stops representing finite numbers. An encoded exponent of 2047 (all 1 bits) represents infinity (with the significand field set to zero). Below this range, floating-point changes to subnormal numbers. When the encoded exponent is zero, the significand field represents 0.f instead of 1.f.
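To make the encoding concrete, here is a small sketch that splits a `double` into s, e and f and rebuilds the value from the two formulas above (1.f for normal numbers, 0.f when the encoded exponent is zero). It assumes `double` is the 64-bit IEEE 754 binary format. `DoubleFields`, `decode` and `value_of` are names invented for this example, and `value_of` handles finite values only.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

struct DoubleFields {
    bool sign;              // s
    std::uint16_t exponent; // e, the encoded (biased) exponent, 11 bits
    std::uint64_t fraction; // f, 52 bits
};

inline DoubleFields decode(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits); // well-defined way to read the bit pattern
    DoubleFields d;
    d.sign = (bits >> 63) != 0;
    d.exponent = static_cast<std::uint16_t>((bits >> 52) & 0x7FF);
    d.fraction = bits & ((std::uint64_t{1} << 52) - 1);
    return d;
}

// Rebuilds a finite value. Encoded exponent 2047 (infinity/NaN) is not handled.
inline double value_of(const DoubleFields& d) {
    const bool subnormal = (d.exponent == 0);
    // Normal numbers have an implicit leading 1; subnormals have a leading 0.
    const std::uint64_t leading = subnormal ? 0 : (std::uint64_t{1} << 52);
    // Subnormals reuse the actual exponent of encoded exponent 1, i.e. -1022.
    // The extra -52 accounts for treating the significand as an integer.
    const int power = (subnormal ? 1 : d.exponent) - 1023 - 52;
    const double magnitude = std::ldexp(static_cast<double>(leading | d.fraction), power);
    return d.sign ? -magnitude : magnitude;
}
```

For instance, `decode(1.0)` gives an encoded exponent of 1023 and a zero fraction, and `decode(1.1826171875)` has the bits 0010111011 at the top of its fraction field.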
There is an important reason for this. If the lowest exponent value were just another normal encoding, then the lower bits of its significand would be too small to represent as floating-point values by themselves. Without that leading “1.”, there would be no way to say where the first 1 bit was. For example, suppose you had two numbers, both with the lowest exponent, and with binary significands 1.0010111011 and 1.0000000000. When you subtract the significands, the result is .0010111011. Unfortunately, there is no way to represent this as a normal number. Because you were already at the lowest exponent, you cannot represent the lower exponent that is needed to say where the first 1 is in this result. Since the mathematical result is too small to be represented, a computer would be forced to return the nearest representable number, which would be zero.
This creates the undesirable property in the floating-point system that you can have `a != b` but `a-b == 0`. To avoid that, subnormal numbers are used. By using subnormal numbers, we have a special interval where the actual exponent does not decrease, and we can perform arithmetic without creating numbers too small to represent. When the encoded exponent is zero, the actual exponent is the same as when the encoded exponent is one, but the value of the significand changes to 0.f instead of 1.f. When we do this, `a != b` guarantees that the computed value of `a-b` is not zero.
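This guarantee can be observed directly. The sketch below takes two distinct values at the lowest normal exponent and subtracts them. It assumes IEEE 754 `double` arithmetic with subnormals enabled (no flush-to-zero), and the function names are illustrative only.

```cpp
#include <cmath>
#include <limits>

// a and b both have the lowest normal exponent (-1022) and differ only in
// their significands: binary 1.01 versus 1.00.
inline double tiny_difference() {
    const double min_normal = std::numeric_limits<double>::min(); // 2^-1022
    volatile double a = min_normal * 1.25; // volatile: keep the arithmetic at run time
    volatile double b = min_normal;
    return a - b; // mathematically 0.01 (binary) * 2^-1022 = 2^-1024
}

inline bool is_subnormal(double x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}
```

Under those assumptions the difference is a nonzero subnormal. On hardware that flushes subnormals to zero, the same subtraction would return 0 even though `a != b`.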
Here are the combinations of values in the encodings of 64-bit IEEE 754 binary floating-point:
Sign | Exponent (e) | Significand Bits (f) | Meaning |
---|---|---|---|
0 | 0 | 0 | +zero |
0 | 0 | Non-zero | +2^−1022 • 0.f (subnormal) |
0 | 1 to 2046 | Anything | +2^(e−1023) • 1.f (normal) |
0 | 2047 | 0 | +infinity |
0 | 2047 | Non-zero but high bit off | +, signaling NaN |
0 | 2047 | High bit on | +, quiet NaN |
1 | 0 | 0 | −zero |
1 | 0 | Non-zero | −2^−1022 • 0.f (subnormal) |
1 | 1 to 2046 | Anything | −2^(e−1023) • 1.f (normal) |
1 | 2047 | 0 | −infinity |
1 | 2047 | Non-zero but high bit off | −, signaling NaN |
1 | 2047 | High bit on | −, quiet NaN |
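In C++ you rarely need to inspect the bits to tell these categories apart, because `std::fpclassify` and `std::signbit` from `<cmath>` report them. Here is a minimal sketch; `describe` is a hypothetical helper, and note that the standard library does not portably distinguish signaling from quiet NaNs.

```cpp
#include <cmath>
#include <string>

// Maps a double to the "Meaning" column of the table, without the formulas.
inline std::string describe(double x) {
    const std::string sign = std::signbit(x) ? "-" : "+";
    switch (std::fpclassify(x)) {
        case FP_ZERO:      return sign + "zero";
        case FP_SUBNORMAL: return sign + "subnormal";
        case FP_NORMAL:    return sign + "normal";
        case FP_INFINITE:  return sign + "infinity";
        case FP_NAN:       return sign + "NaN";
        default:           return "unknown"; // implementation-defined extra categories
    }
}
```

For example, `describe(-0.0)` reports a negative zero, and a value such as `4.9e-324` is reported as subnormal when `double` is the IEEE 754 64-bit format.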
Some notes:
+0 and −0 are mathematically equal, but the sign is preserved. Carefully written applications can make use of it in certain special situations.
NaN means “Not a Number”. Commonly, it means some non-mathematical result or other error has occurred, and a calculation should be discarded or redone another way. Generally, an operation with a NaN produces another NaN, thus preserving the information that something has gone wrong. For example, `3 + NaN` produces a NaN. A signaling NaN is intended to cause an exception, either to indicate that a program has gone wrong or to allow other software (e.g., a debugger) to perform some special action. A quiet NaN is intended to propagate through to further results, allowing the rest of a large computation to be completed, in the cases where a NaN is only a part of a large set of data and will be handled separately later or will be discarded.
The signs, + and −, are retained with NaNs but have no mathematical value.
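These notes can be checked with a few one-liners. This is only a sketch under the assumption that the platform follows IEEE 754 and supports quiet NaNs. The function names are invented for the example.

```cpp
#include <cmath>
#include <limits>

// An arithmetic operation with a quiet NaN operand yields a NaN.
inline bool nan_propagates() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    return std::isnan(3.0 + nan);
}

// +0 and -0 compare equal...
inline bool zeros_compare_equal() {
    return 0.0 == -0.0;
}

// ...but the sign is still stored and can be recovered.
// Returns +1.0 or -1.0 according to the sign bit of x.
inline double sign_of(double x) {
    return std::copysign(1.0, x);
}
```

Here `sign_of(-0.0)` yields -1.0 while `sign_of(0.0)` yields 1.0, even though the two zeros compare equal.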
In normal programming, you should not be concerned about the floating-point encoding, except to the extent it informs you about the limits and behavior of floating-point calculations. You should not need to do anything special regarding subnormal numbers.
Unfortunately, some processors are broken in that they either violate the IEEE 754 standard by changing subnormal numbers to zero or they perform very slowly when subnormal numbers are used. When programming for such processors, you may seek to avoid using subnormal numbers.
To understand de-normal floating point values you first have to understand normal ones. A floating point value has a mantissa and an exponent. In a decimal value like 1.2345E6, 1.2345 is the mantissa and 6 is the exponent. A nice thing about floating point notation is that you can always write it normalized: 0.012345E8 and 0.12345E7 are the same value as 1.2345E6. In other words, you can always make the first digit of the mantissa a non-zero number, as long as the value is not zero.
Computers store floating point values in binary, the digits are 0 or 1. So a property of a binary floating point value that is not zero is that it can always be written starting with a 1.
This is a very attractive optimization target. Since the value always starts with 1, there is no point in storing that 1. What is nice about it is that you in effect get an extra bit of precision for free. On a 64-bit double, the mantissa has 52 bits of storage. The actual precision is 53 bits thanks to the implied 1.
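The 53-bit figure is visible from C++ through `std::numeric_limits`. The sketch below also shows its practical consequence: above 2^53, adding 1 no longer changes the value. It assumes a 64-bit IEEE 754 `double` and the default round-to-nearest mode. The function names are illustrative.

```cpp
#include <limits>

// Number of significand bits, counting the implied leading 1.
inline int significand_bits() {
    return std::numeric_limits<double>::digits;
}

// 2^53 + 1 would need 54 significand bits, so the sum rounds back to 2^53.
inline bool adding_one_is_lost_at_2_pow_53() {
    volatile double big = 9007199254740992.0; // 2^53
    volatile double sum = big + 1.0;          // volatile: force rounding to double
    const double rounded = sum;
    const double original = big;
    return rounded == original;
}
```

With those assumptions, `significand_bits()` reports 53 although only 52 bits are stored.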
We have to talk about the smallest possible floating point value that you can store this way. Doing it in decimal first, if you had a decimal processor with 5 digits of storage in the mantissa and 2 in the exponent then the smallest value it could store that isn't zero is 1.00000E-99. With 1 being the implied digit that isn't stored (doesn't work in decimal but bear with me). So the mantissa stores 00000 and the exponent stores -99. You cannot store a smaller number, the exponent is maxed-out at -99.
Well, you can. You could give up on the normalized representation and forget about the implied digit optimization. You can store it de-normalized. Now you can store 0.1000E-99, or 1.000E-100. All the way down to 0.0001E-99 or 1E-103, the absolute smallest number you can now store.
This is in general desirable because it extends the range of values you can store. That tends to matter in practical computations, since very small numbers are very common in real-world problems like differential analysis.
There's however also a big problem with it, you lose accuracy with de-normalized numbers. The accuracy of floating point calculations is limited by the number of digits you can store. It is intuitive with the fake decimal processor I used as an example, it can only ever compute with 5 significant digits. As long as the value is normalized, you always get 5 significant digits.
But you'll lose digits when you de-normalize. Any value between 0.1000E-99 and 0.9999E-99 has only 4 significant digits. Any value between 0.0100E-99 and 0.0999E-99 has only 3 significant digits. This continues all the way down to the values between 0.0001E-99 and 0.0009E-99, which have only one significant digit left.
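The same loss of digits happens in binary. As a rough sketch, the helper below (a hypothetical name) scales a value down by a power of two and back up again. Scaling by a power of two is exact for normal numbers, so any change in the result comes from bits dropped while the intermediate value was subnormal. It assumes an IEEE 754 `double`, subnormals enabled, and round-to-nearest.

```cpp
#include <cmath>

// Multiplies x by 2^-shift, then by 2^shift again.
inline double round_trip_through_small(double x, int shift) {
    volatile double scaled_down = std::ldexp(x, -shift); // may land in the subnormal range
    return std::ldexp(scaled_down, shift);
}
```

With x = 1.0/3.0, a shift of 1000 keeps the intermediate value normal and the round trip returns x unchanged. A shift of 1070 leaves only three significant bits in the intermediate value, and the round trip comes back as 0.3125 instead of 0.333...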
This can greatly reduce the accuracy of the final calculation result. What's worse, it does so in a highly unpredictable manner since these very small de-normalized values tend to show up in a more involved calculation. That's certainly something to worry about, you cannot really trust the end result anymore when it has only 1 significant digit left.
Floating point processors have ways to let you know about this or otherwise sail around the problem. They can, for example, generate an interrupt or signal when a value becomes de-normalized, letting you interrupt the calculation. They also have a "flush-to-zero" option, a bit in the status word that tells the processor to automatically convert all de-normal values to zero. That tends to generate infinities later in the calculation (for example, when something is divided by the flushed value), an outcome that tells you that the result is junk and should be discarded.
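In portable C++, the closest thing to that notification is the underflow flag in `<cfenv>`. The sketch below is only an illustration. It assumes the implementation supports `FE_UNDERFLOW` and raises it when a result is both subnormal and inexact. Strictly speaking, reliable access to the flags also depends on compiler support for the floating-point environment, which varies. The function name is made up, and flush-to-zero itself is controlled through platform-specific means that are not shown here.

```cpp
#include <cfenv>
#include <limits>

// Returns true if dividing the smallest normal double by 3 raised the
// underflow flag. The quotient is subnormal and cannot be stored exactly.
inline bool underflow_flag_raised() {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double smallest_normal = std::numeric_limits<double>::min();
    volatile double result = smallest_normal / 3.0; // volatile: keep the division at run time
    (void)result;
    return std::fetestexcept(FE_UNDERFLOW) != 0;
}
```

A program could test the flag after a block of calculations and fall back to a rescaled computation if it was set.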