Why are denormal floating-point values slower to handle?

It is generally the case that floating-point operations that consume or produce denormals are slower than those operating only on normalized values, sometimes much slower.

Why is this the case? If it is because they trap to software instead of being handled directly in hardware, as is reportedly the case on some CPUs, why do they have to do that?

asked Mar 01 '19 by rwallace

People also ask

What is a denormalized floating-point number?

Conversely, a denormalized floating-point value has a significand with a leading digit of zero. Of these, the subnormal numbers represent values which, if normalized, would have exponents below the smallest representable exponent (the exponent having a limited range).

What are denormalized values?

A number is denormalized if the exponent field contains all 0s and the fraction field does not contain all 0s. Thus denormalized single-precision numbers lie in the range ±2^-149 to ±(1 - 2^-23) × 2^-126 inclusive, and denormalized double-precision numbers lie in the range ±2^-1074 to ±(1 - 2^-52) × 2^-1022 inclusive.
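
To make these limits concrete, here is a minimal C sketch (assuming a C11 compiler and IEEE-754 floats; FLT_TRUE_MIN and DBL_TRUE_MIN are the C11 names for the smallest positive subnormals) that prints the boundary values and uses fpclassify() to detect a denormalized number:

    #include <stdio.h>
    #include <float.h>
    #include <math.h>

    int main(void)
    {
        /* FLT_TRUE_MIN / DBL_TRUE_MIN (C11) are the smallest positive
           subnormals; FLT_MIN / DBL_MIN are the smallest normals. */
        printf("smallest subnormal float:  %g\n", (double)FLT_TRUE_MIN); /* ~1.4e-45 */
        printf("smallest normal float:     %g\n", (double)FLT_MIN);      /* ~1.2e-38 */
        printf("smallest subnormal double: %g\n", DBL_TRUE_MIN);         /* ~4.9e-324 */
        printf("smallest normal double:    %g\n", DBL_MIN);              /* ~2.2e-308 */

        /* The exponent field of FLT_MIN / 2 is all zeros, the fraction
           field is not: fpclassify() reports it as FP_SUBNORMAL. */
        float x = FLT_MIN / 2.0f;
        printf("FLT_MIN / 2 is subnormal:  %d\n", fpclassify(x) == FP_SUBNORMAL);
        return 0;
    }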

What are denormal floating point numbers?

Denormal floating point numbers are essentially roundoff errors in normalized numbers near the underflow limit, realmin, which is 2^(-emax+1). They are equally spaced, with a spacing of eps*realmin. Zero is naturally included as the smallest denormal.
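
The spacing claim is easy to check in C (a small sketch assuming IEEE-754 doubles; DBL_TRUE_MIN requires C11):

    #include <stdio.h>
    #include <float.h>
    #include <math.h>

    int main(void)
    {
        /* eps * realmin: 2^-52 * 2^-1022 = 2^-1074, the subnormal spacing */
        double spacing = DBL_EPSILON * DBL_MIN;
        printf("eps * realmin  = %g\n", spacing);       /* 4.94066e-324 */
        printf("DBL_TRUE_MIN   = %g\n", DBL_TRUE_MIN);  /* the same value */

        /* Consecutive subnormals differ by exactly that spacing... */
        double d = 10.0 * DBL_TRUE_MIN;
        printf("step at d      = %g\n", nextafter(d, 1.0) - d);
        /* ...and zero sits one step below the smallest subnormal. */
        printf("below true min = %g\n", nextafter(DBL_TRUE_MIN, 0.0));
        return 0;
    }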

Why is the mantissa of a floating point number so small?

To increase accuracy near zero, floating point implementations let the number become "denormalized", so instead of the smallest number being 1.0 times 2 raised to the most negative exponent, the mantissa can become as small as binary 0.000…1 (24 digits in single precision). The penalty is that floating point math operations become considerably slower.
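
A short C sketch of this gradual loss of mantissa bits: starting at the smallest normal float, repeated halving keeps producing nonzero subnormal values, one fewer bit each time, until after 24 steps the last bit is gone:

    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        float x = FLT_MIN;              /* smallest normal float, 2^-126 */
        for (int i = 0; i < 26 && x > 0.0f; i++) {
            printf("FLT_MIN / 2^%-2d = %g\n", i, x);
            x *= 0.5f;                  /* each halving drops one mantissa bit */
        }
        /* The loop exits once all 24 mantissa bits are exhausted (x == 0). */
        return 0;
    }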

What are the advantages of using denormal numbers in programming?

This allows what is known as “gradual underflow” when a result is very small, and helps avoid catastrophic division-by-zero errors. Denormal numbers can incur extra computational cost. The Wikipedia entry explains that some platforms implement denormal numbers in software, while others handle them in hardware.
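
A standard illustration of the division-by-zero point, as a hedged C sketch with hypothetical values: with gradual underflow, x != y guarantees x - y != 0, so a guarded division stays safe even when the difference is subnormal, whereas in a flush-to-zero mode the difference could become exactly zero and the division would blow up:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double x = 4.0e-308;            /* both normal, but very close... */
        double y = 3.0e-308;
        double d = x - y;               /* ~1.0e-308: a subnormal result */

        if (x != y) {                   /* with gradual underflow, d != 0 here */
            printf("d   = %g (subnormal: %d)\n", d, fpclassify(d) == FP_SUBNORMAL);
            printf("1/d = %g\n", 1.0 / d);
        }
        return 0;
    }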

What are denormals and gradual underflow?

Denormal floating point numbers and gradual underflow are an underappreciated feature of the IEEE floating point standard. Double precision denormals are so tiny that they are rarely numerically significant, but single precision denormals can be in the range where they affect some otherwise unremarkable computations.


1 Answer

With IEEE-754 floating-point most operands encountered are normalized floating-point numbers, and internal data paths in processors are built for normalized operands. Additional exponent bits may be used for internal representations to keep floating-point operands normalized inside the data path at all times.

Any subnormal inputs therefore require additional work: first determining the number of leading zeros, then left-shifting the significand for normalization while adjusting the exponent. A subnormal result requires right-shifting the significand by the appropriate amount, and rounding may need to be deferred until after that has happened.
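
As a purely illustrative sketch (hypothetical, not any real FPU's data path or microcode), the normalization step for a subnormal single-precision input might look like this in C, using the GCC/Clang __builtin_clz intrinsic to count leading zeros:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Internal form: explicit leading bit and an exponent wider than
       the 8-bit storage field, so the data path only ever sees
       normalized significands. */
    typedef struct {
        uint32_t sign;
        int32_t  exp;   /* unbiased, extended range */
        uint32_t sig;   /* 24 bits, leading 1 explicit at bit 23 */
    } unpacked;

    static unpacked unpack_and_normalize(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);

        unpacked u = { bits >> 31, 0, 0 };
        uint32_t biased = (bits >> 23) & 0xFF;
        uint32_t frac   = bits & 0x7FFFFF;

        if (biased != 0) {                    /* normal input: fast path */
            u.exp = (int32_t)biased - 127;
            u.sig = frac | (1u << 23);        /* make the hidden bit explicit */
        } else if (frac != 0) {               /* subnormal input: extra work */
            int p = 31 - __builtin_clz(frac); /* find the leading 1 bit */
            u.exp = p - 149;                  /* adjust the exponent to match */
            u.sig = frac << (23 - p);         /* left-shift to normalize */
        }                                     /* (zero left as all-zeros) */
        return u;
    }

    int main(void)
    {
        unpacked u = unpack_and_normalize(1e-40f);  /* a subnormal input */
        printf("sign=%u exp=%d sig=0x%06X\n", u.sign, u.exp, u.sig);
        return 0;
    }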

If solved purely in hardware, this additional work typically requires additional hardware and additional pipeline stages: one, maybe even two, additional clock cycles each for handling subnormal inputs and subnormal outputs. But the performance of typical CPUs is sensitive to the latency of instructions, and significant effort is expended to keep latencies low. The latency of an FADD, FMUL, or FMA instruction is typically between 3 and 6 cycles, depending on implementation and frequency targets.

Adding, say, 50% additional latency for the potential handling of subnormal operands is therefore unattractive, even more so because subnormal operands are rare for most use cases. Using the design philosophy of "make the common case fast, and the uncommon case functional" there is therefore a significant incentive to push the handling of subnormal operands out of the "fast path" (pure hardware) into a "slow path" (combination of existing hardware plus software).

I have participated in the design of floating-point units for x86 processors, and the common approach for handling subnormals is to invoke an internal micro-code level exception when these need to be handled. This subnormal handling may take on the order of 100 clock cycles. The most expensive part of that is typically not the execution of the fix-up code itself, but getting in and out of the microcode exception handler.
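
The cost is easy to observe with a crude microbenchmark; here is a hedged C sketch (results vary widely by CPU, and compiling with flags that enable flush-to-zero, such as -ffast-math on GCC/Clang, typically erases the gap):

    #include <stdio.h>
    #include <time.h>
    #include <float.h>

    /* Time the same multiply-add loop with normal vs. subnormal inputs.
       On CPUs that take a microcode assist for subnormals, the second
       call can be an order of magnitude slower or worse. */
    static double time_loop(volatile float x, long iters)
    {
        clock_t t0 = clock();
        float acc = 0.0f;
        for (long i = 0; i < iters; i++)
            acc += x * 0.5f;            /* consumes (and may produce) subnormals */
        clock_t t1 = clock();
        if (acc == 12345.0f) puts("");  /* defeat dead-code elimination */
        return (double)(t1 - t0) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        const long N = 100000000L;
        printf("normal operands:    %.3f s\n", time_loop(1.0f, N));
        printf("subnormal operands: %.3f s\n", time_loop(FLT_MIN / 16.0f, N));
        return 0;
    }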

I am aware of specific use cases, for example particular filters in digital signal processing, where encountering subnormals is common. To support such applications at speed, many floating-point units support a non-standard flush-to-zero mode in which subnormal encodings are treated as zero.
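
On x86 this mode is controlled through the MXCSR register, exposed via SSE intrinsics; a short x86-specific C sketch (the FTZ bit flushes subnormal results to zero, and the companion DAZ bit treats subnormal inputs as zero):

    #include <stdio.h>
    #include <float.h>
    #include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE (SSE) */
    #include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE (SSE3) */

    int main(void)
    {
        volatile float tiny = FLT_MIN;  /* volatile: block constant folding */

        printf("IEEE mode: FLT_MIN/2 = %g\n", tiny / 2.0f);  /* subnormal */

        /* Flush subnormal results to zero and treat subnormal inputs as
           zero: non-standard, but keeps everything on the fast path. */
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

        printf("FTZ+DAZ:   FLT_MIN/2 = %g\n", tiny / 2.0f);  /* 0 */
        return 0;
    }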

Note that there are throughput-oriented processor designs with significant latency tolerance, in particular GPUs. I am familiar with NVIDIA GPUs, and best I can tell they handle subnormal operands without additional overhead and have done so for the past dozen years or so. Presumably this comes at the cost of additional pipeline stages, but the vendor does not document many of the microarchitectural details of these processors, so it is hard to know for sure. The following paper may provide some general insights into how different hardware designs handle subnormal operands, some with very little overhead:

E. M. Schwarz, M. Schmookler, and S. D. Trong, "FPU implementations with denormalized numbers," IEEE Transactions on Computers, Vol. 54, No. 7, July 2005, pp. 825-836.

answered Nov 30 '22 by njuffa