 

Why are denormalized floats so much slower than other floats, from a hardware architecture viewpoint?

Denormals are known to underperform severely, 100x or so, compared to normals. This frequently causes unexpected software problems.
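To make the "unexpected" part concrete, here is a minimal C sketch (an illustration added for clarity, not part of the original question) of how denormals typically sneak in: a value that decays exponentially toward zero, such as a filter or feedback tail, eventually lands in the subnormal range, and the arithmetic suddenly slows down.

```c
/* Illustration: an exponentially decaying value drifts into the subnormal
 * range long before it reaches zero, so a previously fast loop silently
 * starts doing denormal arithmetic. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 1.0;
    for (long i = 1; i <= 1100; i++) {
        x *= 0.5;                          /* exponential decay toward zero */
        if (fpclassify(x) == FP_SUBNORMAL) {
            printf("x became subnormal after %ld halvings: %g\n", i, x);
            break;
        }
    }
    return 0;
}
```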

I'm curious, from a CPU architecture viewpoint, why denormals have to be that much slower. Is the lack of performance intrinsic to their unfortunate representation? Or do CPU architects neglect them to reduce hardware cost, under the (mistaken) assumption that denormals don't matter?

In the former case, if denormals are intrinsically hardware-unfriendly, are there known non-IEEE-754 floating point representations that are also gapless near zero, but more convenient for hardware implementation?

Michael asked Apr 21 '16 22:04

People also ask

What is a denormalized float?

A normalized floating-point value has a significand whose leading digit is nonzero; conversely, a denormalized floating-point value has a significand with a leading digit of zero. These subnormal numbers represent values which, if normalized, would have exponents below the smallest representable exponent (the exponent having a limited range).

What are Denormalized values?

Denormalized values fill the range between the smallest representable normal value (exponent > 0) and zero itself. They provide gradual underflow: values too small to be normalized lose precision gradually instead of being flushed abruptly to zero. A complication when converting between floating-point precisions lies in the exponent conversion.
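For reference, a small C sketch (added for illustration) that prints the boundary values involved: the smallest positive normal double and the smallest positive subnormal, showing that subnormals fill the gap between DBL_MIN and zero.

```c
/* Illustration: subnormals fill the gap between the smallest normal double
 * (DBL_MIN = 2^-1022) and zero; the smallest subnormal is 2^-1074. */
#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    printf("smallest normal double:    %g\n", DBL_MIN);
    printf("smallest subnormal double: %g\n", nextafter(0.0, 1.0));
    printf("DBL_MIN / 2 (subnormal, not flushed): %g\n", DBL_MIN / 2.0);
    return 0;
}
```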

What is flush to zero?

In flush-to-zero mode, denormalized inputs are treated as zero, and results that are too small to be represented as a normalized number are replaced with zero.
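On x86 with SSE, flush-to-zero (FTZ) and the related denormals-are-zero (DAZ) mode can be enabled through the MXCSR control register. A minimal sketch, assuming an x86 target and a compiler that ships xmmintrin.h/pmmintrin.h:

```c
/* Sketch: enable FTZ (subnormal results become zero) and DAZ (subnormal
 * inputs are treated as zero) in MXCSR, trading strict IEEE-754 behavior
 * for speed on code that would otherwise take FP assists. */
#include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE, _MM_DENORMALS_ZERO_ON */

static void enable_ftz_daz(void) {
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}
```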


1 Answer

On most x86 systems, the cause of slowness is that denormal values trigger an FP assist, which is very costly because it switches to a microcode flow (very much like a fault).

See for example: https://software.intel.com/en-us/forums/intel-performance-bottleneck-analyzer/topic/487262

The reason is probably that the architects decided to optimize the HW for normal values by speculating that each value is normalized (which is far more common), and did not want to risk the performance of the frequent use case for the sake of rare corner cases. This speculation is usually correct, so you only pay the penalty when it is wrong. These trade-offs are very common in CPU design, since any investment in one case usually adds overhead to the entire system.

In this case, if you were to design a system that tries to optimize all types of irregular FP values, you would have to either add HW to detect and record the state of each value after each operation (which would be multiplied by the number of physical FP registers, execution units, RS entries and so on, adding up to a significant number of transistors and wires), or add some mechanism to check the value on read, which would slow you down when reading any FP value (even the normal ones).

Furthermore, depending on the type, you would or would not need to perform some correction. On x86 this is the purpose of the assist code, but if you did not speculate, you would have to run this flow conditionally for each value, which would already add a large chunk of that overhead to the common path.
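A rough way to see the assist penalty yourself is a micro-benchmark like the sketch below (illustrative, not from the original answer): the same loop is timed once with normal operands and once with operands that stay in the subnormal range. On cores that take an assist per denormal result, the second run is dramatically slower unless FTZ/DAZ is enabled.

```c
/* Illustration: time the same multiply-add loop on normal vs. subnormal
 * values; the subnormal run typically shows the FP-assist penalty. */
#include <stdio.h>
#include <time.h>

static double spin(double seed, long iters) {
    volatile double acc = seed;          /* volatile keeps the loop from being optimized away */
    for (long i = 0; i < iters; i++)
        acc = acc * 0.5 + seed;          /* result stays near the seed's magnitude */
    return acc;
}

int main(void) {
    const long N = 50 * 1000 * 1000;

    clock_t t0 = clock();
    spin(1.0, N);                        /* normal inputs and results */
    clock_t t1 = clock();
    spin(1e-310, N);                     /* 1e-310 is already subnormal for a double */
    clock_t t2 = clock();

    printf("normal:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("subnormal: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}
```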

Leeor answered Oct 26 '22 04:10