What are overflow and underflow in floating point?

Tags: javascript, floating-point, ieee-754

I feel I don't really understand the concept of overflow and underflow, so I'm asking this question to clarify it. I need to understand it at the most basic level, in terms of bits. Let's work with a simplified 1-byte floating-point representation: 1 sign bit, 3 exponent bits and 4 mantissa bits:

0 000 0000

The largest exponent field we can store is 111_2 = 7. Subtracting the bias K = 2^2 - 1 = 3 gives 4, but that exponent is reserved for Infinity and NaN. So the exponent of the largest finite number is 3, which is stored as 110 in offset binary.

So the bit patterns for the largest-magnitude finite numbers are:

0 110 1111 // positive
1 110 1111 // negative

When the exponent field is zero, the number is subnormal and has an implicit leading 0 instead of 1. So the bit patterns for the smallest-magnitude nonzero numbers are:

0 000 0001 // positive
1 000 0001 // negative
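
To check the values behind these bit patterns, here is a minimal decoding sketch. It assumes the format follows IEEE-754-style rules exactly as described above (bias 3, exponent 111 reserved, exponent 000 for subnormals). The name `decodeMini` is made up for this illustration.

```javascript
// Decode one byte laid out as 1 sign bit, 3 exponent bits, 4 mantissa bits.
// Assumption: IEEE-754-style rules with bias 3, as described in the question.
function decodeMini(byte) {
  const BIAS = 3;
  const sign = (byte >> 7) & 1 ? -1 : 1;
  const exp = (byte >> 4) & 0b111;
  const frac = byte & 0b1111;

  if (exp === 0b111) return frac === 0 ? sign * Infinity : NaN; // reserved exponent
  if (exp === 0) return sign * (frac / 16) * 2 ** (1 - BIAS);   // subnormal: implicit 0
  return sign * (1 + frac / 16) * 2 ** (exp - BIAS);            // normal: implicit 1
}

console.log(decodeMini(0b01101111)); // 15.5       (largest finite value)
console.log(decodeMini(0b11101111)); // -15.5
console.log(decodeMini(0b00000001)); // 0.015625   (smallest positive value, 2^-6)
console.log(decodeMini(0b10000001)); // -0.015625
console.log(decodeMini(0b01110000)); // Infinity
```

So in this format anything beyond ±15.5 is a candidate for overflow, and any nonzero result closer to zero than ±2^-6 is a candidate for underflow.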

I've found these descriptions for single-precision floating point:

Negative numbers less than -(2 - 2^-23) × 2^127 (negative overflow)
Negative numbers greater than -2^-149 (negative underflow)
Positive numbers less than 2^-149 (positive underflow)
Positive numbers greater than (2 - 2^-23) × 2^127 (positive overflow)
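
These four single-precision boundaries can be probed from JavaScript with `Math.fround`, which rounds a double to the nearest single-precision value. This is only a sketch; the constant names are made up. The test inputs are chosen well past the limits, because under round-to-nearest a value only slightly below 2^-149 can still round back up to 2^-149.

```javascript
// Limits of single precision, written the same way as in the list above.
const MAX_F32 = (2 - 2 ** -23) * 2 ** 127; // largest finite single
const MIN_F32 = 2 ** -149;                 // smallest positive (subnormal) single

console.log(Math.fround(MAX_F32) === MAX_F32); // true: exactly representable
console.log(Math.fround(MIN_F32) === MIN_F32); // true: exactly representable

console.log(Math.fround(2 ** 128));    // Infinity  (positive overflow)
console.log(Math.fround(-(2 ** 128))); // -Infinity (negative overflow)
console.log(Math.fround(2 ** -151));   // 0         (positive underflow)
console.log(Object.is(Math.fround(-(2 ** -151)), -0)); // true (negative underflow gives -0)
```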

Of these, I understand only positive overflow, which results in +Infinity. An example would be:

0 110 1111 + 0 110 1111 = 0 111 0000

Can anyone please demonstrate the three other cases for overflow and underflow using the bit patterns I outlined above?

asked Oct 17 '16 09:10

Max Koretskyi

1 Answer

Of course, the following is implementation-dependent. But if the numbers behave anything like what IEEE-754 specifies, floating-point numbers do not overflow or underflow to a wildly incorrect answer the way integers do. For example, multiplying two positive numbers should never give you a negative number.

Instead, overflow means that the result is 'too large to represent'. Depending on the rounding mode, it is usually replaced by either the largest finite float (round toward zero, RTZ) or Inf (round to nearest even, RNE):

0 110 1111 * 0 110 1111 = 0 111 0000
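
JavaScript cannot change the hardware rounding mode, so the sketch below emulates the 8-bit format from the question in software to show the difference between the two modes. `roundMini` is a hypothetical helper written for this illustration. It computes the operation in double precision first and then rounds once, which is exact for the small values used here.

```javascript
const MAX_MINI = 15.5; // 0 110 1111, largest finite value of the 8-bit format

// Round a double to the 8-bit format (4 mantissa bits, exponents -2..3).
// mode is "RNE" (round to nearest, ties to even) or "RTZ" (round toward zero).
function roundMini(x, mode = "RNE") {
  if (Number.isNaN(x)) return NaN;
  const sign = x < 0 || Object.is(x, -0) ? -1 : 1;
  const mag = Math.abs(x);
  if (mag === Infinity) return sign * Infinity;

  // Find the binade of mag, clamped to [-2, 4]; -2 also covers the subnormals.
  let e = -2;
  while (e < 4 && mag >= 2 ** (e + 1)) e++;

  const ulp = 2 ** (e - 4); // spacing of representable values in this binade
  const q = mag / ulp;      // exact: division by a power of two
  let n = Math.floor(q);    // RTZ simply drops the remainder
  if (mode === "RNE") {
    const rem = q - n;
    if (rem > 0.5 || (rem === 0.5 && n % 2 === 1)) n++;
  }

  let result = n * ulp;
  if (result > MAX_MINI) result = mode === "RNE" ? Infinity : MAX_MINI; // overflow
  return sign * result;
}

console.log(roundMini(15.5 * 15.5));        // Infinity (RNE)
console.log(roundMini(15.5 * 15.5, "RTZ")); // 15.5     (RTZ clamps to max float)
console.log(roundMini(15.5 + 15.5));        // Infinity
console.log(roundMini(15.6));               // 15.5     (still within rounding distance of max)
```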

(Note that integer overflow as you know it could have been avoided in hardware by applying a similar clamping operation, often called saturation. It's just not the convention to do that.)
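
To make the contrast with integers concrete, here is a small sketch of 32-bit wraparound next to a clamped (saturating) version. The function names `addWrap` and `addSat` are made up for this example. In JavaScript, `| 0` and `Math.imul` are what give 32-bit two's-complement behaviour.

```javascript
const INT32_MAX = 2 ** 31 - 1;
const INT32_MIN = -(2 ** 31);

// Wraparound: `| 0` truncates the sum to a 32-bit two's-complement integer.
function addWrap(a, b) {
  return (a + b) | 0;
}

// Saturating: clamp the sum to the int32 range instead of wrapping.
function addSat(a, b) {
  return Math.min(INT32_MAX, Math.max(INT32_MIN, a + b));
}

console.log(addWrap(INT32_MAX, 1));   // -2147483648 (wrapped to a negative number)
console.log(addSat(INT32_MAX, 1));    // 2147483647  (clamped, like max float under RTZ)
console.log(Math.imul(65536, 32768)); // -2147483648 (two positives give a negative)
```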

For floating-point numbers, the term underflow means that the result is 'too small in magnitude to represent', which usually just results in 0.0:

0 000 0001 * 0 000 0001 = 0 000 0000
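
The same thing can be observed with ordinary JavaScript numbers (doubles), where `Number.MIN_VALUE` plays the role of `0 000 0001`. A minimal sketch, assuming the default round-to-nearest mode that JavaScript uses:

```javascript
const tiny = Number.MIN_VALUE; // 2^-1074, smallest positive subnormal double

console.log(tiny * tiny);          // 0: far too small, underflows to zero
console.log(tiny / 2);             // 0: exactly halfway, the tie goes to the even value (0)
console.log(tiny * 0.75 === tiny); // true: closer to tiny than to 0, so it rounds back up
console.log(2 ** -1022 / 4 > 0);   // true: below the normal range, but still a subnormal
```

The last two lines show that underflow is gradual: results below the smallest normal number lose precision step by step before they finally reach zero.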

Note that I have also heard the term underflow used for overflow to a very large negative number, but that is not the best term for it. When the result is negative and too large in magnitude to represent, it is better called 'negative overflow':

0 110 1111 * 1 110 1111 = 1 111 0000
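
With JavaScript doubles the negative cases look like this. It is only a sketch, using `Number.MAX_VALUE` and `Number.MIN_VALUE` as stand-ins for `0 110 1111` and `0 000 0001`. The remaining case from the question, negative underflow, follows the same pattern as positive underflow but keeps the sign, so the result is -0.

```javascript
const big = Number.MAX_VALUE;  // largest finite double
const tiny = Number.MIN_VALUE; // smallest positive subnormal double

// Negative overflow: negative and too large in magnitude.
console.log(big * -big); // -Infinity
console.log(-big - big); // -Infinity

// Negative underflow: negative and too small in magnitude.
const negZero = tiny * -tiny;
console.log(Object.is(negZero, -0)); // true: the sign bit survives
console.log(negZero === 0);          // true: -0 still compares equal to 0
```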

answered Oct 13 '22 17:10

Casperrw
