
How to perform round to even with floating point numbers

For IEEE-754 single-precision floating point, how do you perform round to nearest, where ties round to the nearest even digit in the required position (the default and by far the most common mode)?

Basically I have the guard bit, round bit, and sticky bit. So if we form those into a vector and call it GRS, then the following rules apply:

  1. If G = 0, round down (do nothing)
  2. If G = 1, and RS == 10 or RS == 01, round up (add one to mantissa)
  3. If GRS = 111, round to even

So I am not sure how to perform the round to nearest. Any help is greatly appreciated.

Veridian asked Jan 24 '12 04:01



1 Answer

Just to make sure we're on the same page: G (guard) is the most significant bit of the three, R (round) comes next, and S (sticky) is the least significant. S is the logical OR of all the even less significant bits that were truncated during the calculation, so it records whether any of them was nonzero. These three bits are only used while doing calculations and aren't stored in the floating-point variable before or after the calculations.

This is what you should do in order to round the result to nearest, with ties going to even, using G, R and S:

GRS - Action
0xx - round down = do nothing (x means any bit value, 0 or 1)
100 - this is a tie: round up if the mantissa's bit just before G is 1, else round down = do nothing
101 - round up
110 - round up
111 - round up
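
The table above can be sketched in C as a small decision function. This is only an illustration, not a definitive implementation: the name `should_round_up` and the packing of the three bits into one value (`G<<2 | R<<1 | S`) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (name and parameter layout are assumptions).
 * mantissa: the bits that will be kept; bit 0 is the bit just before G.
 * grs:      the three extra bits packed as G<<2 | R<<1 | S.
 * Returns true when 1 should be added to the kept mantissa. */
bool should_round_up(uint32_t mantissa, unsigned grs)
{
    assert(grs <= 7u);

    if (grs < 4u)                 /* 0xx: below halfway, round down */
        return false;
    if (grs > 4u)                 /* 101, 110, 111: above halfway, round up */
        return true;
    return (mantissa & 1u) != 0u; /* 100: tie, round up only if mantissa is odd */
}
```

Comparing the packed value against 4 (binary 100) works because 100 is exactly the halfway point between the two representable neighbours.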

Rounding up is done by adding 1 to the mantissa at its least significant bit position, the one just before G. If the mantissa overflows (the 23 least significant bits that you will store all become zeroes), you have to add 1 to the exponent. If the exponent overflows, you set the number to +infinity or -infinity, depending on the number's sign.
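
A minimal sketch of that increment-and-carry step in C, assuming a normal (not subnormal) single-precision number whose 24-bit significand already includes the implicit leading 1. The function name `round_up_and_pack` and its parameters are hypothetical, and it always adds 1; deciding whether to round up is left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name and parameters are assumptions).
 * sign: 0 or 1
 * exp:  biased exponent of a normal number, 1..254
 * sig:  24-bit significand with the implicit leading 1 in bit 23
 * Adds 1 at the least significant kept bit and returns the packed
 * 32-bit IEEE-754 single-precision pattern. Subnormals are not handled. */
uint32_t round_up_and_pack(uint32_t sign, uint32_t exp, uint32_t sig)
{
    assert(sign <= 1u);
    assert(exp >= 1u && exp <= 254u);
    assert((sig >> 23) == 1u);

    sig += 1u;
    if (sig >> 24) {   /* mantissa overflow: 1.11...1 became 10.00...0 */
        sig >>= 1;     /* renormalize to 1.00...0 (stored bits are all zero) */
        exp += 1u;     /* and bump the exponent */
    }
    if (exp >= 255u)   /* exponent overflow: signed infinity */
        return (sign << 31) | 0x7F800000u;

    /* drop the implicit leading 1 and pack sign | exponent | fraction */
    return (sign << 31) | (exp << 23) | (sig & 0x007FFFFFu);
}
```

For example, rounding up the largest significand at biased exponent 127 (just below 2.0) carries into the exponent and yields the bit pattern of exactly 2.0.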

In the case of a tie, you add 1 to the mantissa if the mantissa is odd and you add nothing if it's even. That's what makes the result rounded to the nearest even value.

Alexey Frunze answered Sep 22 '22 12:09