 

Rounding to the nearest integer in floating point

How can I round a floating point number to the nearest integer? I am looking for the algorithm in terms of the binary representation, since I have to implement it in assembly.

Veridian asked Nov 05 '22 05:11


1 Answer

UPDATED with a method for proper round-half-to-even rounding.

Basic Algorithm:

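Here "exponent" means the unbiased exponent of an IEEE-754 single-precision float (the stored exponent field minus 127), and all bit positions refer to its 23-bit mantissa. Store the (exponent + 1)th bit after the binary point, which is bit (22 - exponent) counting from 0 at the LSB. Next, zero out the (23 - exponent) least significant bits, which hold the fractional part. Then use the stored bit to round: if it is 1, add one to the LSB of the non-truncated (integer) part and normalize if necessary; if it is 0, do nothing. This rounds halves away from zero. It assumes 0 <= exponent <= 22: a larger exponent means the number is already an integer, and a negative exponent means |x| < 1, so the result is 0 or 1 in magnitude.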
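As a reference to port to assembly, here is a minimal C sketch of the basic algorithm working only on the 32-bit pattern. It assumes `float` is IEEE-754 binary32; the names `round_basic`, `float_to_bits` and `bits_to_float` are hypothetical, and the |x| < 1 branch is an addition the answer does not spell out.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret between float and its 32-bit pattern (assumes IEEE-754 binary32). */
static uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Basic algorithm: round to nearest, halves away from zero. */
float round_basic(float x) {
    uint32_t u = float_to_bits(x);
    uint32_t sign = u & 0x80000000u;
    int e = (int)((u >> 23) & 0xFFu) - 127;      /* unbiased exponent */

    if (e >= 23)          /* no fractional bits left (also Inf and NaN) */
        return x;
    if (e < 0)            /* |x| < 1: only 0.5 <= |x| < 1 (e == -1) rounds to 1 */
        return bits_to_float(sign | (e == -1 ? 0x3F800000u : 0u));

    int n = 23 - e;                              /* (23 - exponent) fractional bits */
    uint32_t saved = (u >> (n - 1)) & 1u;        /* Step 1: (exponent+1)th bit after the point */
    u &= ~((1u << n) - 1u);                      /* Step 2: zero the n LSBs */
    if (saved)
        u += 1u << n;                            /* Step 3: +1 at the integer LSB; a carry out
                                                    of the mantissa increments the exponent,
                                                    which is the "normalize" step */
    return bits_to_float(u);
}
```

Because the exponent field sits directly above the mantissa, step 3 is one integer add; in assembly the whole routine reduces to a mask, a shift-and-test of the saved bit, and a conditional add.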

For results matching the IEEE-754 standard (round half to even):

Before zeroing out the (23 - exponent) least significant bits, OR together the (22 - exponent) least significant bits, that is, every mantissa bit below the stored bit. Call the result of that OR the sticky bit. The stored (exponent + 1)th bit after the binary point is called the guard bit. Then zero out the (23 - exponent) least significant bits.
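The guard and sticky bits come out of two masks. A C sketch, assuming IEEE-754 binary32 and 0 <= exponent <= 22 (the hypothetical helper `get_round_bits` returns 0 outside that range):

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned guard;   /* first bit after the binary point */
    unsigned sticky;  /* OR of every bit below the guard bit */
} round_bits;

/* Hypothetical helper: fills *out and returns 1 when 0 <= exponent <= 22. */
int get_round_bits(float x, round_bits *out) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    int e = (int)((u >> 23) & 0xFFu) - 127;     /* unbiased exponent */
    if (e < 0 || e > 22)
        return 0;
    int n = 23 - e;                             /* fractional bits */
    out->guard = (u >> (n - 1)) & 1u;           /* bit (22 - exponent) */
    /* the (22 - exponent) LSBs, ORed together by testing against zero */
    out->sticky = (u & ((1u << (n - 1)) - 1u)) != 0;
    return 1;
}
```

For 2.5 this gives guard 1 and sticky 0, the exact-tie case.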

If the guard bit is zero, do nothing.

If the guard bit is 1 and the sticky bit is 0, the fraction is exactly one half: add one to the LSB of the kept part only if that LSB is 1, so that the result is even.

If the guard bit is 1 and the sticky bit is 1, add one to the LSB.
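The three cases combine into round-half-to-even. A C sketch under the same binary32 assumption; `round_even` is a hypothetical name, and the |x| < 1 branch is again an addition. One subtlety: when exponent is 0, the LSB of the kept part is the implicit leading 1, which is not stored in the mantissa field, so the sketch treats it as 1.

```c
#include <stdint.h>
#include <string.h>

float round_even(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    uint32_t sign = u & 0x80000000u;
    int e = (int)((u >> 23) & 0xFFu) - 127;      /* unbiased exponent */

    if (e >= 23)                 /* already an integer, or Inf/NaN */
        return x;
    if (e < 0) {
        /* |x| < 1: only e == -1 can round up, and exactly 0.5 ties to even (0) */
        uint32_t r = (e == -1 && (u & 0x7FFFFFu) != 0) ? 0x3F800000u : 0u;
        u = sign | r;
        memcpy(&x, &u, sizeof x);
        return x;
    }

    int n = 23 - e;
    uint32_t guard  = (u >> (n - 1)) & 1u;
    uint32_t sticky = (u & ((1u << (n - 1)) - 1u)) != 0;
    u &= ~((1u << n) - 1u);                      /* zero the (23 - exponent) LSBs */
    /* LSB of the kept part; for e == 0 it is the hidden leading 1 */
    uint32_t lsb = (e == 0) ? 1u : (u >> n) & 1u;
    if (guard && (sticky || lsb))                /* above half, or a tie on an odd LSB */
        u += 1u << n;                            /* carry may bump the exponent */
    memcpy(&x, &u, sizeof x);
    return x;
}
```

Under the default rounding mode this should agree with C's `rintf`/`nearbyintf`, which makes those a convenient oracle when testing an assembly port.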


Here are some examples using the basic algorithm:

x = 62.3

    sign exponent             mantissa
x =  0      5       (1).11110010011001100110011
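The sign, exponent and mantissa columns can be read straight out of the bit pattern. A C sketch, assuming IEEE-754 binary32; `split_float` and `mantissa_string` are hypothetical helpers:

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned sign;
    int exponent;        /* stored exponent minus the bias 127 */
    uint32_t mantissa;   /* 23 stored bits; the leading (1). is implicit */
} float_fields;

float_fields split_float(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    float_fields f;
    f.sign = u >> 31;
    f.exponent = (int)((u >> 23) & 0xFFu) - 127;
    f.mantissa = u & 0x7FFFFFu;
    return f;
}

/* Write the mantissa as 23 '0'/'1' characters, most significant first. */
void mantissa_string(uint32_t mantissa, char out[24]) {
    for (int i = 0; i < 23; i++)
        out[i] = ((mantissa >> (22 - i)) & 1u) ? '1' : '0';
    out[23] = '\0';
}
```

For `62.3f` this yields sign 0, exponent 5 and the mantissa string shown above (0x793333 as an integer).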

Step 1: Store the (exponent + 1)th bit after the binary point.

exponent+1 = 6th bit

savedbit = 0

Step 2: Zero out the (23 - exponent) least significant bits. Here 23 - exponent = 18, so we zero out the 18 LSBs.

    sign exponent             mantissa
x =  0      5       (1).11110000000000000000000

Step 3: Use the saved bit to round. Since the saved bit is 0, we do nothing, and the number has been rounded to 62, since (1).11110 x 2^5 = 111110 in binary.
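The same three steps can be traced on the raw bits to check the diagrams. A C sketch; `trace_basic` is hypothetical, assumes binary32 and 0 <= exponent <= 22, and does no range check:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Prints the state after each step of the basic algorithm and returns the
   rounded bit pattern. */
uint32_t trace_basic(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    int e = (int)((u >> 23) & 0xFFu) - 127;      /* unbiased exponent */
    int n = 23 - e;                              /* fractional bits */

    uint32_t saved = (u >> (n - 1)) & 1u;        /* Step 1 */
    printf("exponent = %d, saved bit (bit %d) = %u\n", e, n - 1, (unsigned)saved);

    u &= ~((1u << n) - 1u);                      /* Step 2 */
    printf("after zeroing %d LSBs: mantissa = 0x%06X\n", n, (unsigned)(u & 0x7FFFFFu));

    if (saved)                                   /* Step 3 */
        u += 1u << n;
    printf("result bits = 0x%08X\n", (unsigned)u);
    return u;
}
```

For `62.3f` it reports a saved bit of 0 and a mantissa of 0x780000 after step 2, which is (1).1111 followed by 19 zeros, and returns 0x42780000, the bit pattern of 62.0f.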


Another example:

x = 21.9

    sign exponent             mantissa
x =  0      4       (1).01011110011001100110011

Step 1: Store the (exponent + 1)th bit after the binary point.

exponent+1 = 5th bit

savedbit = 1

Step 2: Zero out the (23 - exponent) least significant bits. Here 23 - exponent = 19, so we zero out the 19 LSBs.

    sign exponent             mantissa
x =  0      4       (1).01010000000000000000000

Step 3: Use the saved bit to round. Since the saved bit is 1, we add one to the LSB of the kept (integer) part, which is the 4th bit after the binary point, and get 22, which is the correct result:

We start with:

    sign exponent             mantissa
x =  0      4       (1).01010000000000000000000

Add one at this location:

+                          1

And we get 22:

    sign exponent             mantissa
x =  0      4       (1).01100000000000000000000
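In the 21.9 example the added one stays inside the mantissa. When every kept mantissa bit is 1, the addition overflows the mantissa, which is the "normalize if necessary" case. If sign, exponent and mantissa are kept together as one 32-bit integer, as IEEE-754 lays them out, a plain integer add performs that normalization: the carry increments the exponent and leaves the mantissa zero. A C sketch; `add_one_at_integer_lsb` is hypothetical and assumes binary32, 0 <= exponent <= 22, and that the fractional bits have already been zeroed (step 2):

```c
#include <stdint.h>
#include <string.h>

float add_one_at_integer_lsb(float truncated) {
    uint32_t u;
    memcpy(&u, &truncated, sizeof u);
    int e = (int)((u >> 23) & 0xFFu) - 127;   /* unbiased exponent */
    u += 1u << (23 - e);                      /* +1.0 in magnitude; may carry into the exponent */
    memcpy(&truncated, &u, sizeof truncated);
    return truncated;
}
```

For 21.0 this gives 22.0 as above; for 31.0, which is (1).1111 x 2^4, the carry produces (1).0000 x 2^5, that is 32.0, with no separate normalization code.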
Veridian answered Nov 09 '22 14:11