C - floating point rounding

I'm trying to understand how floating point numbers work.

To test what I know (and find out what I still need to learn), I'd like to work out the following: the smallest positive floating point number x for which x + 1, computed in floating point, evaluates to x.

As I understand it, this happens when x is large enough that x + 1 is closer to x than to the next representable floating point number above x. Intuitively, then, it is the point where the significand no longer has enough digits to hold the added 1. Would x then be the number whose significand is all 1s? If so, I can't figure out what the exponent would have to be. Obviously it would have to be big (relative to 10^0, anyway).

Tony Stark asked May 03 '10 08:05



People also ask

Does float round up in C?

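In the C programming language, the ceil function returns the smallest integral value that is greater than or equal to x (i.e. it rounds up to the nearest integer). The result is still a floating point value: ceil takes and returns a double, and ceilf does the same for float.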

Does C round down or up?

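Integer division truncates in C, yes: the quotient is rounded toward zero, not down, so -7 / 2 is -3 rather than -4. (C99 made this mandatory; C89 left the direction implementation-defined when an operand is negative.) Converting a floating point value to an integer type also truncates toward zero.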

How do you round a floating-point?

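Floating point hardware typically keeps extra bits to the right of the significand during a calculation: the guard and round bits, plus a sticky bit that records whether anything nonzero was shifted out beyond them. At the end of the calculation, the result is rounded back to the target precision using these bits. In the default IEEE 754 mode (round to nearest, ties to even), the result goes to the closer representable value (i.e. 0.00 to 0.49 → 0 and 0.51 to 0.99 → 1), and an exact tie (0.50) goes to whichever neighbor is even, i.e. has a last bit of 0.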

Is there a rounding function in C?

The round() function in the C programming language returns the integer value nearest to its argument, as a floating point value; round, roundf and roundl take and return double, float and long double respectively. Halfway cases are rounded away from zero, regardless of the current rounding mode.


1 Answer

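You just need an expression for the value of the LS bit of the mantissa in terms of the exponent. When this is greater than 1, x + 1 can no longer be represented exactly, which is the point where your condition can first be met. A single precision float has a 24-bit significand (23 stored bits plus the implicit leading 1). Writing x = m * 2^exp with 0.5 <= m < 1, the LS bit has a value of 2^-24 * 2^exp, so the condition is met when exp is > 24, i.e. 25. The smallest (normalized) number with that exponent is 0.5 * 2^25 = 2^24 = 16777216.0f.

Whether x + 1 actually rounds back to x also depends on the rounding mode. At 2^24 the spacing between floats is 2, so x + 1 lies exactly halfway between x and the next float up. In the default IEEE 754 mode (round to nearest, ties to even) that tie goes to x, because x has an even last significand bit, so 16777216.0f + 1.0f == 16777216.0f. Just above, at 2^24 + 2, the tie goes up to 2^24 + 4 instead, so x + 1 == x holds for every x only from 2^25 = 33554432.0f on, where the spacing is 4 and 1 is less than half of it.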

Paul R answered Sep 30 '22 00:09
