I have mostly worked with integers, and in situations where I needed to truncate a float or double to an integer, I used to write:
(int) someValue
until I found out the following:
NSLog(@"%i", (int) ((1.2 - 1) * 10)); // prints 1
NSLog(@"%i", (int) ((1.2f - 1) * 10)); // prints 2
(please see "Strange behavior when casting a float to int in C#" for the explanation).
The short question is: how should we truncate a float or double to an integer properly? (Truncation is wanted in this case, not rounding.) One could argue that since one intermediate value is roughly 1.9999999999999 and the other is roughly 2.00000000000001, the truncation is actually done correctly. So the question is: how should we convert a float or double so that the result is a "truncated" number that makes common-usage sense?
(The intention is not to use round, because in this case, for 1.8, we do want a result of 1, not 2.)
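For reference, printing the intermediate values at full precision shows what is actually being truncated (a small diagnostic sketch; the printed digits are what IEEE-754 double and float arithmetic produce here):

NSLog(@"%.17g", (1.2 - 1) * 10);             // 1.9999999999999996 (double arithmetic)
NSLog(@"%.17g", (double) ((1.2f - 1) * 10)); // 2.0000004768371582 (float arithmetic)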
Longer question:
I used
int truncateToInteger(double a) {
    return (int) (a + 0.000000000001);
}

- (void)someTest {
    NSLog(@"%i", truncateToInteger((1.2 - 1) * 10));
    NSLog(@"%i", truncateToInteger((1.2f - 1) * 10));
}
and both print 2, but it seems too much of a hack. What small number should we use to "remove the inaccuracy"? Is there a more standard or studied way to do this, instead of such an arbitrary hack?
(Note that we want truncation, not rounding, in some usages. For example, if the number of elapsed seconds is 90 or 118, then when we display how many minutes and seconds have elapsed, the minutes should display as 1, and should not be rounded up to 2.)
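For instance, a minimal sketch of that elapsed-time display; plain integer arithmetic suffices here, so no floating-point issue arises:

int elapsed = 118;                        // elapsed time in seconds
int minutes = elapsed / 60;               // integer division truncates: 1
int seconds = elapsed % 60;               // remainder: 58
NSLog(@"%i min %i s", minutes, seconds);  // "1 min 58 s", not rounded up to 2 min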
It truncates automatically if you assign the value to an int variable:

int c;
c = a / b;

Or you can cast like this:

c = (int) (a / b);
The truncation has been performed correctly, of course, but on an inaccurate intermediate value.
In general there's no way to know whether your 1.999999 result is a slightly inaccurate 2 (so the exact-maths result after truncation is 2), or a slightly inaccurate 1.999998 (so the exact-maths result after truncation is 1).
For that matter, for some calculations you could get 2.000001 as a slightly inaccurate 1.999998. Pretty much whatever you do, you'll get that one wrong. Truncation is a non-continuous function, so however you do it, it makes your overall computation numerically unstable.
You could add an arbitrary tolerance anyway: (int)(x > 0 ? x + epsilon : x - epsilon). It may or may not help, depending on what you're doing, which is why it's a "hack". epsilon could be a constant, or it could scale according to the size of x.
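For illustration, here is one shape such a hack could take, with epsilon scaled to the magnitude of x (a sketch only; the scale factor 1e-9 is an arbitrary assumption, not a recommendation):

#include <math.h>

// Truncate with an arbitrary tolerance, as described above. The tolerance
// scales with the size of x; whether any particular scale is appropriate
// depends entirely on the computation that produced x.
int truncateWithTolerance(double x) {
    double epsilon = fabs(x) * 1e-9;
    return (int) (x > 0 ? x + epsilon : x - epsilon);
}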
The most common solution to your second question isn't to "remove the inaccuracy", but rather to accept the inaccurate result as if it were accurate. So, if your floating-point unit says that (1.2 - 1) * 10 is 1.999999, OK, it is 1.999999. If that value represents a number of minutes, then it truncates to 1 minute 59 seconds. Your final displayed result will be 1s off the true value. If you need a more accurate final displayed result than that, then you shouldn't have used floating-point arithmetic to compute it, or perhaps you should have rounded to the nearest second before truncating to minutes.
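A sketch of that last suggestion, rounding to the nearest second before truncating to minutes (lround is from <math.h>):

double minutesValue = (1.2 - 1) * 10;             // 1.9999999999999996, intended as 2 minutes
long totalSeconds = lround(minutesValue * 60.0);  // 120: round to the nearest second first
NSLog(@"%ld min %ld s", totalSeconds / 60, totalSeconds % 60);  // "2 min 0 s"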
Any attempt to "remove" inaccuracy from a floating-point number is actually just going to move the inaccuracy around: some inputs will give more accurate results, others less accurate. If you're lucky enough to be in a case where the inaccuracy is shifted to inputs you don't care about, or can filter out before doing the computation, then you win. In general though, if you have to accept any input then you're going to lose somewhere. You need to look at how to make your computation more accurate, rather than trying to remove inaccuracy in a truncation step at the end.
There's a simple correction for your example computation: use fixed-point arithmetic with one base-10 decimal place. We know that format can represent 1.2 accurately. So, instead of writing (1.2 - 1) * 10, you should rescale the computation to use tenths (write (12 - 10) * 10) and then divide the final result by 10 to scale it back to units.
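A sketch of that rescaled computation; all intermediate arithmetic is exact because it is done in integers (assuming the inputs really are exact multiples of one tenth):

int tenths = (12 - 10) * 10;  // 20 tenths, computed exactly in integer arithmetic
NSLog(@"%i", tenths / 10);    // prints 2: scale back from tenths to units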
As you have modified your question, the problem now seems to be this: Given some inputs x, you calculate a value f'(x). f'(x) is the calculated approximation to an exact mathematical function f(x). You want to calculate trunc(f(x)), that is, the integer i that is farthest from zero without being farther from zero than f(x) is. Because f'(x) has some error, trunc(f'(x)) might not equal trunc(f(x)), such as when f(x) is 2 but f'(x) is 0x1.fffffffffffffp0. Given f'(x), how can you calculate trunc(f(x))?
This problem is impossible to solve. There is no solution that will work for all f.
The reason there is no solution is that, due to the error in f', f'(x) might be 0x1.fffffffffffffp0 because f(x) is 0x1.fffffffffffffp0, or f'(x) might be 0x1.fffffffffffffp0 because of calculation errors even though f(x) is 2. Therefore, given a particular value of f'(x), it is impossible to know what trunc(f(x)) is.
A solution is possible only given detailed information about f (and the actual operations used to approximate it with f'). You have not given that information, so your question cannot be answered.
Here is a hypothesis: Suppose the nature of f(x) is such that its results are always a non-negative multiple of q, for some q that divides 1. For example, q might be .01 (hundredths of a coordinate value) or 1/60 (represent units of seconds because f is in units of minutes). And suppose the values and operations used in calculating f' are such that the error in f' is always less than q/2.
In this very limited and hypothetical case, trunc(f(x)) can be calculated by calculating trunc(f'(x)+q/2). Proof: Let i = trunc(f(x)). Suppose i > 0. Then i <= f(x) < i+1, so i <= f(x) <= i+1-q (because f(x) is quantized by q). Then i-q/2 < f'(x) < i+1-q+q/2 (because f'(x) is within q/2 of f(x)). Then i < f'(x)+q/2 < i+1. Then trunc(f'(x)+q/2) = i, so we have the desired result. In the case where i = 0, then -1 < f(x) < 1, so -1+q <= f(x) <= 1-q, so -1+q-q/2 < f'(x) < 1-q+q/2, so -1+q < f'(x)+q/2 < 1, so trunc(f'(x)+q/2) = 0.
(Note: If q/2 is not exactly representable in the floating-point precision used or cannot be easily added to f'(x) without error, then some adjustments have to be made in either the proof, its conditions, or the addition of q/2.)
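Under exactly those hypothetical conditions, the correction might be sketched like this (here q = 1/60, as in the minutes example; the assumption that the error in f' stays below q/2 is doing all the work):

// Correct only when f(x) is a non-negative multiple of q and the computed
// approximation is within q/2 of the true value.
int truncQuantized(double approx, double q) {
    return (int) (approx + q / 2);
}

NSLog(@"%i", truncQuantized(1.9999999, 1.0 / 60.0));    // true value 2 minutes: prints 2
NSLog(@"%i", truncQuantized(119.0 / 60.0, 1.0 / 60.0)); // 1 minute 59 seconds: prints 1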
If that case does not serve your purpose, then you cannot expect an answer except by providing detailed information about f and the operations and values used to calculate f'.
The "hack" is the proper way to do it. That's simply how floats work; if you want saner decimal behavior, NSDecimalNumber (or the NSDecimal struct) might be what you want.
NSLog(@"%i", [[NSNumber numberWithFloat:((1.2 - 1) * 10)] intValue]); //2
NSLog(@"%i", [[NSNumber numberWithFloat:(((1.2f - 1) * 10))] intValue]); //2
NSLog(@"%i", [[NSNumber numberWithFloat:1.8] intValue]); //1
NSLog(@"%i", [[NSNumber numberWithFloat:1.8f] intValue]); //1
NSLog(@"%i", [[NSNumber numberWithDouble:2.0000000000001 ] intValue]);//2