Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does floor() return something that's exactly representable?

In C89, floor() returns a double. Is the following guaranteed to work?

double d = floor(3.0 + 0.5);
int x = (int) d;
assert(x == 3);

My concern is that the result of floor might not be exactly representable in IEEE 754. So d gets something like 2.99999, and x ends up being 2.

For the answer to this question to be yes, all integers within the range of an int have to be exactly representable as doubles, and floor must always return that exactly represented value.

like image 573
Jim Hunziker Avatar asked Jan 13 '09 18:01

Jim Hunziker


3 Answers

All integers can have exact floating point representation if your floating point type supports the required mantissa bits. Since double uses 53 bits for mantissa, it can store all 32-bit ints exactly. After all, you could just set the value as mantissa with zero exponent.

like image 162
mmx Avatar answered Nov 12 '22 01:11

mmx


If the result of floor() isn't exactly representable, what do you expect the value of d to be? Surely if you've got the representation of a floating point number in a variable, then by definition it's exactly representable isn't it? You've got the representation in d...

(In addition, Mehrdad's answer is correct for 32 bit ints. In a compiler with a 64 bit double and a 64 bit int, you've got more problems of course...)

EDIT: Perhaps you meant "the theoretical result of floor(), i.e. the largest integer value less than or equal to the argument, may not be representable as an int". That's certainly true. Simple way of showing this for a system where int is 32 bits:

int max = 0x7fffffff;
double number = max;
number += 10.0;
double f = floor(number);
int oops = (int) f;

I can't remember offhand what C does when conversions from floating point to integer overflow... but it's going to happen here.

EDIT: There are other interesting situations to consider too. Here's some C# code and results - I'd imagine at least similar things would happen in C. In C#, double is defined to be 64 bits and so is long.

using System;
class Test
{
    static void Main()
    {
        FloorSameInteger(long.MaxValue/2);
        FloorSameInteger(long.MaxValue-2);
    }

    static void FloorSameInteger(long original)
    {
        double convertedToDouble = original;
        double flooredToDouble = Math.Floor(convertedToDouble);
        long flooredToLong = (long) flooredToDouble;

        Console.WriteLine("Original value: {0}", original);
        Console.WriteLine("Converted to double: {0}", convertedToDouble);
        Console.WriteLine("Floored (as double): {0}", flooredToDouble);
        Console.WriteLine("Converted back to long: {0}", flooredToLong);
        Console.WriteLine();
    }
}

Results:

Original value: 4611686018427387903
Converted to double: 4.61168601842739E+18
Floored (as double): 4.61168601842739E+18
Converted back to long: 4611686018427387904

Original value: 9223372036854775805
Converted to double: 9.22337203685478E+18
Floored (as double): 9.22337203685478E+18
Converted back to long: -9223372036854775808

In other words:

(long) floor((double) original)

isn't always the same as original. This shouldn't come as any surprise - there are more long values than doubles (given the NaN values) and plenty of doubles aren't integers, so we can't expect every long to be exactly representable. However, all 32 bit integers are representable as doubles.

like image 43
Jon Skeet Avatar answered Nov 11 '22 23:11

Jon Skeet


I think you're a bit confused about what you want to ask. floor(3 + 0.5) is not a very good example, because 3, 0.5, and their sum are all exactly representable in any real-world floating point format. floor(0.1 + 0.9) would be a better example, and the real question here is not whether the result of floor is exactly representable, but whether inexactness of the numbers prior to calling floor will result in a return value different from what you would expect, had all numbers been exact. In this case, I believe the answer is yes, but it depends a lot on your particular numbers.

I invite others to criticize this approach if it's bad, but one possible workaround might be to multiply your number by (1.0+0x1p-52) or something similar prior to calling floor (perhaps using nextafter would be better). This could compensate for cases where an error in the last binary place of the number causes it to fall just below rather than exactly on an integer value, but it will not account for errors which have accumulated over a number of operations. If you need that level of numeric stability/exactness, you need to either do some deep analysis or use an arbitrary-precision or exact-math library which can handle your numbers correctly.

like image 2
R.. GitHub STOP HELPING ICE Avatar answered Nov 11 '22 23:11

R.. GitHub STOP HELPING ICE