
Integer Conversion in Floating Point Arithmetic

Tags: c++, c

I currently face the following dilemma:

1.0f * INT32_MAX != INT32_MAX

Evaluating 1.0f * INT32_MAX and storing the result in an int32_t actually gives me INT32_MIN.

I'm not completely surprised by this; I know floating point to integer conversions aren't always exact.

What is the best way to fix this problem?

The code I'm writing scales an array of floats from the range -1.0f <= x <= 1.0f to the range INT32_MIN <= x <= INT32_MAX.

Here's what the code looks like:

void convert(int32_t * dst, const float * src, size_t count){
    size_t i = 0;
    for (i = 0; i < count; i++){
        dst[i] = src[i] * INT32_MAX;
    }
}

Here's what I ended up with:

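void convert(int32_t * dst, const float * src, size_t count){
    size_t i = 0;
    for (i = 0; i < count; i++){
        /* Widen to double so that INT32_MAX and INT32_MIN are exact. */
        double tmp = src[i];
        if (src[i] > 0.0f){
            /* Positive inputs: 1.0f maps to INT32_MAX. */
            tmp *= INT32_MAX;
        } else {
            /* Negative inputs: -1.0f maps to INT32_MIN. */
            tmp *= INT32_MIN;
            tmp *= -1.0;
        }
        dst[i] = (int32_t)tmp;
    }
}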
asked Jan 02 '16 14:01 by tay10r



1 Answer

In IEEE 754, 2147483647 is not representable as a single-precision float: the value needs 31 significant bits, and a float's significand holds only 24. A quick test shows that the result of 1.0f * INT32_MAX is rounded to 2147483648.0f, which can't be represented in a 32-bit int.

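In other words, it is actually the conversion to int that causes the problem, not the float calculation, which happens to be only 1 off! Converting a floating-point value that is out of range for the target integer type is undefined behavior in C, so the INT32_MIN you observe is not guaranteed either.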
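The rounding described above can be checked directly. This is a minimal sketch, assuming IEEE 754 single-precision float with round-to-nearest; the names fits_in_int32 and demo_rounding are made up for illustration and are not part of the question's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if x can be converted to int32_t without
   overflow. 2147483648.0f is 2^31, which a float represents exactly.
   NaN fails both comparisons and is therefore rejected. */
static bool fits_in_int32(float x) {
    return x >= -2147483648.0f && x < 2147483648.0f;
}

/* Hypothetical demo: aborts if the platform does not behave as assumed. */
static void demo_rounding(void) {
    float f = (float)INT32_MAX;      /* 2^31 - 1 does not fit in 24 bits */
    assert(f == 2147483648.0f);      /* so it is rounded up to 2^31 */
    assert(!fits_in_int32(f));       /* and 2^31 is out of range for int32_t */
    assert((double)INT32_MAX == 2147483647.0); /* double holds it exactly */
}
```

A range check like fits_in_int32 before the cast is one way to avoid the out-of-range conversion when staying in float is a requirement.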

The solution is to use double for the intermediate calculation: 2147483647.0 is exactly representable as a double-precision number.
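One way this might look in the conversion loop is sketched below. It is only an illustration under stated assumptions: the name convert_via_double is hypothetical, both ends of the range are scaled by INT32_MAX (so -1.0f maps to -INT32_MAX rather than INT32_MIN), out-of-range inputs are clamped, and NaN inputs are assumed not to occur.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: scale floats in [-1.0, 1.0] to int32_t,
   doing the multiplication in double so INT32_MAX stays exact. */
static void convert_via_double(int32_t *dst, const float *src, size_t count) {
    assert(count == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < count; i++) {
        double x = src[i];            /* float to double is always exact */
        /* Assumption: clamp instead of overflowing on out-of-range input.
           NaN is not handled here. */
        if (x > 1.0) x = 1.0;
        if (x < -1.0) x = -1.0;
        double scaled = x * (double)INT32_MAX;
        dst[i] = (int32_t)scaled;     /* in range; truncates toward zero */
    }
}
```

With this scaling, 1.0f becomes INT32_MAX and 0.5f becomes 1073741823, because the cast truncates 1073741823.5 toward zero. If rounding to nearest is preferred, lround from <math.h> could be applied to the scaled value instead.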

answered Sep 28 '22 08:09 by Mr Lister