
Puzzled by different result from "same" type cast, float to int

If I assign the result of a floating-point computation to a variable first, then assign that variable to an unsigned int with an implicit conversion, I get one answer. But if I assign the same computation directly to the unsigned int, again with an implicit conversion, I get a different answer.

Below is sample code I compiled and ran to demonstrate:

#include <iostream>



int
main( int argc, char** argv )
{
    float   payloadInTons = 6550.3;


    //  Above, payloadInTons is given a value.
    //  Below, two different ways are used to type cast that same value,
    //  but the results do not match.
    float tempVal = payloadInTons * 10.0;
    unsigned int right = tempVal;
    std::cout << "    right = " << right << std::endl;


    unsigned int rawPayloadN = payloadInTons * 10.0;
    std::cout << "    wrong = " << rawPayloadN << std::endl;


    return 0;
}

Does anyone have insight into why "right" is right, and "wrong" is wrong?

By the way, I am using gcc 4.8.2 on Ubuntu 14.04 LTS, if it matters.

asked Apr 21 '15 at 15:04 by donjuedo


2 Answers

You are using double literals. With proper float literals, everything's fine.

#include <iostream>

int
main( int argc, char** argv )
{
    float   payloadInTons = 6550.3f;
    float tempVal = payloadInTons * 10.0f;

    unsigned int right = tempVal;
    std::cout << "     right = " << right << std::endl;

    unsigned int rawPayloadN = payloadInTons * 10.0f;
    std::cout << "also right = " << rawPayloadN << std::endl;


    return 0;
}

Output :

     right = 65503
also right = 65503
answered Oct 19 '22 at 15:10 by Quentin


After the accepted answer

This is not a double vs. float issue. It is a binary floating-point and conversion to int/unsigned issue.

A typical float uses the binary32 representation, which cannot represent values like 6550.3 exactly.

float payloadInTons = 6550.3;
// payloadInTons has the exact value of `6550.2998046875`.

Multiplying by 10.0, below, ensures the calculation is done with at least double precision, giving an exact result of 65502.998046875. The product is then converted back to float. That double value is not exactly representable as a float, so it is rounded to the nearest float, whose exact value is 65503.0. tempVal then converts to right with the desired value of 65503.

float tempVal = payloadInTons * 10.0;
unsigned int right = tempVal;

Multiplying by 10.0, below, again ensures the calculation is done with at least double precision, with the same exact result of 65502.998046875. This time the value is converted directly to unsigned rawPayloadN, giving the undesired value of 65502, because the conversion truncates rather than rounds.

unsigned int rawPayloadN = payloadInTons * 10.0;

The first “worked” because the conversion went double to float to unsigned. That involves 2 conversions, which is usually bad. In this case, 2 wrongs made a right.


Solution

Had the code tried float payloadInTons = 6550.29931640625; (the next smaller float value), both results would have been 65502.

The "right" way to convert a floating-point value to some integer type is often to round the result first and then perform the type conversion.

float tempVal = payloadInTons * 10.0;
unsigned int right = roundf(tempVal);

Note: This entire issue is complicated by the value of FLT_EVAL_METHOD. If its value on the user's platform is non-zero, floating-point calculations may occur at higher precision than expected.

printf("FLT_EVAL_METHOD %d\n", (int) FLT_EVAL_METHOD);
answered Oct 19 '22 at 16:10 by chux - Reinstate Monica