Floating point calculation gives different results with float than with double

Question

I have the following line of code.

hero->onBeingHit(ENEMY_ATTACK_POINT * (1.0 - hero->getDefensePercent()));

void onBeingHit(int decHP) takes an integer and reduces the hero's health points by that amount.
float getDefensePercent() is a getter that returns the hero's defense percentage.
ENEMY_ATTACK_POINT is a macro constant defined as #define ENEMY_ATTACK_POINT 20.

Let's say hero->getDefensePercent() returns 0.1. So the calculation is

20 * (1.0 - 0.1)  =  20 * (0.9)  =  18

When I tried it with the following code (no f appended to 1.0)

hero->onBeingHit(ENEMY_ATTACK_POINT * (1.0 - hero->getDefensePercent()));

I got 17.

But with the following code (f appended to 1.0)

hero->onBeingHit(ENEMY_ATTACK_POINT * (1.0f - hero->getDefensePercent()));

I got 18.

What's going on? Does the f suffix matter at all, given that hero->getDefensePercent() already returns a float?
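A minimal, self-contained reproduction of the two calls, assuming a hypothetical Hero class (the real class is not shown in the question) whose getDefensePercent() returns 0.1f and whose onBeingHit(int) subtracts from a starting HP of 100. The helpers damageWithDouble and damageWithFloat are also hypothetical; they just report how much HP was lost.

```cpp
#define ENEMY_ATTACK_POINT 20

// Hypothetical stand-in for the question's hero class: only the two
// members used in the question are modelled.
class Hero {
public:
    explicit Hero(float defensePercent) : hp_(100), defense_(defensePercent) {}

    // Takes an int, so a floating-point argument is implicitly
    // converted (truncated toward zero) at the call site.
    void onBeingHit(int decHP) { hp_ -= decHP; }

    float getDefensePercent() const { return defense_; }
    int hp() const { return hp_; }

private:
    int hp_;
    float defense_;
};

// HP lost when the literal is a double (1.0): double expression.
int damageWithDouble() {
    Hero hero(0.1f);
    hero.onBeingHit(ENEMY_ATTACK_POINT * (1.0 - hero.getDefensePercent()));
    return 100 - hero.hp();
}

// HP lost when the literal is a float (1.0f): float expression.
int damageWithFloat() {
    Hero hero(0.1f);
    hero.onBeingHit(ENEMY_ATTACK_POINT * (1.0f - hero.getDefensePercent()));
    return 100 - hero.hp();
}
```

With single-precision float arithmetic (typical x86-64 or ARM builds), damageWithDouble() returns 17 and damageWithFloat() returns 18, matching what the question describes.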

leemes · Accepted Answer

What's going on? Why isn't the integer result 18 in both cases?

The problem is that, in both cases, the result of the floating-point expression is truncated (rounded toward zero) when it is converted to an integer.

0.1 cannot be represented exactly as a binary floating-point value, neither as a float nor as a double. The compiler converts it to the nearest representable IEEE 754 binary value, which may be slightly above or slightly below 0.1. The processor then performs the multiplication at runtime, and the result is truncated when it is converted to an integer.
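To see that 0.1 is stored inexactly, the sketch below checks the stored float value at compile time and prints both stored values. The float nearest to 0.1 is 13421773 / 2^27 = 0.100000001490116119384765625, slightly above 0.1, and the double nearest to 0.1 is a different value, so converting 0.1f to double does not give back 0.1. The function names are illustrative only.

```cpp
#include <cstdio>

// 0.1 has no finite binary expansion, so each literal is rounded to the
// nearest representable value of its type.
// The float nearest to 0.1 is 13421773 * 2^-27 (slightly above 0.1).
static_assert(0.1f == 13421773.0f / 134217728.0f, "0.1f is 13421773 * 2^-27");

// float and double round 0.1 to different values.
bool floatAndDoubleDiffer() {
    return static_cast<double>(0.1f) != 0.1;
}

// Prints 55 decimal places, enough to show both stored values exactly
// when the C library prints exact decimal expansions.
void printStoredValues() {
    std::printf("0.1f -> %.55f\n", 0.1f);  // float is promoted to double here
    std::printf("0.1  -> %.55f\n", 0.1);
}
```

Printing all 55 digits relies on the C library producing exact decimal expansions, as glibc does; other libraries may print fewer exact digits.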

Ok, but since both double and float behave like that, why do I get 18 in one of the two cases, but 17 in the other case? I'm confused.

Your code takes the function's result, 0.1f (a float), and computes 20 * (1.0 - 0.1f), which is a double expression, whereas 20 * (1.0f - 0.1f) is a float expression. The double expression evaluates to 17.9999999701976776123046875, slightly less than 18.0, so it is truncated to 17. In the float expression, the product happens to round to 18.0 itself (exactly 18.0f when float arithmetic is done in single precision), so it converts to 18.

Unless you know exactly how decimal numbers are converted to IEEE 754 binary floating-point values, whether a result ends up slightly below or slightly above the decimal value in your code is effectively unpredictable, so you should not rely on it. Don't try to fix such an issue by appending f to one of the numbers and deciding "now it works, so I'll leave this f there": a different value may well behave the other way round.
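The two expressions can be checked in isolation by inlining the getter's result, 0.1f, and the macro value 20. The function names below are hypothetical. The double expression is exactly 18 - 2^-25, while the float expression, evaluated in single precision, rounds to exactly 18.0f.

```cpp
// double expression: 0.1f is promoted to double, and 1.0 - 0.1f is exact
// in double, so the product stays just below 18.
double doubleExpression() {
    return 20 * (1.0 - 0.1f);
}

// float expression: 1.0f - 0.1f rounds to 0.9f, and 20 * 0.9f rounds
// to the nearest float, which is 18.0f.
float floatExpression() {
    return 20 * (1.0f - 0.1f);
}
```

Converting the results with static_cast<int> gives 17 and 18 respectively, because the conversion discards the fractional part.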

Why does the type of the expression depend on the presence of this f?

This is because a floating-point literal in C and C++ has type double by default; with the f suffix it has type float. In an arithmetic expression with mixed operand types, the operand of the "smaller" type is converted to the "greater" one, and the result has that type: a double combined with a float or an int gives a double, and a float combined with an int gives a float. So the result of your expression is either a float or a double.
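The literal types and the result types of the mixed expressions can be confirmed at compile time with std::is_same from <type_traits>; if any of these assertions were false, the sketch would not compile.

```cpp
#include <type_traits>

// An unsuffixed floating literal is a double; the f suffix makes it a float.
static_assert(std::is_same<decltype(1.0), double>::value, "1.0 is a double");
static_assert(std::is_same<decltype(1.0f), float>::value, "1.0f is a float");

// Usual arithmetic conversions: the "smaller" operand is converted to the
// "greater" type, which is also the type of the result.
static_assert(std::is_same<decltype(1.0 - 0.1f), double>::value, "double - float -> double");
static_assert(std::is_same<decltype(1.0f - 0.1f), float>::value, "float - float -> float");
static_assert(std::is_same<decltype(20 * (1.0 - 0.1f)), double>::value, "int * double -> double");
static_assert(std::is_same<decltype(20 * (1.0f - 0.1f)), float>::value, "int * float -> float");
```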

Ok, but I don't want to round to zero. I want to round to the nearest number.

To fix this, add one half to the result before it is converted to an integer. This works as long as the result is non-negative, which is the case here:

hero->onBeingHit(ENEMY_ATTACK_POINT * (1.0 - hero->getDefensePercent()) + 0.5);
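A small sketch of the add-one-half approach as a standalone function; the name roundNonNegative is hypothetical. For negative inputs it gives the wrong answer: -2.2 + 0.5 = -1.7 truncates to -1, not -2, which is why the negative case needs the separate treatment described below.

```cpp
// Rounds a non-negative value to the nearest int by adding one half and
// letting the int conversion truncate. Only valid for x >= 0.
int roundNonNegative(double x) {
    return static_cast<int>(x + 0.5);
}
```

Applied to the question's double expression, roundNonNegative(20 * (1.0 - 0.1f)) gives 18: 17.99999997 + 0.5 is about 18.49999997, which truncates to 18.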

Since C++11, the standard library provides std::round() (and std::lround(), which returns a long) in <cmath> for this. C++03 had no standard function that rounds to the nearest integer.
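With C++11 available, the damage computation can use std::lround from <cmath>, which rounds to the nearest integer (halfway cases away from zero) and returns a long. damageRounded is a hypothetical helper that mirrors the question's expression, with the defense percent passed in as a parameter.

```cpp
#include <cmath>

// Mirrors the question's macro.
#define ENEMY_ATTACK_POINT 20

// Hypothetical helper: computes the damage and rounds it to the nearest
// integer instead of truncating it.
int damageRounded(float defensePercent) {
    return static_cast<int>(std::lround(ENEMY_ATTACK_POINT * (1.0 - defensePercent)));
}
```

damageRounded(0.1f) gives 18 even though the unrounded double value is just below 18.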

If you don't have std::round, you can write it yourself. Take care when dealing with negative numbers. When converting to an integer, the number will be truncated (rounded towards zero), which means that negative values will be rounded up, not down. So we have to subtract one half if the number is negative:

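int roundToInt(double x) {
    // Conversion to int truncates toward zero, so move one half away from
    // zero first. Not named round() to avoid clashing with round() from <math.h>.
    return static_cast<int>((x < 0.0) ? (x - 0.5) : (x + 0.5));
}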
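The sketch below illustrates the truncation toward zero described above and checks the hand-written helper against std::lround. The names are hypothetical; roundHalfAwayFromZero uses the same logic as the round function above. It also shows one edge case of adding one half, assuming IEEE double arithmetic (typical x86-64 or ARM builds): for the largest double below 0.5, x + 0.5 rounds up to exactly 1.0, so the helper returns 1 where std::lround returns 0.

```cpp
#include <cmath>

// Conversion to int truncates toward zero: 2.7 -> 2, but -2.7 -> -2 (up, not down).
int truncateToInt(double x) { return static_cast<int>(x); }

// Hypothetical helper: moves one half away from zero, then truncates.
int roundHalfAwayFromZero(double x) {
    return static_cast<int>(x < 0.0 ? x - 0.5 : x + 0.5);
}

// Edge case of the +0.5 trick: x is the largest double below 0.5, and
// x + 0.5 is exactly halfway between two doubles, so it rounds to 1.0.
// std::lround rounds x itself and correctly returns 0.
bool halfTrickMisroundsJustBelowHalf() {
    double x = std::nextafter(0.5, 0.0);
    return roundHalfAwayFromZero(x) == 1 && std::lround(x) == 0;
}
```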

Tags: c++, floating-point, floating-accuracy, double

haxpor