------ Please jump to the last update -----------
I have found a bug (in my code) and I am struggling to understand it correctly.
It all boils down to this specific example taken from the immediate window while debugging:
x
0.00023569075
dx
-0.000235702712
x+dx+1f < 1f
true
(float) (x+dx+1f) < 1f
false
x and dx are both of type float. So why is the boolean value different when I do a cast?
In the actual code I had:
x += dx;
if (x + 1f < 1f) // Add one to truncate really small negative values (originally testing x < 0)
{
    // do actions accordingly
    // Later doing:
    x += 1; // x < 1 has to be true here, so we must get rid of really small negatives; otherwise x += 1 gives x == 1 as true and x < 1 as false.
}
but I am now trying it with the cast:
x += dx;
if ((float)(x + 1f) < 1f) // Add one to truncate really small negative values (originally testing x < 0)
{
    // do actions accordingly
    // Later doing:
    x += 1; // x < 1 has to be true here, so we must get rid of really small negatives; otherwise x += 1 gives x == 1 as true and x < 1 as false.
}
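To make it concrete why those really small negatives break the later x += 1 step, here is a minimal standalone sketch. The value is one I picked for illustration (the same one as in Update 2 below), and on my machine it reproduces the problem:

float x = -2.98023224E-08f; // a "really small negative"
if (x < 0f) // the original test passes...
{
    x += 1f; // ...but adding 1 absorbs the tiny negative: x is now exactly 1f
}
Console.WriteLine(x < 1f); // false: the x < 1 invariant is broken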
Visual Studio says that the cast is redundant, but I DO get a false positive without it, as the immediate window also told me:
x+dx+1f < 1f
true
I am currently running my code to see if I get the bug again with the cast, and I will update as soon as I am convinced either way.
In the meantime I hope someone can sort out what's going on here. Can I expect the cast to do something?
Update - Variables
My variables x and dx are components of a Vector2 (XNA/MonoGame). So in the code above you should read:
Vector2 coord; // the x (and y) components are floats.
Vector2 ds;
coord.X // where it says x
ds.X // where it says dx
I thought this would not matter, but maybe it does.
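For completeness, a hedged sketch of that setup (it assumes MonoGame's Microsoft.Xna.Framework.Vector2; the component values are the ones from the immediate window above):

// using Microsoft.Xna.Framework;
Vector2 coord = new Vector2(0.00023569075f, 0f);  // coord.X plays the role of x
Vector2 ds = new Vector2(-0.000235702712f, 0f);   // ds.X plays the role of dx
// Both components are plain float fields, so coord.X + ds.X + 1f is float arithmetic.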
Update 2 - Drop the above example
Seeing that the cast did change the outcome, I made this simple demonstration:
class Program
{
static void Main(string[] args)
{
float a = -2.98023224E-08f; // Just a small negative number I picked...
Console.WriteLine(((a + 1f) < 1f) ? "true" : "false"); //true
Console.WriteLine(((float)(a + 1f) < 1f) ? "true":"false"); //false
// Visual Studio Community 2015 marks the above cast as redundant
// but there's clearly something fishy going on!
Console.Read();
}
}
So, why does this cast change the result when even VS says it is redundant?
I don't see how you're declaring your variables, but assigning the literal values that you posted to variables makes those variables of type double, not float. And as you know, the double type has more precision than float.
Here is a test:
var x = 0.00023569075;
var dx = -0.000235702712;
Console.WriteLine(x.GetType()); //result: System.Double
Console.WriteLine(dx.GetType()); //result: System.Double
And of course, when adding two doubles and a float, the result is a double, so that's why the first condition returns true:
Console.WriteLine(x+dx+1f < 1f); //returns true
Console.WriteLine(x+dx+1f); //returns 0.999999988038
But when you cast it to float, a truncation occurs and the result is no longer correct, which is why your second condition returns false:
Console.WriteLine((float)(x+dx+1f) < 1f); //returns false
Console.WriteLine((float)(x+dx+1f)); //returns 1
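If it helps to see the size of the effect, here is a rough sketch of how much that narrowing throws away (same x and dx as above; the exact printed value may vary by runtime):

Console.WriteLine((x + dx + 1f) - (float)(x + dx + 1f)); // about -1.196E-08: the part lost when rounding to float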
UPDATE: When your variables are float, truncation is at play here. Remember that the max precision of float is only about 7 digits, and you're assigning numbers with more digits than that, so truncation occurs and results in the inaccurate values that you're witnessing.
In your original question, here is how the values are truncated:
float x = 0.00023569075f;
float dx = -0.000235702712f;
Console.WriteLine(x); //0.0002356907 last digit lost
Console.WriteLine(dx); //-0.0002357027 last two digits lost
Console.WriteLine((x + dx)); //-1.196167E-08
Console.WriteLine((x + dx + 1f)); //1
The reason why the last result is 1 should be obvious. The result of adding x and dx is -1.196167E-08 (-0.00000001196167), which has 7 digits and can fit in a float. Now adding 1 makes it 0.99999998803833, which has 14 digits and cannot fit in a float, so it is truncated and rounded to 1 when stored in a float.
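To see why that rounding goes up to 1 rather than down, note that the largest float below 1f is 1 - 2^-24, and 0.99999998803833 is closer to 1f than to that neighbor. A small check (the literals here are mine):

Console.WriteLine(0.99999994f);             // 0.9999999 on my machine: the largest float below 1f (1 - 2^-24)
Console.WriteLine((float)0.99999998803833); // 1: the double rounds to the nearer float, which is exactly 1f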
The same thing happens in your update 2. The value -2.98023224E-08f has 9 digits, so it is truncated to -2.980232E-08 (-0.00000002980232). Again, adding 1 to that makes it 0.99999997019768, which is truncated and rounded to 1:
float a = -2.98023224E-08f;
Console.WriteLine(a); //-2.980232E-08 last two digits lost
Console.WriteLine(a + 1f); //1
UPDATE 2: Chris commented about the calculation being done at a higher precision, which is absolutely correct, but that doesn't explain the results, which should not be affected by that. Yes, the a + 1f calculation is done at a higher precision, but because both operands are float, the result of the calculation is then automatically cast down to float. Manually casting the result to float should therefore be redundant and shouldn't change the result. More importantly, the cast does not force the calculation itself to be done at float precision. And yes, we still get these results:
Console.WriteLine(a + 1f); //1
Console.WriteLine(a + 1f < 1f); //True
Console.WriteLine((float)(a + 1f) < 1f); //False
Thanks to a good debate with Chris and lots of testing on various machines, I think I have a better understanding of what's going on.
When we read:
Floating-point operations may be performed with higher precision than the result type of the operation
The word operations here means not only the calculations (addition, in our example), but also the comparisons (less than, in our example). So in the second line above, the entire a + 1f < 1f is done at a higher precision: adding the value -2.98023224E-08f (-0.0000000298023224) to 1 results in 0.9999999701976776, which is then compared to 1f and obviously returns true:
Console.WriteLine(a + 1f < 1f); //True
At no time is there any casting to float, because the result of the comparison is a bool.
In the first line, however, we're simply printing the result of the calculation a + 1f, and because both operands are float, the result is automatically cast down to float, which causes it to be truncated and rounded to 1:
Console.WriteLine(a + 1f); //1
Now the big question is about the third line. What's different this time is that the cast forces the result of the calculation down to float, which truncates and rounds it to 1, and then this value is compared to 1f. The comparison is still done at a higher precision, but now that doesn't matter, because the cast has already changed the result of the calculation:
Console.WriteLine((float)(a + 1f) < 1f); //False
So the casting here is causing the two operations (addition and comparison) to be done separately. Without casting, the steps are: add, compare, print. With casting, the steps are: add, cast, compare, print. Both operations are still done at a higher precision, because casting cannot affect that.
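Here is a sketch spelling those two step sequences out with an explicit double intermediate (the variable names are mine):

float a = -2.98023224E-08f;
double sum = (double)a + 1.0;     // add at higher precision: 0.9999999701976776
Console.WriteLine(sum < 1.0);     // True: compare at higher precision (the no-cast sequence)
float narrowed = (float)sum;      // cast: rounds the sum to exactly 1f
Console.WriteLine(narrowed < 1f); // False: the comparison now sees the already-rounded value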
Perhaps Visual Studio says the cast is redundant because it doesn't take into account whether the operations will be done at a higher precision or not.
I think the important part of the C# spec here is this:
"Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an "extended" or "long double" floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. " - https://msdn.microsoft.com/en-us/library/aa691146(v=vs.71).aspx
We can infer that this is almost certainly what is happening by looking at these three lines of code, doing the comparison in slightly different ways:
float a = -2.98023224E-08f;
Console.WriteLine((a + 1f) < 1f); // True
Console.WriteLine((float)(a + 1f) < 1f); //False
Console.WriteLine((double)(a + 1f) < 1f); //True
As you can see, the result of the first comparison (which is what we are wondering about) is the same as when the intermediate value is cast to double, telling us that the compiler is taking advantage of the option to perform the calculations at a higher precision.
Of course, the reason the results are different is that although we can see that the comparison should be true when a + 1f is calculated exactly, the result stored as a single is 1, hence the comparison being false.
And just to round this off: a in the above is stored in a float with an exponent of -25 and a fraction of 0. If you add 1 to this, the parts at the -25 exponent are too small to be represented, so the value needs to be rounded, and in this case the rounding leaves the number at exactly 1. This is because of the way single precision floating point numbers are stored: they only have 23 bits for the part following the leading 1, which is not enough precision to keep the fraction, so it ends up rounding to exactly 1 when stored. Hence the comparison returns false when we force it to use float calculations all the way.
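A quick way to inspect that bit layout, using only the standard BitConverter and Convert calls:

float a = -2.98023224E-08f; // exactly -2^-25
int bits = BitConverter.ToInt32(BitConverter.GetBytes(a), 0);
Console.WriteLine(Convert.ToString(bits, 2).PadLeft(32, '0'));
// 10110011000000000000000000000000
// sign = 1, biased exponent = 01100110 (102 - 127 = -25), 23-bit fraction = all zeros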
Because floats are stored in BINARY, the IEEE floating point standard represents numbers as a binary mantissa and a binary exponent (powers of 2). Many decimal numbers cannot be represented exactly in this representation, so the compiler uses the nearest available binary IEEE floating point number.
So since the stored value is not exactly correct, no matter how small the difference actually is, the comparison fails. Calculate the difference and you will see how small it is:
var diff = (float)(x+dx+1f) - 1f;
If you use decimal instead, it would probably work.
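A minimal check of that suggestion, reusing the numbers from the question:

decimal x = 0.00023569075m;
decimal dx = -0.000235702712m;
Console.WriteLine(x + dx + 1m);      // 0.999999988038: decimal keeps every digit here
Console.WriteLine(x + dx + 1m < 1m); // True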