
What could cause floating point numbers to suddenly be off by 1 bit without arithmetic changes

While making a fairly large refactoring change that did not modify any arithmetic, I somehow managed to change the output of my program (an agent-based simulation system). Various numbers in the output are now off by minuscule amounts. Examination shows that these numbers differ only in their least significant bit.

For example, 24.198110084326416 would become 24.19811008432642. The floating point representation of each number is:

24.198110084326416 = 0 10000000011 1000001100101011011101010111101011010011000010010100
24.19811008432642  = 0 10000000011 1000001100101011011101010111101011010011000010010101

Note that only the least significant bit of the mantissa differs.
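The one-bit difference can be checked directly. The following is a minimal Java sketch, not taken from the simulation itself; the class name `BitInspector` and both helper methods are made up for illustration. It prints the sign, exponent, and mantissa fields of each value and counts how many representable doubles separate them.

```java
final class BitInspector {
    /** Formats a double as "sign exponent mantissa" in binary (1, 11 and 52 bits). */
    static String toBitFields(double value) {
        long bits = Double.doubleToLongBits(value);
        // Long.toBinaryString drops leading zeros, so pad back to 64 characters.
        String padded = String.format("%64s", Long.toBinaryString(bits)).replace(' ', '0');
        return padded.substring(0, 1) + " " + padded.substring(1, 12) + " " + padded.substring(12);
    }

    /** Number of representable steps between two finite doubles of the same sign. */
    static long ulpDistance(double a, double b) {
        return Math.abs(Double.doubleToLongBits(a) - Double.doubleToLongBits(b));
    }

    public static void main(String[] args) {
        double expected = 24.198110084326416; // value before the refactoring
        double actual = 24.19811008432642;    // value after the refactoring
        System.out.println(toBitFields(expected));
        System.out.println(toBitFields(actual));
        System.out.println("ULP distance: " + ulpDistance(expected, actual));
    }
}
```

A distance of one step means the two results are adjacent doubles, which points at a rounding difference rather than a logic error.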

My question is: how could I have introduced this change without modifying any arithmetic? The change involved simplifying an object by removing inheritance (its superclass was bloated with methods that were not applicable to this class).

I note that the output (which displays the values of certain variables at each tick of the simulation) is sometimes off, then as expected for another tick, only to be off again on the following tick (e.g., for one agent, the values exhibit this problem on ticks 57-83, are as expected for ticks 84 and 85, and are off again for tick 86).

I'm aware that we shouldn't compare floating point numbers directly. These errors were noticed when an integration test that merely compared the output file to an expected output failed. I could (and perhaps should) fix the test to parse the files and compare the parsed doubles with some epsilon, but I'm still curious as to why this issue may have been introduced.
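A tolerance-based comparison of the output could look roughly like the sketch below. It assumes whitespace-separated tokens on each line, which is a guess about the output format; the class `OutputComparator`, its methods, and the relative tolerance are likewise hypothetical. NaN and Infinity tokens are handled explicitly because the program produces both.

```java
final class OutputComparator {
    /** True when a and b are identical (including NaN vs NaN) or within a relative tolerance. */
    static boolean nearlyEqual(double a, double b, double relTol) {
        if (Double.compare(a, b) == 0) {
            return true; // identical values, matching infinities, or NaN vs NaN
        }
        if (Double.isNaN(a) || Double.isNaN(b) || Double.isInfinite(a) || Double.isInfinite(b)) {
            return false; // a special value on one side only can never be "close"
        }
        return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
    }

    /** Compares two lines token by token; numeric tokens use the tolerance, others must match exactly. */
    static boolean linesMatch(String expectedLine, String actualLine, double relTol) {
        String[] expected = expectedLine.trim().split("\\s+");
        String[] actual = actualLine.trim().split("\\s+");
        if (expected.length != actual.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (expected[i].equals(actual[i])) {
                continue;
            }
            try {
                double e = Double.parseDouble(expected[i]);
                double a = Double.parseDouble(actual[i]);
                if (!nearlyEqual(e, a, relTol)) {
                    return false;
                }
            } catch (NumberFormatException ex) {
                return false; // tokens differ and at least one is not a number
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical output lines; the real file layout may differ.
        String expected = "tick 57 24.198110084326416";
        String actual = "tick 57 24.19811008432642";
        System.out.println(linesMatch(expected, actual, 1e-12));
    }
}
```

A relative tolerance keeps the check meaningful across the 1e-3 to 1e5 range of values; a fixed absolute epsilon would be too loose at one end or too strict at the other.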

EDIT:

Minimal diff of change that introduced the problem:

diff --git a/src/main/java/modelClasses/GridSquare.java b/src/main/java/modelClasses/GridSquare.java
index 4c10760..80276bd 100644
--- a/src/main/java/modelClasses/GridSquare.java
+++ b/src/main/java/modelClasses/GridSquare.java
@@ -63,7 +63,7 @@ public class GridSquare extends VariableLevel
    public void addHousehold(Household hh)
    {
        assert household == null;
-       subAgents.add(hh);
+       neighborhood.getHouseholdList().add(hh);
        household = hh;
    }

@@ -73,7 +73,7 @@ public class GridSquare extends VariableLevel
    public void removeHousehold()
    {
        assert household != null;
-       subAgents.remove(household);
+       neighborhood.getHouseholdList().remove(household);
        household = null;
    }

diff --git a/src/main/java/modelClasses/Neighborhood.java b/src/main/java/modelClasses/Neighborhood.java
index 834a321..8470035 100644
--- a/src/main/java/modelClasses/Neighborhood.java
+++ b/src/main/java/modelClasses/Neighborhood.java
@@ -166,9 +166,14 @@ public class Neighborhood extends VariableLevel
    World world;

    /**
+    * List of all grid squares within the neighborhood.
+    */
+   ArrayList<VariableLevel> gridSquareList = new ArrayList<>();
+
+   /**
     * A list of empty grid squares within the neighborhood
     */
-   ArrayList<GridSquare> emptyGridSquareList;
+   ArrayList<GridSquare> emptyGridSquareList = new ArrayList<>();

    /**
     * The neighborhood's grid square bounds
@@ -836,7 +841,7 @@ public class Neighborhood extends VariableLevel
     */
    public GridSquare getGridSquare(int i)
    {
-       return (GridSquare) (subAgents.get(i));
+       return (GridSquare) gridSquareList.get(i);
    }

    /**
@@ -865,7 +870,7 @@ public class Neighborhood extends VariableLevel
    @Override
    public ArrayList<VariableLevel> getGridSquareList()
    {
-       return subAgents;
+       return gridSquareList;
    }

    /**
@@ -874,12 +879,7 @@ public class Neighborhood extends VariableLevel
    @Override
    public ArrayList<VariableLevel> getHouseholdList()
    {
-       ArrayList<VariableLevel> list = new ArrayList<VariableLevel>();
-       for (int i = 0; i < subAgents.size(); i++)
-       {
-           list.addAll(subAgents.get(i).getHouseholdList());
-       }
-       return list;
+       return subAgents;
    }

Unfortunately, I'm unable to create a small, compilable example, because I can neither replicate this behavior outside of the program nor cut this very large and entangled program down to size.

As for what kind of floating-point operations are being done, there's nothing particularly exciting: a ton of addition, multiplication, natural logarithms, and powers (almost always with base e). The latter two are done with the standard library. Random numbers are used throughout the program and are generated with the Random class included with the framework being used (Repast).

Most numbers are in the range of 1e-3 to 1e5. There are almost no very large or very small numbers. Infinity and NaN are used in many places.

Because this is an agent-based simulation system, many formulas are applied repeatedly to simulate emergence. The order of evaluation is very important, as many variables depend on others being evaluated first (e.g., to calculate the BMI, the diet and cardio status must be calculated first). The previous values of variables are also very important in many calculations, so this issue could be introduced somewhere early in the program and be carried through the rest of it.

Kat asked Jul 30 '14 21:07




1 Answer

Here are a few ways in which the evaluation of a floating-point expression can differ:

(1) Floating-point processors have a "current rounding mode", which can cause results to differ in the least significant bit. Many platforms provide a call to get or set the current mode: round to nearest (the usual default), toward zero, toward -∞, or toward +∞.

(2) It sounds like strictfp is related to FLT_EVAL_METHOD in C, which specifies the precision used for intermediate computations. Sometimes a new version of the compiler will use a different method than the old one (I was bitten by that one). The values {0,1,2} mean that intermediate results are evaluated in at least {single,double,extended} precision respectively; operands of higher precision keep their own precision.

(3) In the same way that a different compiler can have a different default float evaluation method, different machines can use a different float evaluation method.

(4) Single precision IEEE floating-point arithmetic is well-defined, repeatable, and machine-independent. So is double-precision. I have written (with great care) cross-platform floating-point tests which use an SHA-1 hash to check the computations for bit exactness! However, with FLT_EVAL_METHOD=2, extended precision is used for the intermediate computations, which is variously implemented using 64-bit, 80-bit or 128-bit floating point arithmetic, so it is difficult to get cross-platform and cross-compiler repeatability if extended precision is used in the intermediate computations.

(5) Floating point arithmetic is not associative, i.e.

(A + B) + C ≠ A + (B + C)

Because of this, a conforming compiler is not allowed to reorder floating-point computations unless you explicitly opt in (for example, with fast-math style flags in C compilers).

(6) Order of operations matters. A simple way to reduce rounding error when summing a large set of numbers is to add them in increasing order of magnitude. On the other hand, if two numbers differ enough in magnitude

B < (A * epsilon)

then summing them is a no-op:

A + B = A
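Points (5) and (6) are easy to reproduce in Java. The sketch below is illustrative only: the class `SummationOrder`, its methods, and the chosen values are assumptions, not code from the question. It shows that regrouping three additions changes the result, and that the same set of numbers sums differently depending on the order in which they are visited, which is what can happen when a refactoring changes the iteration order of a list.

```java
import java.util.Arrays;

final class SummationOrder {
    /** Evaluates (a + b) + c. */
    static double groupLeft(double a, double b, double c) {
        return (a + b) + c;
    }

    /** Evaluates a + (b + c). */
    static double groupRight(double a, double b, double c) {
        return a + (b + c);
    }

    /** Adds the values left to right, exactly in the order given. */
    static double sumInOrder(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Builds an array holding one large value and `count` ones, large value first or last. */
    static double[] largeAndOnes(double large, int count, boolean largeFirst) {
        double[] values = new double[count + 1];
        Arrays.fill(values, 1.0);
        values[largeFirst ? 0 : count] = large;
        return values;
    }

    public static void main(String[] args) {
        // Same three operands, different grouping.
        System.out.println(groupLeft(0.1, 0.2, 0.3));
        System.out.println(groupRight(0.1, 0.2, 0.3));

        // Same 1025 operands, different order. Near 1e17 adjacent doubles are 16 apart,
        // so adding 1.0 to the running total is a no-op once the large value is in it.
        System.out.println(sumInOrder(largeAndOnes(1e17, 1024, true)));
        System.out.println(sumInOrder(largeAndOnes(1e17, 1024, false)));
    }
}
```

With the large value first, every later `+ 1.0` is absorbed; with the ones first, they accumulate to 1024 before the large value is added, so the two totals differ.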
Reality Pixels answered Oct 01 '22 13:10