best practices on floating point precision in python

Question

Earlier tonight, a friend of mine just handed me this cute problem. The problem says:

Make a program in MATLAB to check whether a point is inside a triangle or not. Not to forget to check if the point is on the border as well. Triangle points are x=(0,0),y=(0,1)andz=(1,0)

The problem is not hard to solve. The idea is to find the equation of the hypotenuse and check if the point lies on any leg of the triangle. Check for inside and outside turns out not to be that difficult, however.

I made the code on MATLAB, the logic seems to be fine. But the problem is that the result are not in a harmony with that logic! I started questioning my code since I am not such skillful in MATLAB. Nevertheless, I gave it a try on my preferred language, Python.

Here is my code:


def isInsideTriangle(x,y):
    if  x == 0 or y == 0 or y ==  1-x:
        print('on the border of the triangle')
    elif x > 1 or y > 1 or x < 0 or y < 0 or y > 1-x:
            print('outside of the triangle')
            print(1-x)  # check the value
    else:
        # verbose these values to double check
        print(1-x)
        print(y)
        print(type(y))
        print(type(1-x))
        print(y==(1-x))
        print('inside of the triangle')

isInsideTriangle(0.2,0.8)

When trying with this two values, the result on console shall be on the border. However, the program said it is inside! I tried to switch between x and y i.e. isInsideTriangle(0.8,0.2) but the program outputted the expected result this time.

This leaded me to realize that there is no thing to do with the logic but with floating-point precision. I increased the size of the variables on MATLAB to be 64 bit precision and the program works fine.

My question now

As a Python guy, what are the best programming practices to avoid such problems in Python? How can we avoid such annoying problems specially in production environments?

jlandercy · Accepted Answer

Forewords

Your question is a specialization of "Is a point inside a regular polygon?" by regular I mean not self intersecting or multi polygons as we can often found in GIS system. And by extension because your are asking for triangle, I'll assume the polygon is convex.

enter image description here

What is interesting, is that cross product is the key to solve it. When you are dealing with vectors in 2D plane, cross products are orthogonal to this plane. The useful information to extract is: does it point upward or downward?

enter image description here

Float Arithmetic errors will happen, and it becomes critical when the cross product is close to zero, but not equal to, then it will have a sign instead of being null.

To check if your point is inside the polygon, it merely boils down to check if all cross products between edge and the point have the same signs, eg.: sign(h x w) = -1.

In the same way, to check if the polygon is convex is about to check that all cross products of consecutive edges have the same signs, eg.: sign(u x v) = -1.

Implementation

Let's build a small class to check if a point is inside (on the edge or outside) of a regular convex polygon:

import numpy as np
class cpoly:

    def __init__(self, points=[[0,0], [0,1], [1,0]], assert_convexity=True):
        """
        Initialize 2D Polygon with a sequence of 2D points
        """
        self._points = np.array(points)
        assert self.p.shape[0] >= 3
        assert self.p.shape[1] == 2
        assert self.is_convex or not(assert_convexity)

    @property
    def n(self):
        return self.p.shape[0]

    @property
    def p(self):
        return self._points

    @property
    def is_convex(self):
        """
        Check convexity of the polygon (operational for a non intersecting polygon)
        """
        return self.contains()

    def contains(self, p=None, debug=False, atol=2e-16):
        """
        Check if a 2D convex polygon contains a point (also used to assess convexity)
        Returns:
           -1: Point is oustide the polygon
            0: Point is close to polygon edge (epsilon ball)
           +1: Point is inside the polygon
        """
        s = None
        c = False
        n = self.n
        for k in range(n):
            # Vector Differences:
            d1 = self.p[(k+1)%n,:] - self.p[k%n,:]
            if p:
                d2 = p - self.p[k%n,:]
            else:
                d2 = self.p[(k+2)%n,:] - self.p[(k+1)%n,:]
            # Cross Product:
            z = np.cross(d1, d2)
            if np.allclose(z, 0, atol=atol):
                s_ = 0
                c = True
            else:
                s_ = np.sign(z)
            # Debug Helper:
            if debug:
                print("k = %d, d1 = %s, d2 = %s, z = %.32f, s = %d" % (k, d1, d2, z, s_))
            # Check if cross product sign change (excluded null, when point is colinear with the segment)
            if s and (s_ != s) and not(s_ == 0):
                # Nota: Integer are exact if float representable, therefore comparizons are correct
                return -1
            s = s_
        if c:
            return 0
        else:
            return 1

    def plot(self, axe=None):
        import matplotlib.pyplot as plt
        from matplotlib.patches import Polygon
        if not(axe):
            fig, axe = plt.subplots()
            axe.plot(self.p[:,0], self.p[:,1], 'x', markersize=10, label='Points $p_i$')
            axe.add_patch(Polygon(self.p, alpha=0.4, label='Area'))
            axe.set_xlabel("$x$")
            axe.set_ylabel("$y$")
            axe.set_title("Polygon")
            axe.set_aspect('equal')
            axe.legend(bbox_to_anchor=(1,1), loc='upper left')
            axe.grid()
        return axe.get_figure(), axe

The class is initialized with a list of 2D points (by default it is yours).

p = cpoly()

On my setup, float precision is about:

e = np.finfo(np.double).eps # 2.220446049250313e-16

We create a trial dataset for testing purposes:

p = cpoly()
r = [
    [0,0], [0,1], [1,0], # Polygon vertices
    [0,0.5], [-e,0.6], [e,0.4], [0.1, 0.1], [1,1],
    [0.5+e,0.5], [0.3-e,0.7], [0.7+e/10,0.3],
    [0, 1.2], [1.2, 0.], # Those points make your logic fails
    [0.2,0.8], [0.1,0.9],
    [0.8+10*e,0.2], [0.9+10*e,0.1]
]

If we adapt your function to have compliant output:

def isInsideTriangle(x,y):
    if  x == 0 or y == 0 or y ==  1-x:
        return 0
    elif x > 1 or y > 1 or x < 0 or y < 0 or y > 1-x:
        return -1
    else:
        return 1

Then we check trial points to see if both our functions behave well:

_, axe = p.plot()
cols = {-1: 'red', 0:'orange', 1:'green'}
for x in r:
    q1 = p.contains(x)
    q2 = isInsideTriangle(*x)
    print(q1==q2)
    axe.plot(*x, 'o', markersize=4, color=cols[q1])

enter image description here

With this setup, all points are correctly classified. But you can see that your algorithm is flawed. Mainly the following line:

if  x == 0 or y == 0 or y ==  1-x:

Fails to reject [0, 1.2] and [1.2, 0.].

Points close to an edge and Float Arithmetic Error

Even for point exactly on the edge like [0.2,0.8] floating point error leads to point misclassification. The following point would make the cross product not exactly equal to zero as it should. See details bellow:

p.contains([0.2,0.8], debug=True) # True
# k = 0, d1 = [0 1], d2 = [0.2 0.8], z = -0.20000000000000001110223024625157, s = -1
# k = 1, d1 = [ 1 -1], d2 = [ 0.2 -0.2], z = 0.00000000000000005551115123125783, s = 0
# k = 2, d1 = [-1  0], d2 = [-0.8  0.8], z = -0.80000000000000004440892098500626, s = -1

This is why we must add a ball of radius atol to check if is actually zero for the given tolerance:

if np.allclose(z, 0, atol=atol):
    s_ = 0
    # ...
else:
    s_ = np.sign(z)

It means we must accept that points close enough to the edge (from both side) are considered to be contained into the polygon. This is inherent from Float Arithmetic, the best you can do is to tune atol to an acceptable value for your application. Or you can find out another logic or data model which does not suffer this problem.

Zooming to precision scale

If we zoom to precision scale to see what happens close to the edge, we got:

n = 40
lim = np.array([0.5,0.5]) + [-n*e/2, +n*e/2]
x = np.linspace(lim[0], lim[1], 30)
X, Y = np.meshgrid(x, x)
x, y = X.reshape(-1), Y.reshape(-1)

_, axe = p.plot()
axe.set_xlim(lim)
axe.set_ylim(lim)
for r in zip(x,y):
    q = p.contains(r)
    axe.plot(*r, 'o', color=cols[q], markersize=2)

enter image description here

We see that some points very close to the edge, but inside or outside of the polygon, are classified as "on the edge". This is because of the epsilon ball criterion. You can also observe that points are not equally spaced (no matter if I used linspace) because of it is impossible to express 10 as an integral power of 2.

Conclusion

The solution above is a generalization of your problem, performing in O(n). It might seems to be overkill but it is general (it works for any regular polygon) and comprehensive (it relies on a well known geometric concept).

Actually, the algorithm simply check if the point stays at the same side of all edges of the polygon when walking through the path. If it does, then it concludes the point is inside the polygon, that is!

The above solution is off course affected by float arithmetic error because it relies on float computation (see point 10). Fortunately, using epsilon ball test, we can mitigate it.

Reference

If you want to deeper understand finite precision arithmetic, I would advise you to read the excellent book: Accuracy and Stability of Numerical Algorithms, J. Higham.

Bonus

Bellow a comparison of all answers with the trial dataset:

enter image description here

We can give some context on the different kind of "errors" underlined by this soft check:

Points 11 and 12 are misclassified by @MagedSaeed because of the design flaw;
@MahmoudElshahat use straight float equality in its logic, this is why it returns different results for equivalent class of points (eg.: points 0 to 2, polygon points);
@SamMason uses the simplest logic with an epsilon ball test. It seems to have a biggest error rate because it does not discriminate between inside and on the border (we can ignore all points where exact answer is 0 and its function returned 1). IMO, this it is the best if-then algorithm presented which runs in O(1). Additionally it can easily be updated to take into account the 3 states logic;
Point 10 was deliberately designed to challenge logic at the border, the point is clearly outside the polygon from distance with a magnitude about the machine precision. This is where float arithmetic error becomes significant and the logic is more fuzzy: how far from the edge is acceptable and tractable?

Admin · Answer

First of all, your logic is incorrect. Consider the case of x=0.9, y=0.9. This is clearly outside the triangle, but does not satisfy any of the conditions x > 1 or y > 1 or x < 0 or y < 0.

Second of all, any floating-point arithmetic which involves an equality comparison -- like testing if a point is "on the border" of a shape -- is likely to be affected by precision problems. Reworking your logic to instead test whether a point is within a small margin of the border is likely to work better.

I would recommend against the use of the Decimal class for anything that isn't natively a decimal number, like currency. Performing anything other than basic arithmetic on a Decimal (such as math.sqrt) will internally convert it to a float anyway.

best practices on floating point precision in python

Tags:

python

floating-point

double

precision

Maged Saeed

2 Answers

Forewords

Implementation

Points close to an edge and Float Arithmetic Error

Zooming to precision scale

Conclusion

Reference

Bonus

jlandercy

Recent Activity

Donate For Us

best practices on floating point precision in python

Tags:

python

floating-point

double

precision

Maged Saeed

2 Answers

Forewords

Implementation

Points close to an edge and Float Arithmetic Error

Zooming to precision scale

Conclusion

Reference

Bonus

jlandercy

Related questions

Recent Activity

Donate For Us