
pandas assert_frame_equal behavior

I am attempting to compare two DataFrames with pandas.testing.assert_frame_equal. These frames contain floats that I want to compare to some user-defined precision.

The check_less_precise argument of assert_frame_equal seems to suggest that I can specify the number of digits after the decimal point to compare. To quote the API Reference page:

check_less_precise: Specify comparison precision. Only used when check_exact is False. 5 digits (False) or 3 digits (True) after decimal points are compared. If int, then specify the digits to compare

API Reference

However, this doesn't seem to work when the floats are less than 1.

This raises an AssertionError:

import pandas as pd

expected = pd.DataFrame([{"col": 0.1}])
output = pd.DataFrame([{"col": 0.12}])
pd.testing.assert_frame_equal(expected, output, check_less_precise=1)

while this does not:

expected = pd.DataFrame([{"col": 1.1}])
output = pd.DataFrame([{"col": 1.12}])
pd.testing.assert_frame_equal(expected, output, check_less_precise=1)

Can someone help explain this behavior? Is this a bug?

RoachLord asked Sep 19 '17 15:09



1 Answer

check_less_precise works more like a relative tolerance than a count of decimal places. See details below.

I dug through the source code and found out what is happening. Eventually the function decimal_almost_equal gets called, which looks like this in plain Python (the original is in Cython).

def decimal_almost_equal(desired, actual, decimal):
    return abs(desired - actual) < (0.5 * 10.0 ** -decimal)

See the source code here. Here is the actual call to the function:

decimal_almost_equal(1, fb / fa, decimal)

where, in this example,

fa = .1
fb = .12
decimal = 1

So the function call becomes

decimal_almost_equal(1, 1.2, 1)

Which decimal_almost_equal evaluates as

abs(1 - 1.2) < .5 * 10 ** -1

Or

.2 < .05

Which is False.

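So the comparison appears to be based on the relative (percentage) difference rather than the absolute difference. That is why 1.1 vs 1.12 passes (the ratio differs from 1 by about 0.018) while 0.1 vs 0.12 fails (the ratio differs from 1 by 0.2), even though both pairs differ by 0.02.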
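To check this reasoning against both examples from the question, the ratio-based call can be replayed in plain Python. This is only a sketch of the logic described above, not the Cython code itself; relative_check is a hypothetical helper name introduced for illustration.

```python
def decimal_almost_equal(desired, actual, decimal):
    # Plain-Python rendering of the check quoted in this answer.
    return abs(desired - actual) < (0.5 * 10.0 ** -decimal)


def relative_check(fa, fb, decimal):
    # Hypothetical helper: mirrors the call decimal_almost_equal(1, fb / fa, decimal),
    # so the ratio fb / fa is compared against 1.
    return decimal_almost_equal(1, fb / fa, decimal)


# Both pairs differ by 0.02 in absolute terms, but their ratios differ.
small_pair = relative_check(0.1, 0.12, 1)  # ratio is about 1.2, off by 0.2
large_pair = relative_check(1.1, 1.12, 1)  # ratio is about 1.018, off by 0.018

print(small_pair, large_pair)  # False True
```

The threshold for decimal=1 is 0.05 in both cases; only the second ratio falls inside it, which matches the behavior reported in the question.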

If you want an absolute comparison, check out np.allclose, which accepts an absolute tolerance through atol.

import numpy as np

np.allclose(expected, output, atol=.1)
True
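As a side note for newer installations: from pandas 1.1 onward, check_less_precise is deprecated and assert_frame_equal accepts rtol and atol directly. The sketch below assumes pandas 1.1 or later; the tolerance values 0.05 and 0.01 are arbitrary choices for illustration, and rtol is set to 0 so that only the absolute difference matters.

```python
import pandas as pd

expected = pd.DataFrame([{"col": 0.1}])
output = pd.DataFrame([{"col": 0.12}])

# Assumes pandas >= 1.1, where rtol and atol are accepted.
# The values differ by 0.02, which is within atol=0.05, so this passes silently.
pd.testing.assert_frame_equal(
    expected, output, check_exact=False, rtol=0, atol=0.05
)

# A tighter absolute tolerance (0.01) is smaller than the 0.02 difference,
# so the same comparison is expected to raise AssertionError.
try:
    pd.testing.assert_frame_equal(
        expected, output, check_exact=False, rtol=0, atol=0.01
    )
    tight_passed = True
except AssertionError:
    tight_passed = False

print(tight_passed)
```

This keeps the comparison inside pandas, so index, column, and dtype checks still apply, which np.allclose does not perform.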
Ted Petrou answered Nov 19 '22 18:11
