Comparing two pandas series for floating point near-equality?

I can compare two Pandas series for exact equality using pandas.Series.equals. Is there a corresponding function or parameter that will check if the elements are equal to some ε of precision?

Mark Harrison asked Oct 03 '17 22:10


2 Answers

You can use numpy.allclose:

numpy.allclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)

Returns True if two arrays are element-wise equal within a tolerance.

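The tolerance values are positive, typically very small numbers. The relative difference (rtol * abs(b)) and the absolute difference atol are added together to compare against the absolute difference between a and b. In other words, each pair of elements must satisfy abs(a - b) <= atol + rtol * abs(b). Note that this test is not symmetric in a and b, because only b scales the relative term.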

NumPy works well with pandas.Series objects, so given two of them, s1 and s2, you can simply do:

np.allclose(s1, s2, atol=...) 

where atol is your absolute tolerance. The relative tolerance rtol keeps its default of 1e-05 unless you pass it as well.
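
As a minimal sketch of the difference between the exact and the tolerant comparison, assuming two small float Series (the sample values and the tolerance 1e-9 are illustrative, not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 0.1 + 0.2 is not exactly 0.3 in binary floating point.
s1 = pd.Series([0.1 + 0.2, 1.0, 2.5])
s2 = pd.Series([0.3, 1.0, 2.5])

# Exact comparison: fails because of the tiny rounding difference in the first element.
exact = s1.equals(s2)

# Tolerant comparison: passes if abs(s1 - s2) <= atol + rtol * abs(s2) for every element.
close = np.allclose(s1, s2, atol=1e-9)

# Element-wise variant, useful for finding which positions differ.
elementwise = np.isclose(s1, s2, atol=1e-9)
```

np.isclose takes the same tolerance arguments as np.allclose but returns one boolean per element instead of a single result.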

cs95 answered Oct 10 '22 10:10


NumPy works well with pandas Series. However, one has to be careful with the order of the index (or of both the columns and the index for a pandas DataFrame), because NumPy compares values by position and ignores the labels.

For example:

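import numpy as np
import pandas as pd

series_1 = pd.Series(data=[0, 1], index=['a', 'b'])
series_2 = pd.Series(data=[1, 0], index=['b', 'a'])  # same label/value pairs, different order
np.allclose(series_1, series_2)  # positional comparison: [0, 1] vs [1, 0]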

This returns False, even though both Series hold the same value for each label.

A workaround is to reorder one Series by the index of the other before comparing:

np.allclose(series_1, series_2.loc[series_1.index])
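
The workaround above can be wrapped in a small helper. This is only a sketch: `series_allclose` is a hypothetical name, not a pandas or NumPy function, and it assumes both indexes hold unique, sortable labels.

```python
import numpy as np
import pandas as pd

def series_allclose(left, right, rtol=1e-05, atol=1e-08):
    """Hypothetical helper: label-aligned near-equality of two Series."""
    # Series with different label sets cannot match, whatever their values are.
    if not left.index.sort_values().equals(right.index.sort_values()):
        return False
    # Reorder `right` to follow the labels of `left`, then compare by position.
    return bool(np.allclose(left, right.loc[left.index], rtol=rtol, atol=atol))

series_1 = pd.Series(data=[0, 1], index=['a', 'b'])
series_2 = pd.Series(data=[1, 0], index=['b', 'a'])

positional = bool(np.allclose(series_1, series_2))  # ignores labels
aligned = series_allclose(series_1, series_2)        # respects labels
```

Checking the label sets first avoids a KeyError from .loc when one Series has labels the other lacks.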

bolirev answered Oct 10 '22 11:10