
Confirming equality of two pandas dataframes?

Tags: python, pandas

How can I assert that the following two dataframes, df1 and df2, are equal?

import pandas as pd
df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])

df1.equals(df2) returns False, because equals() also compares dtypes and here one frame holds integers while the other holds floats. So far, I know two workarounds:

print((df1 == df2).all()[0])

or

df1 = df1.astype(float)
print(df1.equals(df2))

Both seem a bit messy. Is there a better way to do this comparison?

Mehdi Jafarnia Jahromi asked Jul 05 '16 21:07



2 Answers

You can use assert_frame_equal with check_dtype=False, so that the column dtypes are not compared.

# Pre v. 0.20.3
# from pandas.util.testing import assert_frame_equal

from pandas.testing import assert_frame_equal

assert_frame_equal(df1, df2, check_dtype=False)
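Since assert_frame_equal raises an AssertionError rather than returning a value, it can help to wrap it when a plain True/False is wanted. The following is only a sketch: the helper name frames_equal_ignoring_dtype and the third frame df3 are made up for illustration and are not part of pandas or of the question.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_equal_ignoring_dtype(left, right):
    """Hypothetical helper: True if the frames match once dtypes are ignored."""
    try:
        # Raises AssertionError on any mismatch in shape, labels, or values.
        assert_frame_equal(left, right, check_dtype=False)
    except AssertionError:
        return False
    return True


df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])
df3 = pd.DataFrame([1.0, 2, 4])  # illustrative frame with one different value

print(frames_equal_ignoring_dtype(df1, df2))  # True
print(frames_equal_ignoring_dtype(df1, df3))  # False
```

In a test suite you would probably call assert_frame_equal directly, since its error message reports where the frames differ.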
Alexander answered Sep 20 '22 18:09


Building on @Divakar's elegant idea, NumPy's allclose() does most of the work for the numeric columns:

In [128]: df1
Out[128]:
   0    s  n
0  1  aaa  1
1  2  aaa  2
2  3  aaa  3

In [129]: df2
Out[129]:
     0    s    n
0  1.0  aaa  1.0
1  2.0  aaa  2.0
2  3.0  aaa  3.0

In [130]: (np.allclose(df1.select_dtypes(exclude=[object]), df2.select_dtypes(exclude=[object]))
   .....:  &
   .....:  df1.select_dtypes(include=[object]).equals(df2.select_dtypes(include=[object]))
   .....: )
Out[130]: True

select_dtypes() lets you separate the string (object) columns from the numeric ones, so that each group can be compared in the appropriate way.
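The same idea can be packaged as a small function. This is a rough sketch rather than a drop-in replacement for the session above: the name frames_close is hypothetical, the split uses np.number instead of object (an assumption made so that non-object string dtypes also land in the non-numeric group), and the sample frames are rebuilt from the printed output. Note that np.allclose treats NaN as unequal unless equal_nan=True is passed.

```python
import numpy as np
import pandas as pd


def frames_close(left, right):
    """Hypothetical helper: numeric columns via np.allclose, the rest via equals()."""
    # Different shapes or column labels can never be equal.
    if left.shape != right.shape or not left.columns.equals(right.columns):
        return False

    left_num = left.select_dtypes(include=[np.number])
    right_num = right.select_dtypes(include=[np.number])
    # The same columns must be numeric on both sides.
    if not left_num.columns.equals(right_num.columns):
        return False

    left_other = left.select_dtypes(exclude=[np.number])
    right_other = right.select_dtypes(exclude=[np.number])

    numbers_match = bool(np.allclose(left_num.to_numpy(), right_num.to_numpy()))
    return numbers_match and left_other.equals(right_other)


df1 = pd.DataFrame({0: [1, 2, 3], "s": ["aaa"] * 3, "n": [1, 2, 3]})
df2 = pd.DataFrame({0: [1.0, 2.0, 3.0], "s": ["aaa"] * 3, "n": [1.0, 2.0, 3.0]})

print(frames_close(df1, df2))  # True
```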

MaxU - stop WAR against UA answered Sep 19 '22 18:09