 

Compare the contents of two pandas DataFrames even if the rows are in a different order

I have two pandas DataFrames whose rows are in a different order but which have the same columns. My goal is to compare the two DataFrames easily and confirm that they both contain the same rows.

I have tried the DataFrame.equals method, but I seem to be missing something, because the result is not what I expected:

import pandas as pd

df_1 = pd.DataFrame({1: [10, 15, 30], 2: [20, 25, 40]})
df_2 = pd.DataFrame({1: [30, 10, 15], 2: [40, 20, 25]})
df_1.equals(df_2)

I would expect the result to be True, because both DataFrames contain the same rows, just in a different order, but it returns False.
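The False result is consistent with how DataFrame.equals works: it compares the two frames position by position (and also requires the index and column labels to match), rather than treating them as unordered collections of rows. A minimal sketch using the element-wise == operator on the same two frames makes this visible: no cell of df_1 matches the cell at the same position in df_2.

```python
import pandas as pd

df_1 = pd.DataFrame({1: [10, 15, 30], 2: [20, 25, 40]})
df_2 = pd.DataFrame({1: [30, 10, 15], 2: [40, 20, 25]})

# Element-wise comparison: row 0 of df_1 is checked against row 0 of df_2, etc.
elementwise = df_1 == df_2
print(elementwise)

# Not a single cell matches its counterpart at the same position,
# so comparing the frames as they are ordered gives False
print(bool(elementwise.to_numpy().any()))  # False
print(df_1.equals(df_2))                   # False
```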

asked Mar 26 '19 at 13:03 by jotNewie



2 Answers

You can specify the columns to sort by in DataFrame.sort_values. In this solution, both DataFrames are sorted by all of their columns, and DataFrame.reset_index with drop=True gives both of them a default index, so that equals compares only the row values:

df11 = df_1.sort_values(by=df_1.columns.tolist()).reset_index(drop=True)
df21 = df_2.sort_values(by=df_2.columns.tolist()).reset_index(drop=True)
print(df11.equals(df21))
True
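If you need this check in several places, the approach above can be wrapped in a small helper. The sketch below is an assumption-laden illustration, not part of the original answer: same_rows is a hypothetical name, the column labels are assumed to be unique, and the second frame's columns are reordered to match the first so that column order does not matter either. Because sorting keeps duplicate rows, the check also distinguishes frames that have the same distinct rows but different repetition counts.

```python
import pandas as pd


def same_rows(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Hypothetical helper: True if a and b hold the same rows in any order.

    Assumes unique column labels. Duplicate rows are counted, so
    [(10, 20), (10, 20)] is not treated as equal to [(10, 20)].
    """
    if set(a.columns) != set(b.columns):
        return False
    cols = a.columns.tolist()
    # Sort by every column and drop the old index labels on both sides
    a_sorted = a.sort_values(by=cols).reset_index(drop=True)
    b_sorted = b[cols].sort_values(by=cols).reset_index(drop=True)
    return a_sorted.equals(b_sorted)


df_1 = pd.DataFrame({1: [10, 15, 30], 2: [20, 25, 40]})
df_2 = pd.DataFrame({1: [30, 10, 15], 2: [40, 20, 25]})
# Same distinct rows, but with different repetition counts
df_3 = pd.DataFrame({1: [10, 10, 30], 2: [20, 20, 40]})
df_4 = pd.DataFrame({1: [10, 30, 30], 2: [20, 40, 40]})

print(same_rows(df_1, df_2))          # True
print(same_rows(df_1, df_2[[2, 1]]))  # True (column order ignored)
print(same_rows(df_3, df_4))          # False
```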
answered Oct 06 '22 at 22:10 by jezrael



Try sorting and resetting the index of both DataFrames:

df_1.sort_values(by=[1, 2]).reset_index(drop=True).equals(df_2.sort_values(by=[1, 2]).reset_index(drop=True))
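DataFrame.equals also compares the index labels, so after sorting the index has to be reset on both frames, not just one. With the question's df_1, which happens to be sorted already, sort_values leaves its index as 0, 1, 2, so forgetting reset_index on that side goes unnoticed. A minimal sketch with a hypothetical, unsorted df_3 (same rows as df_2) shows the difference:

```python
import pandas as pd

df_2 = pd.DataFrame({1: [30, 10, 15], 2: [40, 20, 25]})
# Hypothetical frame with the same rows as df_2, not already sorted
df_3 = pd.DataFrame({1: [15, 30, 10], 2: [25, 40, 20]})

# reset_index on one side only: df_3's sorted index is [2, 0, 1], not [0, 1, 2]
one_sided = df_3.sort_values(by=[1, 2]).equals(
    df_2.sort_values(by=[1, 2]).reset_index(drop=True)
)

# reset_index on both sides: only the row values are compared
both_sides = df_3.sort_values(by=[1, 2]).reset_index(drop=True).equals(
    df_2.sort_values(by=[1, 2]).reset_index(drop=True)
)

print(one_sided)   # False
print(both_sides)  # True
```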
answered Oct 06 '22 at 21:10 by rafaelc
