I have two pandas dataframes whose rows are in different orders but which contain the same columns. My goal is to easily compare the two dataframes and confirm that they both contain the same rows.
I have tried the equals method, but I seem to be missing something, because the result is not what I expected:
import pandas as pd
df_1 = pd.DataFrame({1: [10,15,30], 2: [20,25,40]})
df_2 = pd.DataFrame({1: [30,10,15], 2: [40,20,25]})
df_1.equals(df_2)
I would expect the result to be True, because both dataframes contain the same rows, just in a different order, but it returns False.
You can specify the columns to sort by in DataFrame.sort_values. My solution sorts by all columns and then calls DataFrame.reset_index with drop=True, so that both DataFrames get the same default index before comparing:
df11 = df_1.sort_values(by=df_1.columns.tolist()).reset_index(drop=True)
df21 = df_2.sort_values(by=df_2.columns.tolist()).reset_index(drop=True)
print(df11.equals(df21))
True
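As a minimal sketch, the sort-and-reset approach can be wrapped in a reusable function. The name same_rows is hypothetical, and the column-set check and the reordering of the second frame's columns are assumptions added here, not part of the answer above.

```python
import pandas as pd

def same_rows(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """Hypothetical helper: True if a and b contain the same rows, ignoring row order."""
    # Frames with different column sets cannot contain the same rows.
    if set(a.columns) != set(b.columns):
        return False
    cols = list(a.columns)
    # Put b's columns in a's order, sort both by every column,
    # and replace the original labels with a default 0..n-1 index.
    a_sorted = a.sort_values(by=cols).reset_index(drop=True)
    b_sorted = b[cols].sort_values(by=cols).reset_index(drop=True)
    # equals() now compares values position by position (dtypes must match too).
    return a_sorted.equals(b_sorted)
```

Because sorting keeps duplicates, this compares rows as a multiset: frames with rows [1, 1, 2] and [1, 2, 2] are reported as different. Note that equals() also requires matching dtypes, so an int64 column will not match a float64 column holding the same numbers.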
Try sorting and resetting the index of both DataFrames:
df_1.sort_values(by=[1,2]).reset_index(drop=True).equals(df_2.sort_values(by=[1,2]).reset_index(drop=True))
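The reset_index call is needed on both sides because sort_values keeps the original row labels and equals() also compares the index. A small check, using two hypothetical frames df_a and df_b, shows that resetting only one side can give False even when the sorted values are identical:

```python
import pandas as pd

df_a = pd.DataFrame({1: [30, 10], 2: [40, 20]})
df_b = pd.DataFrame({1: [10, 30], 2: [20, 40]})

# Sorting df_a moves its rows but keeps their labels, so its index becomes [1, 0].
sorted_a = df_a.sort_values(by=[1, 2])
# df_b is already sorted, so its index stays [0, 1].
sorted_b = df_b.sort_values(by=[1, 2])

# Same values in the same positions, but the index labels differ.
only_b_reset = sorted_a.equals(sorted_b.reset_index(drop=True))
# Resetting both indices removes the label mismatch.
both_reset = sorted_a.reset_index(drop=True).equals(sorted_b.reset_index(drop=True))

print(only_b_reset)  # False
print(both_reset)    # True
```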