As part of a unit test, I need to test two DataFrames for equality. The order of the columns in the DataFrames is not important to me. However, it seems to matter to Pandas:
    import pandas

    df1 = pandas.DataFrame(index=[1, 2, 3, 4])
    df2 = pandas.DataFrame(index=[1, 2, 3, 4])

    df1['A'] = [1, 2, 3, 4]
    df1['B'] = [2, 3, 4, 5]

    df2['B'] = [2, 3, 4, 5]
    df2['A'] = [1, 2, 3, 4]

    df1 == df2
Results in:
Exception: Can only compare identically-labeled DataFrame objects
I believe the expression df1 == df2 should evaluate to a DataFrame containing all True values. Obviously it's debatable what the correct functionality of == should be in this context. My question is: Is there a Pandas method that does what I want? That is, is there a way to do equality comparison that ignores column order?
No, it does not work for missing values. Then you start doing dropna or fillna on various columns that are not matching.
Reorder columns using Pandas. Another way to reorder columns is to use the Pandas reindex() method. Its columns= parameter lets you specify the order of columns that you want.
Pandas DataFrame: equals() function The equals() function is used to test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.
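The two snippets above (reindex() and equals()) can be combined into a column-order-insensitive check. The following is only a minimal sketch: the helper name frames_equal_ignoring_column_order is hypothetical, and it assumes that column labels are unique and that index order and dtypes should still have to match.

```python
import numpy as np
import pandas as pd


def frames_equal_ignoring_column_order(left, right):
    """Hypothetical helper: True if both frames hold the same data,
    regardless of column order. Assumes unique column labels."""
    # Frames with different sets of column labels can never be equal.
    if set(left.columns) != set(right.columns):
        return False
    # Reorder `right` to match `left`, then compare with equals(),
    # which treats NaNs in the same location as equal.
    return left.equals(right.reindex(columns=left.columns))


df1 = pd.DataFrame({"A": [1.0, np.nan], "B": [2, 3]})
df2 = pd.DataFrame({"B": [2, 3], "A": [1.0, np.nan]})

same_data = frames_equal_ignoring_column_order(df1, df2)
plain_equals = df1.equals(df2)  # column order still matters here
```

Note that equals() also compares dtypes, so an integer column and a float column holding the same numbers would not count as equal under this sketch.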
Answer. Yes, by default, concatenating dataframes will preserve their row order. The order of the dataframes to concatenate will be the order of the result dataframe.
The most common intent is handled like this:
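    def assertFrameEqual(df1, df2, **kwds):
        """Assert that two dataframes are equal, ignoring ordering of columns."""
        # pandas.testing is the current home of assert_frame_equal;
        # older pandas versions exposed it as pandas.util.testing.
        from pandas.testing import assert_frame_equal
        # Sort both frames by column label so column order no longer matters.
        return assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_names=True, **kwds)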
Of course, see pandas.testing.assert_frame_equal (pandas.util.testing.assert_frame_equal in older pandas versions) for other parameters you can pass.
You could sort the columns using sort_index:

    df1.sort_index(axis=1) == df2.sort_index(axis=1)

This will evaluate to a dataframe of all True values.
As @osa comments, this fails for NaNs and isn't particularly robust either. In practice, something similar to @quant's answer is probably recommended (note: we want a bool returned rather than an exception raised if there's a mismatch):
    def my_equal(df1, df2):
        # pandas.testing replaces the older pandas.util.testing module.
        from pandas.testing import assert_frame_equal
        try:
            # Sort columns by label so column order is ignored.
            assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_names=True)
            return True
        except (AssertionError, ValueError, TypeError):  # perhaps something else?
            return False
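To make the NaN caveat concrete, here is a small sketch comparing the two approaches on frames that contain a missing value. The sample data is made up for illustration, and the import assumes a pandas version that provides pandas.testing.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# Same data, different column order, with a NaN in column A.
df1 = pd.DataFrame({"A": [1.0, np.nan], "B": [2, 3]}, index=[1, 2])
df2 = pd.DataFrame({"B": [2, 3], "A": [1.0, np.nan]}, index=[1, 2])

# Element-wise comparison after sorting columns: NaN == NaN is False,
# so the result is not all True even though the frames hold the same data.
elementwise = df1.sort_index(axis=1) == df2.sort_index(axis=1)
elementwise_all_true = bool(elementwise.all().all())

# assert_frame_equal treats NaNs in matching positions as equal.
try:
    assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1))
    frames_match = True
except AssertionError:
    frames_match = False
```

Here the element-wise check reports a mismatch at the NaN position, while the assert_frame_equal-based check treats the frames as equal.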