 

Equality in Pandas DataFrames - Column Order Matters?

Tags:

python

pandas

As part of a unit test, I need to test two DataFrames for equality. The order of the columns in the DataFrames is not important to me. However, it seems to matter to Pandas:

    import pandas

    df1 = pandas.DataFrame(index=[1, 2, 3, 4])
    df2 = pandas.DataFrame(index=[1, 2, 3, 4])
    df1['A'] = [1, 2, 3, 4]
    df1['B'] = [2, 3, 4, 5]
    df2['B'] = [2, 3, 4, 5]
    df2['A'] = [1, 2, 3, 4]
    df1 == df2

Results in:

Exception: Can only compare identically-labeled DataFrame objects 

I believe the expression df1 == df2 should evaluate to a DataFrame containing all True values. Obviously it's debatable what the correct functionality of == should be in this context. My question is: Is there a Pandas method that does what I want? That is, is there a way to do equality comparison that ignores column order?

jcrudy asked Jan 08 '13 21:01



2 Answers

The most common intent is handled like this:

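    def assertFrameEqual(df1, df2, **kwds):
        """Assert that two dataframes are equal, ignoring ordering of columns."""
        # pandas.util.testing was removed in pandas 2.0; pandas.testing is the public module
        from pandas.testing import assert_frame_equal
        # Sort both frames by column label so column order cannot cause a mismatch
        return assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_names=True, **kwds)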

Of course, see pandas.testing.assert_frame_equal (pandas.util.testing.assert_frame_equal in older pandas versions) for other parameters you can pass.
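
One of those other parameters is worth a quick illustration. In reasonably recent pandas versions, assert_frame_equal accepts check_like=True, which ignores the order of both the index and the columns, so the explicit sort may not be needed. A minimal sketch, assuming pandas.testing is available and reusing the frames from the question (the order_sensitive flag is only there for illustration):

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Same data as in the question, with the columns inserted in a different order
df1 = pd.DataFrame({"A": [1, 2, 3, 4], "B": [2, 3, 4, 5]}, index=[1, 2, 3, 4])
df2 = pd.DataFrame({"B": [2, 3, 4, 5], "A": [1, 2, 3, 4]}, index=[1, 2, 3, 4])

# check_like=True ignores the order of index and columns; labels must still match
assert_frame_equal(df1, df2, check_like=True)

# With the default check_like=False, the differing column order is reported
try:
    assert_frame_equal(df1, df2)
    order_sensitive = False
except AssertionError:
    order_sensitive = True
```

Note that check_like also ignores row order, which may or may not be what a given test wants.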

Quant answered Sep 23 '22 23:09


You could sort the columns using sort_index:

df1.sort_index(axis=1) == df2.sort_index(axis=1) 

This will evaluate to a dataframe of all True values.
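
For a unit test, the elementwise result usually has to be collapsed into a single bool. A rough sketch of one way to do that (same_values is a hypothetical helper name, not a pandas function); it also shows that cells holding NaN compare as unequal under ==, and it assumes both frames carry the same row and column labels, since == raises otherwise:

```python
import numpy as np
import pandas as pd

def same_values(left, right):
    """Hypothetical helper: True if every cell matches after sorting columns."""
    elementwise = left.sort_index(axis=1) == right.sort_index(axis=1)
    # First .all() reduces each column, the second reduces across columns
    return bool(elementwise.all().all())

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"B": [3, 4], "A": [1, 2]})
no_nan_result = same_values(df1, df2)

# NaN is never equal to itself, so matching NaN cells still count as a mismatch
df3 = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})
df4 = pd.DataFrame({"B": [3.0, 4.0], "A": [1.0, np.nan]})
nan_result = same_values(df3, df4)
```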


As @osa comments, this fails for NaNs and isn't particularly robust either. In practice, something similar to @quant's answer is probably recommended (note: here we want a bool rather than a raised exception if there's an issue):

    def my_equal(df1, df2):
        # pandas.util.testing was removed in pandas 2.0; pandas.testing is the public module
        from pandas.testing import assert_frame_equal
        try:
            assert_frame_equal(df1.sort_index(axis=1), df2.sort_index(axis=1), check_names=True)
            return True
        except (AssertionError, ValueError, TypeError):  # perhaps something else?
            return False
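
If a plain bool is all that's needed, DataFrame.equals may be a simpler alternative to catching exceptions. It treats NaNs in the same location as equal and returns False instead of raising when the frames differ. A minimal sketch (frames_equal_ignoring_column_order is a hypothetical name chosen for illustration); be aware that equals is strict about dtypes, so an int column and a float column holding the same numbers are not considered equal:

```python
import numpy as np
import pandas as pd

def frames_equal_ignoring_column_order(left, right):
    """Hypothetical helper: compare two frames after sorting their columns by label."""
    return left.sort_index(axis=1).equals(right.sort_index(axis=1))

# NaNs in matching positions are treated as equal by DataFrame.equals
df1 = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})
df2 = pd.DataFrame({"B": [3.0, 4.0], "A": [1.0, np.nan]})
nan_frames_equal = frames_equal_ignoring_column_order(df1, df2)

# Same numbers but different dtypes (int64 vs float64) are reported as different
ints = pd.DataFrame({"A": [1, 2]})
floats = pd.DataFrame({"A": [1.0, 2.0]})
dtype_mismatch_equal = frames_equal_ignoring_column_order(ints, floats)
```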

Andy Hayden answered Sep 25 '22 23:09