 

Pandas "Can only compare identically-labeled DataFrame objects" error

Tags:

python

pandas

I'm using Pandas to compare the outputs of two files loaded into two data frames (uat, prod): ...

    uat = uat[['Customer Number', 'Product']]
    prod = prod[['Customer Number', 'Product']]
    print(uat['Customer Number'] == prod['Customer Number'])
    print(uat['Product'] == prod['Product'])
    print(uat == prod)

The first two match exactly:

    74357    True
    74356    True
    Name: Customer Number, dtype: bool
    74357    True
    74356    True
    Name: Product, dtype: bool

For the third print I get the error "Can only compare identically-labeled DataFrame objects". If the first two comparisons work, what's wrong with the third?

Thanks

user1804633 asked Aug 31 '13 at 12:08



People also ask

How do you fix can only compare identically-labeled Series objects?

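If you use the == operator on two Series or DataFrames whose labels differ, pandas raises a ValueError such as "Can only compare identically-labeled DataFrame objects". To fix it, align the labels first, for example by sorting or resetting the index. If you only need a single True/False answer, df1.equals(df2) returns one bool instead of raising. Note that equals() does not ignore the indexes: frames whose labels differ, even only in order, compare as False.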
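A minimal sketch of the difference, assuming a recent pandas version and two small made-up frames that hold the same rows in a different order. It shows == raising, equals() returning False while the index order differs, and equals() returning True once the labels line up:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
# Same rows as df1, but stored in the opposite order with index [1, 0].
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

# Element-wise == refuses to compare frames whose labels differ.
try:
    df1 == df2
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# equals() does not raise; it returns one bool and also compares the labels.
same_unsorted = df1.equals(df2)              # index order differs -> False
same_sorted = df1.equals(df2.sort_index())   # labels and values line up -> True
print(raised, same_unsorted, same_sorted)
```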

What does "Can only compare identically-labeled Series objects" mean?

It is a ValueError raised when you compare two pandas objects with == (or another comparison operator such as != or <) whose labels are not identical. For two Series this means their indexes differ; for two DataFrames it means the index or the column labels differ, including the case where they contain the same labels in a different order.
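A small sketch of the Series version of the error, using two made-up Series that map the same labels to the same values but list them in a different order. Comparison raises, arithmetic aligns on labels, and reindexing one Series to the other's index makes == work:

```python
import pandas as pd

s1 = pd.Series([10, 20], index=["a", "b"])
s2 = pd.Series([20, 10], index=["b", "a"])  # same label -> value mapping, different order

# Comparison operators require identical labels in identical order.
try:
    s1 == s2
    message = None
except ValueError as exc:
    message = str(exc)

# Arithmetic, by contrast, aligns on labels before operating.
total = s1 + s2  # a -> 10 + 10, b -> 20 + 20

# Aligning s2 to s1's index first makes == work.
aligned = s1 == s2.reindex(s1.index)
print(message)
print(aligned.tolist())
```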

How do you know if two pandas DataFrames are equal?

The equals() method tests whether two objects contain the same elements. It compares two Series or DataFrames to see whether they have the same shape and elements, and returns a single bool. NaNs in the same location are considered equal.
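A minimal sketch, with a made-up frame containing one NaN, of how equals() treats NaN differently from element-wise ==:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"x": [1.0, np.nan], "y": [3.0, 4.0]})
b = a.copy()

# Element-wise ==: NaN never equals NaN, so that one cell comes out False.
elementwise = a == b

# equals(): NaNs in the same location count as equal; the result is one bool.
whole = a.equals(b)

print(elementwise)
print(whole)
```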

How do I compare two data frames in Python?

DataFrame.compare (added in pandas 1.1) shows the differences between two DataFrames side by side, comparing them row-wise and column-wise. Like ==, it only works on identically labeled DataFrames: same shape and identical row and column labels.
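A short sketch of compare() on two made-up frames that differ in a single cell, assuming pandas 1.1 or later. By default it keeps only the rows and columns that differ, labelled "self" and "other", and it raises the same kind of ValueError when the labels do not match:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = df1.copy()
df2.loc[1, "b"] = 5  # introduce one difference

# Only row 1 / column "b" differ, so only they appear in the result.
diff = df1.compare(df2)
print(diff)

# Like ==, compare() refuses frames whose labels are not identical.
try:
    df1.compare(df2.iloc[::-1])  # same data, rows in reversed order
    compare_raised = False
except ValueError:
    compare_raised = True
print(compare_raised)
```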


2 Answers

Here's a small example demonstrating the error (before pandas 0.19 it applied only to DataFrames, not Series; since 0.19 it applies to both):

    In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])

    In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

    In [3]: df1 == df2
    Exception: Can only compare identically-labeled DataFrame objects

One solution is to sort the index first (Note: some functions require sorted indexes):

    In [4]: df2.sort_index(inplace=True)

    In [5]: df1 == df2
    Out[5]:
          0     1
    0  True  True
    1  True  True
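Sorting in place modifies df2. As an alternative that is not part of this answer, reindex_like returns a copy of df2 whose rows and columns follow df1's labels; a minimal sketch with the same example frames. Keep in mind that labels missing from df2 would become NaN (and compare as False) and labels that exist only in df2 would be silently dropped, instead of raising:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

# Reorder a copy of df2's rows and columns to follow df1's labels;
# df2 itself keeps its original [1, 0] index.
result = df1 == df2.reindex_like(df1)
print(result)
```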

Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):

    In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
    Out[11]:
          0     1
    0  True  True
    1  True  True

Note: This can still raise (if the index/columns aren't identically labelled after sorting).
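To illustrate that caveat, a minimal sketch with a hypothetical frame df3 whose index contains a different label. Sorting cannot help here, but Index.symmetric_difference shows which labels are responsible:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])                # index [0, 1]
df3 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 2])  # index [0, 2]

# Sorting cannot help: the label sets themselves differ.
try:
    df1.sort_index() == df3.sort_index()
    still_raises = False
except ValueError:
    still_raises = True

# Labels present in only one of the two frames, per axis.
row_diff = df1.index.symmetric_difference(df3.index)
col_diff = df1.columns.symmetric_difference(df3.columns)
print(still_raises, list(row_diff), list(col_diff))
```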

Andy Hayden answered Oct 09 '22 at 21:10



If the index isn't needed for the comparison, you can also drop it and compare the rows by position:

print(df1.reset_index(drop=True) == df2.reset_index(drop=True)) 
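One caveat: after reset_index(drop=True) the comparison is purely positional, so rows that hold the same data under different labels are compared with the wrong partner. A small sketch using the frames from the first answer, where the label-based comparison succeeds but the positional one does not; the positional approach fits best when row order, not the index, is what matters:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])  # label 0 -> [1, 2], label 1 -> [3, 4]

# Positional: row 0 of df1 ([1, 2]) is compared with the first row of df2 ([3, 4]).
positional = df1.reset_index(drop=True) == df2.reset_index(drop=True)

# Label-based: line df2 up with df1's labels first, then compare.
by_label = df1 == df2.sort_index()

print(positional.values.all(), by_label.values.all())
```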

I have used this same technique in a unit test like so:

    from pandas.testing import assert_frame_equal

    assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))
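A slightly fuller unittest sketch with hypothetical actual/expected frames. The first test adds a sort_values step (an assumption, not part of the answer) so the positional comparison after reset_index(drop=True) does not depend on the original row order. The second uses assert_frame_equal's check_like=True, which ignores the order of index and column labels but still requires each label to carry the same data:

```python
import unittest

import pandas as pd
from pandas.testing import assert_frame_equal


class FrameComparisonTest(unittest.TestCase):
    def setUp(self):
        # Hypothetical frames: same rows, opposite order (actual has index [1, 0]).
        self.expected = pd.DataFrame({"id": [1, 2], "product": ["x", "y"]})
        self.actual = self.expected.iloc[::-1]

    def test_ignore_index_after_sorting(self):
        # Put both frames in a canonical row order, then discard the labels.
        actual = self.actual.sort_values("id").reset_index(drop=True)
        expected = self.expected.sort_values("id").reset_index(drop=True)
        assert_frame_equal(actual, expected)

    def test_check_like(self):
        # check_like=True ignores label order but still matches data by label.
        assert_frame_equal(self.actual, self.expected, check_like=True)
```

Save it as a module and run it with python -m unittest.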
CoreDump answered Oct 09 '22 at 22:10
