I'm using Pandas to compare the outputs of two files loaded into two data frames (uat, prod): ...
    uat = uat[['Customer Number', 'Product']]
    prod = prod[['Customer Number', 'Product']]
    print(uat['Customer Number'] == prod['Customer Number'])
    print(uat['Product'] == prod['Product'])
    print(uat == prod)

The first two match exactly:

    74357    True
    74356    True
    Name: Customer Number, dtype: bool
    74357    True
    74356    True
    Name: Product, dtype: bool
For the third print, I get an error: Can only compare identically-labeled DataFrame objects. If the first two compared fine, what's wrong with the 3rd?
Thanks
If you compare DataFrames with different indexes using the equality operator ==, pandas raises ValueError: Can only compare identically-labeled DataFrame objects. You can avoid the exception by using equals instead of ==, for example df1.equals(df2). Note that equals does not ignore the indexes: it returns False when the labels differ, so align the indexes first (for example with sort_index() or reset_index(drop=True)) if only the values matter.
Can only compare identically-labeled Series objects: this is the Series variant of the same ValueError. Whether you compare two Series or two DataFrames (the pandas 2-D data structure), the error is raised when the two objects have different labels or indexes.
Pandas DataFrame: equals() function

The equals() function is used to test whether two objects contain the same elements. This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.
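To make the behaviour of equals() concrete, here is a minimal sketch. The frames and column names are made up for illustration; the point is that equals() treats NaNs in the same position as equal, and that it returns False (rather than raising) when the index labels are in a different order.

```python
import numpy as np
import pandas as pd

# Illustrative data only; column names "a" and "b" are arbitrary.
df1 = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
df2 = df1.copy()

# Same labels, same values, NaN in the same position: equals() returns True.
same = df1.equals(df2)

# Same rows, but with the index labels in reverse order ([1, 0]).
df3 = df1.iloc[::-1]

# equals() does not raise here; it reports False because the labels differ.
reordered = df1.equals(df3)

# Once the labels are aligned again, the frames compare equal.
realigned = df1.equals(df3.sort_index())

print(same, reordered, realigned)
```

So equals() is a safe way to get a single yes/no answer, but you still need to align the indexes yourself if row order should not matter.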
The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.
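As a rough illustration of compare(), assuming pandas 1.1 or later (where DataFrame.compare is available) and using made-up data that reuses the column names from the question:

```python
import pandas as pd

# Illustrative data only; the values are invented for this sketch.
df1 = pd.DataFrame({"Customer Number": [101, 102, 103],
                    "Product": ["A", "B", "C"]})
df2 = pd.DataFrame({"Customer Number": [101, 102, 103],
                    "Product": ["A", "X", "C"]})

# Only the differing cells are kept; each differing column is split
# into "self" (df1) and "other" (df2) sub-columns.
diff = df1.compare(df2)
print(diff)

# compare() needs identical labels too: a reordered index raises ValueError.
mismatch_error = None
try:
    df1.compare(df2.iloc[::-1])
except ValueError as err:
    mismatch_error = str(err)
print(mismatch_error)
```

Here diff contains a single row (index 1) showing "B" under self and "X" under other, which is usually easier to read than a full boolean mask.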
Here's a small example to demonstrate this (which only applied to DataFrames, not Series, until Pandas 0.19 where it applies to both):
    In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])

    In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

    In [3]: df1 == df2
    Exception: Can only compare identically-labeled DataFrame objects
One solution is to sort the index first (Note: some functions require sorted indexes):
    In [4]: df2.sort_index(inplace=True)

    In [5]: df1 == df2
    Out[5]:
          0     1
    0  True  True
    1  True  True
Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):
    In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
    Out[11]:
          0     1
    0  True  True
    1  True  True
Note: This can still raise (if the index/columns aren't identically labelled after sorting).
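If you want to avoid the exception in that case, one option is to check the labels yourself after sorting. The sketch below uses a hypothetical helper, sorted_eq (not part of pandas), that returns None when the labels still do not match instead of letting == raise:

```python
import pandas as pd

def sorted_eq(left, right):
    """Hypothetical helper: sort both axes, then compare element-wise.

    Returns a boolean DataFrame, or None if the labels still differ
    after sorting (the case where == would raise).
    """
    left = left.sort_index().sort_index(axis=1)
    right = right.sort_index().sort_index(axis=1)
    labels_match = (left.index.equals(right.index)
                    and left.columns.equals(right.columns))
    if not labels_match:
        return None
    return left == right

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])  # same labels, reordered
df3 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 2])  # genuinely different labels

ok = sorted_eq(df1, df2)   # boolean DataFrame, all True
bad = sorted_eq(df1, df3)  # None: index [0, 1] vs [0, 2]
print(ok)
print(bad)
```

Returning None is just one design choice; raising a clearer error or reporting which labels are missing would work equally well.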
You can also try dropping the index column if it is not needed to compare:
print(df1.reset_index(drop=True) == df2.reset_index(drop=True))
I have used this same technique in a unit test like so:
    # The public testing helpers live in pandas.testing
    # (pandas.util.testing was removed in pandas 2.0).
    from pandas.testing import assert_frame_equal

    assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))
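If the index is meaningful and you only want to ignore its order, assert_frame_equal also accepts check_like=True, which ignores the order of index and columns. A small sketch, importing from pandas.testing (the public location of the helper) and reusing the frames from the earlier example under the illustrative names expected and actual:

```python
import pandas as pd
from pandas.testing import assert_frame_equal

expected = pd.DataFrame([[1, 2], [3, 4]])
actual = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])  # same rows, reordered

# Passes: rows are matched by label, not by position.
assert_frame_equal(actual, expected, check_like=True)

# Without check_like the differing index order is reported as a failure.
try:
    assert_frame_equal(actual, expected)
    strict_passed = True
except AssertionError:
    strict_passed = False

print(strict_passed)
```

Unlike reset_index(drop=True), this keeps the labels in play, so a row that is attached to the wrong label will still fail the test.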