I have two Pandas DataFrames that I would like to compare. For example:
   a   b   c
A  na  na  na
B  na  1   1
C  na  1   na
and
   a   b   c
A  1   na  1
B  na  na  na
C  na  1   na
D  na  1   na
I want to find the index-column coordinates of any values that are shared. In this case:
   b
C  1
Is this possible?
DataFrame.equals(): the equals() function tests whether two objects contain the same elements. It lets you compare two Series or DataFrames to see whether they have the same shape and elements, with NaNs in the same location considered equal. Note that it returns a single True/False, so it tells you whether two frames match, not where they match.
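A minimal sketch of equals() on small hypothetical frames (left, same and changed are illustrative names, not from the question). It only answers "identical or not", which is why it cannot locate the shared value on its own:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames used only to illustrate equals() semantics.
left = pd.DataFrame({"x": [1.0, np.nan], "y": [2.0, 3.0]})
same = left.copy()
changed = left.copy()
changed.loc[1, "y"] = 4.0  # alter a single cell

print(left.equals(same))     # True: the NaNs sit in the same location
print(left.equals(changed))  # False: one value differs
```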
The compare() method shows the differences between two DataFrames: it compares them row-wise and column-wise and presents the differences side by side. It can only compare identically labeled DataFrames, that is, frames of the same shape with identical row and column labels.
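A sketch of compare() on the question's data, assuming the 'na' entries are NaN. Because df2 has an extra row D, compare() rejects the pair as given; restricting df2 to df's index is an illustrative workaround, and rejected is a hypothetical flag used only to record the error:

```python
import numpy as np
import pandas as pd

# Assumption: every 'na' in the question is a real NaN.
df = pd.DataFrame(
    {"a": [np.nan] * 3, "b": [np.nan, 1, 1], "c": [np.nan, 1, np.nan]},
    index=["A", "B", "C"],
)
df2 = pd.DataFrame(
    {"a": [1, np.nan, np.nan, np.nan], "b": [np.nan, np.nan, 1, 1],
     "c": [1, np.nan, np.nan, np.nan]},
    index=["A", "B", "C", "D"],
)

# The labels differ (df2 has row D), so compare() raises ValueError.
try:
    df.compare(df2)
    rejected = False
except ValueError:
    rejected = True

# Keep only df's rows from df2 so both frames are identically labeled.
diff = df.compare(df2.loc[df.index])
print(diff)
```

Row C does not appear in diff: compare() reports differences, and the two frames agree on row C (the shared 1 in column b, with NaN in both elsewhere treated as equal). So compare() points at the mismatches rather than the matches.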
If you pass the keys parameter to concat, the columns of the resulting DataFrame form a MultiIndex that keeps track of which original DataFrame each column came from:
In [1]: c = pd.concat([df, df2], axis=1, keys=['df1', 'df2']); c
Out[1]:
   df1            df2
   a    b    c    a    b    c
A  na   na   na   1    na   1
B  na   1    1    na   na   na
C  na   1    na   na   1    na
D  NaN  NaN  NaN  na   1    na
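To reproduce the concatenation, here is a self-contained sketch that builds the example frames, again assuming 'na' means NaN:

```python
import numpy as np
import pandas as pd

# Assumption: every 'na' in the question is a real NaN.
df = pd.DataFrame(
    {"a": [np.nan] * 3, "b": [np.nan, 1, 1], "c": [np.nan, 1, np.nan]},
    index=["A", "B", "C"],
)
df2 = pd.DataFrame(
    {"a": [1, np.nan, np.nan, np.nan], "b": [np.nan, np.nan, 1, 1],
     "c": [1, np.nan, np.nan, np.nan]},
    index=["A", "B", "C", "D"],
)

c = pd.concat([df, df2], axis=1, keys=["df1", "df2"])

print(c.columns.tolist())  # MultiIndex entries: (key, original column) pairs
print(c["df1"].loc["D"])   # df has no row D, so concat filled it with NaN
```

The row index is the union of both frames' indexes, so the df1 and df2 halves line up cell for cell.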
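Since the underlying arrays now have the same shape, you can use == to compare them element-wise and use the result as a mask that returns all matching values (this relies on the 'na' entries being real NaN values, because NaN never compares equal to NaN):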
In [171]: m = c.df1[c.df1 == c.df2]; m
Out[171]:
   a    b    c
A  NaN  NaN  NaN
B  NaN  NaN  NaN
C  NaN  1    NaN
D  NaN  NaN  NaN
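The masking step as a self-contained sketch (same NaN assumption; mask is an illustrative name). Indexing a DataFrame with a boolean DataFrame behaves like where(), keeping matching cells and blanking the rest:

```python
import numpy as np
import pandas as pd

# Assumption: every 'na' in the question is a real NaN.
df = pd.DataFrame(
    {"a": [np.nan] * 3, "b": [np.nan, 1, 1], "c": [np.nan, 1, np.nan]},
    index=["A", "B", "C"],
)
df2 = pd.DataFrame(
    {"a": [1, np.nan, np.nan, np.nan], "b": [np.nan, np.nan, 1, 1],
     "c": [1, np.nan, np.nan, np.nan]},
    index=["A", "B", "C", "D"],
)
c = pd.concat([df, df2], axis=1, keys=["df1", "df2"])

mask = c["df1"] == c["df2"]  # element-wise; NaN == NaN evaluates to False
m = c["df1"][mask]           # equivalent to c["df1"].where(mask)
print(m)
```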
If your 'na' values are actually zeros, you could use a sparse matrix to reduce this to the coordinates of the matching values (you'll lose your index and column names, though):
import scipy.sparse as sp
print(sp.coo_matrix(m.where(m.notnull(),0)))
(2, 1) 1.0
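If you want to keep the labels, a scipy-free alternative (a sketch, not the approach above) is to call numpy.nonzero on the non-null mask and map the positions back to index and column labels. The positions it finds, row 2 and column 1, are the same as the (2, 1) in the sparse output:

```python
import numpy as np
import pandas as pd

# Assumption: every 'na' in the question is a real NaN.
df = pd.DataFrame(
    {"a": [np.nan] * 3, "b": [np.nan, 1, 1], "c": [np.nan, 1, np.nan]},
    index=["A", "B", "C"],
)
df2 = pd.DataFrame(
    {"a": [1, np.nan, np.nan, np.nan], "b": [np.nan, np.nan, 1, 1],
     "c": [1, np.nan, np.nan, np.nan]},
    index=["A", "B", "C", "D"],
)
c = pd.concat([df, df2], axis=1, keys=["df1", "df2"])
m = c["df1"][c["df1"] == c["df2"]]

# Positional (row, column) indices of the cells that survived the mask.
rows, cols = np.nonzero(m.notna().to_numpy())
# Translate positions back into index/column labels.
coords = [(m.index[i], m.columns[j]) for i, j in zip(rows, cols)]
print(coords)  # [('C', 'b')]
```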