
Pandas: Comparing 2 dataframes without iterating

I have two dataframes, DF1 and DF2, shown below. I need to compare DF2 with DF1 so that I can identify the Matching, Different and Missing values across all the columns in DF2 that also exist in DF1 (Col1, Col2 and Col3 in this case), for rows with the same EID value (A, B, C and D). I do not want to iterate over each row of a dataframe, as that can be time-consuming. Note: there can be around 70 to 100 columns; this is just a sample dataframe.

DF1

    EID Col1 Col2 Col3 Col4
0   A   a1   b1   c1   d1
1   B   a2   b2   c2   d2
2   C   None b3   c3   d3
3   D   a4   b4   c4   d4
4   G   a5   b5   c5   d5

DF2

    EID Col1 Col2 Col3
0   A   a1   b1   c1
1   B   a2   b2   c9
2   C   a3   b3   c3
3   D   a4   b4   None

Expected output dataframe

    EID Col1 Col2 Col3 New_Col
0   A   a1   b1   c1   Match
1   B   a2   b2   c2   Different
2   C   None b3   c3   Missing in DF1
3   D   a4   b4   c4   Missing in DF2
Shashank Shekher asked Jun 28 '26 05:06


1 Answer

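First, filter df1 down to the rows whose EID appears in df2 and the columns that df2 has. The .copy() makes new_df an independent dataframe, so adding a column to it later does not raise a SettingWithCopyWarning.

new_df = df1.loc[df1['EID'].isin(df2['EID']), df2.columns].copy()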

  EID  Col1 Col2 Col3
0   A    a1   b1   c1
1   B    a2   b2   c2
2   C  None   b3   c3
3   D    a4   b4   c4
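The filter above keeps df1's row order, so the positional comparison in the next steps only lines up if both frames list the same EIDs in the same order. If that is not guaranteed, one possible way to align them first is sketched below. It swaps the isin filter for DataFrame.merge and assumes EID is unique in each frame; the small frames and the names shared and df2_aligned are made up for illustration.

```python
import pandas as pd

# Hypothetical frames whose EIDs are in different orders and only partly overlap.
df1 = pd.DataFrame({"EID": ["B", "A", "G"],
                    "Col1": ["a2", "a1", "a5"],
                    "Col4": ["d2", "d1", "d5"]})
df2 = pd.DataFrame({"EID": ["A", "B", "Z"],
                    "Col1": ["a1", "a2", "a9"]})

# Columns of df2 that also exist in df1, in df2's column order.
shared = [c for c in df2.columns if c in df1.columns]

# An inner merge keeps only EIDs present in both frames and follows
# the key order of the left frame (df2 here).
new_df = df2[["EID"]].merge(df1[shared], on="EID", how="inner")

# Keep the same EIDs in df2, in the same order, with a fresh 0..n-1 index.
df2_aligned = df2.loc[df2["EID"].isin(new_df["EID"]), shared].reset_index(drop=True)
```

After this, new_df and df2_aligned have the same shape and row order, so they can be passed to the comparison in place of new_df and df2.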

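Next, since you have large dataframes to compare, convert both new_df and df2 to NumPy arrays so the comparison is vectorized. This relies on both having the same rows in the same order.

import numpy as np

array1 = new_df.to_numpy()
array2 = df2.to_numpy()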

Now compare them row by row. A row counts as a match only if every column is equal, and np.where turns that boolean into a label.

new_df['New Col'] = np.where((array1 == array2).all(axis=1), 'Match', 'Different')

  EID  Col1 Col2 Col3    New Col
0   A    a1   b1   c1      Match
1   B    a2   b2   c2  Different
2   C  None   b3   c3  Different
3   D    a4   b4   c4  Different
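Rows C and D come out as Different at this stage because a None on either side never compares equal to a string. A minimal sketch of what the elementwise comparison produces, using two hand-written object arrays that stand in for rows A and C of the sample data:

```python
import numpy as np

# Two rows standing in for new_df and df2 (rows A and C of the sample).
array1 = np.array([["A", "a1", "b1", "c1"],
                   ["C", None, "b3", "c3"]], dtype=object)
array2 = np.array([["A", "a1", "b1", "c1"],
                   ["C", "a3", "b3", "c3"]], dtype=object)

# Elementwise equality gives one boolean per cell; None == "a3" is False.
equal = array1 == array2

# A row matches only when every cell in it is equal.
row_match = equal.all(axis=1)
labels = np.where(row_match, "Match", "Different")
```

The equal array also shows which columns disagree in each row, which may be useful when there are 70 to 100 columns to inspect.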

Finally, to relabel the rows that contain a None value, use df.loc with df.isnull. The masks are aligned on the index, so new_df and df2 must share the same index here.

new_df.loc[new_df.isnull().any(axis=1), 'New Col'] = 'Missing in DF1'
new_df.loc[df2.isnull().any(axis=1), 'New Col'] = 'Missing in DF2'

  EID  Col1 Col2 Col3         New Col
0   A    a1   b1   c1           Match
1   B    a2   b2   c2       Different
2   C  None   b3   c3  Missing in DF1
3   D    a4   b4   c4  Missing in DF2
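Putting the steps together, here is a self-contained sketch that rebuilds the sample frames from the question and runs the whole comparison. The .copy() call and the same variable are additions for this sketch, not part of the steps above.

```python
import numpy as np
import pandas as pd

# Sample frames reconstructed from the question.
df1 = pd.DataFrame({
    "EID":  ["A", "B", "C", "D", "G"],
    "Col1": ["a1", "a2", None, "a4", "a5"],
    "Col2": ["b1", "b2", "b3", "b4", "b5"],
    "Col3": ["c1", "c2", "c3", "c4", "c5"],
    "Col4": ["d1", "d2", "d3", "d4", "d5"],
})
df2 = pd.DataFrame({
    "EID":  ["A", "B", "C", "D"],
    "Col1": ["a1", "a2", "a3", "a4"],
    "Col2": ["b1", "b2", "b3", "b4"],
    "Col3": ["c1", "c9", "c3", None],
})

# Step 1: keep only the rows and columns of df1 that df2 also has.
new_df = df1.loc[df1["EID"].isin(df2["EID"]), df2.columns].copy()

# Steps 2 and 3: vectorized row-wise comparison on NumPy arrays.
same = (new_df.to_numpy() == df2.to_numpy()).all(axis=1)
new_df["New Col"] = np.where(same, "Match", "Different")

# Step 4: rows with a missing value on either side get their own label.
new_df.loc[new_df.isnull().any(axis=1), "New Col"] = "Missing in DF1"
new_df.loc[df2.isnull().any(axis=1), "New Col"] = "Missing in DF2"

print(new_df)
```

Note that a row with missing values in both frames would end up labelled Missing in DF2, because that assignment runs last.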
