Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare two dataframes and remove rows from a df based on a matching column value

I have two pandas df which look like this:

df1:

pid Name score age
100  Ram     3  36
101 Tony     2  40
101 Jack     4  56
200 Jill     6  30

df2
pid Name score age
100  Ram     3  36
101 Tony     2  40
101 John     4  51
101 Jack     9  32
200 Jill     6  30

Both df's are indexed with 'pid'. I would like to compare df1 & df2 based on the column 'score'. i.e, I need to keep only those rows in df2 that are matching with df1 on index and value of score.

My expected result should be

new df2:
pid Name index age
100  Ram     3  36
101 Tony     2  40
101 John     4  51
200 Jill     6  30 

Any help on this regard is highly appreciated.

like image 273
Jithesh Erancheri Avatar asked Jun 14 '26 00:06

Jithesh Erancheri


1 Answers

Use merge by columns pid and score, but first create columns from index by reset_index, last create pid index again and for same columns of new DataFrame add reindex by df2.columns:

df = (pd.merge(df1.reset_index(), 
               df2.reset_index(), on=['score', 'pid'], how='left', suffixes=['_',''])
        .set_index('pid')
        .reindex(columns=df2.columns))

print (df)
     Name  score  age
pid                  
100   Ram      3   36
101  Tony      2   40
101  John      4   51
200  Jill      6   30

Inputs:

print (df1)
     Name  score  age
pid                  
100   Ram      3   36
101  Tony      2   40
101  Jack      4   56
200  Jill      6   30

print (df2)
     Name  score  age
pid                  
100   Ram      3   36
101  Tony      2   40
101  John      4   51
101  Jack      9   32
200  Jill      6   30
like image 181
jezrael Avatar answered Jun 16 '26 03:06

jezrael