
Pandas: remove multiple rows based on condition

Below is a subset of a pandas DataFrame from which I am trying to remove multiple rows based on some conditions.

  code1 code2 grp1 grp2  dist_km
0  M001  M002  AAA  AAA      112
1  M001  M003  AAA  IHH      275
2  M002  M005  AAA  XXY      150
3  M002  M004  AAA  AAA       65
4  M003  M443  IHH  GRR       50
5  M003  M667  IHH  IHH      647
6  M003  M664  IHH  FFG      336

For each code1, I would like to keep only the row with the smallest dist_km, and only if grp1 is the same as grp2 in that row.

For the example above, only these rows will remain:

  code1 code2 grp1 grp2  dist_km
0  M001  M002  AAA  AAA      112
3  M002  M004  AAA  AAA       65

What would be the easiest way to do this?

Funkeh-Monkeh asked Dec 17 '22 18:12


1 Answer

There is no need for groupby: sort by dist_km with sort_values, keep the first row per code1 with drop_duplicates, then filter with query.

df.sort_values('dist_km').drop_duplicates('code1').query('grp1==grp2')
  code1 code2 grp1 grp2  dist_km
3  M002  M004  AAA  AAA       65
0  M001  M002  AAA  AAA      112
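
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and applies the same sort_values / drop_duplicates idea. The variable names (nearest, result, alt) and the final sort_index call are my own additions for illustration, and the groupby/idxmin variant is an alternative I am assuming gives the same rows on this data, not something from the one-liner above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "code1": ["M001", "M001", "M002", "M002", "M003", "M003", "M003"],
    "code2": ["M002", "M003", "M005", "M004", "M443", "M667", "M664"],
    "grp1": ["AAA", "AAA", "AAA", "AAA", "IHH", "IHH", "IHH"],
    "grp2": ["AAA", "IHH", "XXY", "AAA", "GRR", "IHH", "FFG"],
    "dist_km": [112, 275, 150, 65, 50, 647, 336],
})

# Sort ascending by distance, then keep the first (smallest) row per code1
nearest = df.sort_values("dist_km").drop_duplicates("code1")

# Keep only rows where both groups match; sort_index restores the original row order
result = nearest[nearest["grp1"] == nearest["grp2"]].sort_index()

# Alternative: pick the row label of the minimum dist_km within each code1 group
alt = df.loc[df.groupby("code1")["dist_km"].idxmin()]
alt = alt[alt["grp1"] == alt["grp2"]]

print(result)
```

Note that M003 drops out entirely: its closest row (dist_km 50) has grp1 != grp2, so filtering after the de-duplication removes it. Filtering first and de-duplicating second would instead keep the M003 row with dist_km 647, which is not what the question asks for.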
BENY answered Feb 15 '23 09:02