Pandas: remove multiple rows based on condition

Question

Below is a subset of a pandas DataFrame I have. I am trying to remove multiple rows from it based on some conditions.

  code1 code2 grp1 grp2  dist_km
0  M001  M002  AAA  AAA      112
1  M001  M003  AAA  IHH      275
2  M002  M005  AAA  XXY      150
3  M002  M004  AAA  AAA       65
4  M003  M443  IHH  GRR       50
5  M003  M667  IHH  IHH      647
6  M003  M664  IHH  FFG      336

For each code1, I would like to keep only the row with the smallest dist_km, and only if grp1 is the same as grp2 in that row.

For the example above, only these rows should remain:

  code1 code2 grp1 grp2  dist_km
0  M001  M002  AAA  AAA      112
3  M002  M004  AAA  AAA       65

What would be the easiest way to do this?

BENY · Accepted Answer

There is no need for groupby here. Use sort_values with drop_duplicates to keep the smallest dist_km per code1, then filter with query:

df.sort_values('dist_km').drop_duplicates('code1').query('grp1 == grp2')

  code1 code2 grp1 grp2  dist_km
3  M002  M004  AAA  AAA       65
0  M001  M002  AAA  AAA      112
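
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same chain. The variable names (nearest, result, alt) are my own and not part of the original answer. It also shows a groupby/idxmin variant for comparison, which is an alternative and not what the answer above uses.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "code1": ["M001", "M001", "M002", "M002", "M003", "M003", "M003"],
    "code2": ["M002", "M003", "M005", "M004", "M443", "M667", "M664"],
    "grp1": ["AAA", "AAA", "AAA", "AAA", "IHH", "IHH", "IHH"],
    "grp2": ["AAA", "IHH", "XXY", "AAA", "GRR", "IHH", "FFG"],
    "dist_km": [112, 275, 150, 65, 50, 647, 336],
})

# Approach from the answer: after sorting by distance, the first row seen
# for each code1 is its smallest-distance row.
nearest = df.sort_values("dist_km").drop_duplicates("code1")

# Keep only the rows whose groups match, and restore the original row order.
result = nearest.query("grp1 == grp2").sort_index()

# Alternative sketch: pick the index label of the minimum distance per code1,
# then apply the same group filter with a boolean mask.
idx = df.groupby("code1")["dist_km"].idxmin()
alt = df.loc[idx]
alt = alt[alt["grp1"] == alt["grp2"]]

print(result)
print(alt)
```

On this sample both variants keep rows 0 and 3. Note that M003 disappears entirely: its nearest row (50 km) has grp1 != grp2, so it is filtered out even though another M003 row has matching groups. If two rows for the same code1 tie on dist_km, the two variants are not guaranteed to pick the same one, since the default sort is not necessarily stable.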

Tags: python, pandas, pandas-groupby

Funkeh-Monkeh

