Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing two or more rows in a Pandas dataframe

Tags:

python

pandas

I have a dataframe that looks like this:

Reference |   ID  | Length
ref101    |123456 | 10
ref101    |123789 | 5
ref202    |654321 | 20
ref202    |653212 | 40

I'm trying to determine which row for each row in the Reference column has the greatest length (based on the value in the Length column). For example, ref101 with ID 123456 is greater in length than ref101 with ID 123789.

I've been playing around with .groupby(), but am getting nowhere. Is there a way of performing this sort of operation in Pandas?

like image 296
DanielH Avatar asked Jan 28 '23 11:01

DanielH


1 Answers

If it's the whole row you want, then use groupby + idxmax:

df.loc[df.groupby('Reference').Length.idxmax()]

  Reference      ID  Length
0    ref101  123456      10
3    ref202  653212      40

If you want just the length, then groupby + max will suffice:

df.groupby('Reference').Length.max()

Reference
ref101    10
ref202    40
Name: Length, dtype: int64
like image 179
cs95 Avatar answered Jan 31 '23 01:01

cs95