I have a dataframe that looks like this:
Reference | ID | Length
ref101 |123456 | 10
ref101 |123789 | 5
ref202 |654321 | 20
ref202 |653212 | 40
I'm trying to determine which row for each row in the Reference column has the greatest length (based on the value in the Length column). For example, ref101 with ID 123456 is greater in length than ref101 with ID 123789.
I've been playing around with .groupby(), but am getting nowhere. Is there a way of performing this sort of operation in Pandas?
If it's the whole row you want, then use groupby + idxmax:
df.loc[df.groupby('Reference').Length.idxmax()]
Reference ID Length
0 ref101 123456 10
3 ref202 653212 40
If you want just the length, then groupby + max will suffice:
df.groupby('Reference').Length.max()
Reference
ref101 10
ref202 40
Name: Length, dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With