I have a dataframe that looks like this:
Reference | ID | Length
ref101 |123456 | 10
ref101 |123789 | 5
ref202 |654321 | 20
ref202 |653212 | 40
I'm trying to determine which row for each row in the Reference
column has the greatest length (based on the value in the Length
column). For example, ref101
with ID
123456
is greater in length than ref101
with ID 123789.
I've been playing around with .groupby()
, but am getting nowhere. Is there a way of performing this sort of operation in Pandas?
If it's the whole row you want, then use groupby
+ idxmax
:
df.loc[df.groupby('Reference').Length.idxmax()]
Reference ID Length
0 ref101 123456 10
3 ref202 653212 40
If you want just the length, then groupby
+ max
will suffice:
df.groupby('Reference').Length.max()
Reference
ref101 10
ref202 40
Name: Length, dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With