Comparing two or more rows in a Pandas dataframe

Question

I have a dataframe that looks like this:

Reference | ID     | Length
ref101    | 123456 | 10
ref101    | 123789 | 5
ref202    | 654321 | 20
ref202    | 653212 | 40

I'm trying to determine, for each value in the Reference column, which row has the greatest length (based on the value in the Length column). For example, of the two ref101 rows, the one with ID 123456 is longer than the one with ID 123789.

I've been playing around with .groupby(), but am getting nowhere. Is there a way of performing this sort of operation in Pandas?

cs95 · Accepted Answer

If it's the whole row you want, then use groupby + idxmax:

df.loc[df.groupby('Reference').Length.idxmax()]

  Reference      ID  Length
0    ref101  123456      10
3    ref202  653212      40
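
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the frame is called df and rebuilds it from the table in the question; the integer dtypes are an assumption, since the question does not state them.

```python
import pandas as pd

# Rebuild the example frame from the question (dtypes are assumed).
df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212],
    "Length": [10, 5, 20, 40],
})

# idxmax gives, per Reference, the index label of the row with the largest Length.
longest_idx = df.groupby("Reference")["Length"].idxmax()

# Use those labels to pull the complete rows out of the original frame.
longest_rows = df.loc[longest_idx]
print(longest_rows)
```

Bracket access (["Length"]) is used here instead of attribute access (.Length); both select the same column, but brackets also work when a column name collides with a DataFrame method or contains spaces.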

If you want just the length, then groupby + max will suffice:

df.groupby('Reference').Length.max()

Reference
ref101    10
ref202    40
Name: Length, dtype: int64
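
One thing to keep in mind: idxmax returns only the first index label when several rows in a group share the maximum. If tied rows should all be kept, one alternative (not part of the approach above, just a sketch under the same assumed df) is to compare each row against its group's maximum using transform:

```python
import pandas as pd

# Same example frame as in the question (dtypes are assumed).
df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212],
    "Length": [10, 5, 20, 40],
})

# Just the maximum Length per Reference, as a Series indexed by Reference.
max_per_ref = df.groupby("Reference")["Length"].max()

# transform("max") broadcasts each group's maximum back onto every row,
# so the comparison yields a boolean mask aligned with df.
is_longest = df["Length"] == df.groupby("Reference")["Length"].transform("max")

# Boolean indexing keeps every row that matches its group maximum, ties included.
longest_rows_with_ties = df[is_longest]
print(longest_rows_with_ties)
```

With the data from the question there are no ties, so this selects the same two rows as the idxmax version.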

Tags: python, pandas

DanielH
