I want to keep the one record that has the largest series value for each id, so I need exactly one row per id. I think I need something like
df_new = df.groupby('id')['series'].nlargest(1)
but that's definitely wrong.
This is what my dataset looks like:
id series s1 s2 s3
1 2 4 9 1
1 8 6 2 2
1 3 9 1 3
2 9 4 1 5
2 2 2 5 5
2 5 1 7 8
3 6 7 2 3
3 2 4 4 1
3 1 3 9 9
This should be the result:
id series s1 s2 s3
1 8 6 2 2
2 9 4 1 5
3 6 7 2 3
IIUC you want to group by the 'id' column, get the index label of the row where 'series' is largest in each group using idxmax(), and then use those labels to index back into the original df:
In [91]:
df.loc[df.groupby('id')['series'].idxmax()]
Out[91]:
id series s1 s2 s3
1 1 8 6 2 2
3 2 9 4 1 5
6 3 6 7 2 3
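For anyone who wants to reproduce this, here is a minimal, self-contained sketch that rebuilds the sample data from the question and applies the same idxmax() approach. The variable names (idx, result, result_reset, alt) are only illustrative, and the sort_values/drop_duplicates variant is an alternative added for comparison, not part of the answer above. Note that idxmax() returns the first matching label when a group has tied maxima.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "series": [2, 8, 3, 9, 2, 5, 6, 2, 1],
    "s1":     [4, 6, 9, 4, 2, 1, 7, 4, 3],
    "s2":     [9, 2, 1, 1, 5, 7, 2, 4, 9],
    "s3":     [1, 2, 3, 5, 5, 8, 3, 1, 9],
})

# Index label of the row holding the largest 'series' value within each id.
idx = df.groupby("id")["series"].idxmax()

# Select those rows from the original frame; the original index labels are kept.
result = df.loc[idx]

# Optional: renumber the rows 0..n-1 if the original labels are not needed.
result_reset = result.reset_index(drop=True)

# Alternative (an assumption-free but different technique): sort so the largest
# 'series' comes first, keep the first row seen for each id, then restore id order.
alt = (
    df.sort_values("series", ascending=False)
      .drop_duplicates("id")
      .sort_values("id")
      .reset_index(drop=True)
)

print(result)
```

Both variants give the same three rows on this data; they can differ in which row is kept when a group has tied maximum values.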