
pandas: return the Month containing the max Value for each year

I have a dataframe like:

Year Month Value
2017  1     100
2017  2      1
2017  4      2
2018  3      88
2018  4      8
2019  5      87
2019  6      1

I'd like the dataframe to return the Month and Value for each year where the value is the maximum:

year  month  value
2017    1      100
2018    3      88
2019    5      87

I've attempted something like df = df.groupby(["Year", "Month"])["Value"].max(); however, it returns the full data set because each Year/Month pair is unique (I believe).
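
The suspicion above can be checked with a minimal sketch. It assumes the sample data from the question is rebuilt by hand as a DataFrame; the names per_pair and per_year are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year":  [2017, 2017, 2017, 2018, 2018, 2019, 2019],
    "Month": [1, 2, 4, 3, 4, 5, 6],
    "Value": [100, 1, 2, 88, 8, 87, 1],
})

# Grouping on both keys: every (Year, Month) pair is its own group,
# so max() is taken over a single row each time.
per_pair = df.groupby(["Year", "Month"])["Value"].max()
print(len(per_pair) == len(df))  # True: nothing was reduced

# Grouping on Year alone reduces to one value per year,
# but the Month column is no longer part of the result.
per_year = df.groupby("Year")["Value"].max()
print(per_year.to_dict())
```

Grouping by Year alone gives the right maxima but drops the Month, which is why the row with the maximum still has to be looked up in the original dataframe.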

machump asked Dec 24 '22 03:12


2 Answers

You can get the index where the top Value occurs with .groupby(...).idxmax() and use that to index into the original dataframe:

In [28]: df.loc[df.groupby("Year")["Value"].idxmax()]
Out[28]:
   Year  Month  Value
0  2017      1    100
3  2018      3     88
5  2019      5     87
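
For readers who want to run this outside an IPython session, here is a self-contained sketch of the same idea. It assumes the question's sample data rebuilt as a DataFrame; the names idx and result, and the final reset_index call, are additions for illustration only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Year":  [2017, 2017, 2017, 2018, 2018, 2019, 2019],
    "Month": [1, 2, 4, 3, 4, 5, 6],
    "Value": [100, 1, 2, 88, 8, 87, 1],
})

# For each year, idxmax gives the index label of the row holding the max Value.
idx = df.groupby("Year")["Value"].idxmax()
print(idx.tolist())  # [0, 3, 5]

# Use those labels to pull the full rows, then renumber the index (optional).
result = df.loc[idx].reset_index(drop=True)
print(result)
```

Note that idxmax reports only the first occurrence of the maximum, so a year with tied maxima yields a single row with this approach.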
Randy answered Feb 12 '23 04:02


Here is a solution that also handles ties, i.e. several rows sharing the maximum Value within the same year:

m = df.groupby('Year')['Value'].transform('max') == df['Value']
dfmax = df.loc[m]

Full example:

import io

import pandas as pd

data = '''\
Year Month Value
2017  1     100
2017  2      1
2017  4      2
2018  3      88
2018  4      88
2019  5      87
2019  6      1'''

# io.StringIO lets read_csv treat the string as a file
fileobj = io.StringIO(data)
df = pd.read_csv(fileobj, sep=r'\s+')
# True for every row whose Value equals the maximum of its Year
m = df.groupby('Year')['Value'].transform('max') == df['Value']
print(df[m])

   Year  Month  Value
0  2017      1    100
3  2018      3     88
4  2018      4     88
5  2019      5     87
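
To see why the mask keeps both tied rows, it may help to look at the intermediate values. The sketch below assumes the tied sample data from this answer rebuilt as a DataFrame; year_max and mask are illustrative names.

```python
import pandas as pd

# Sample data from this answer, with a tie in 2018 (88 and 88).
df = pd.DataFrame({
    "Year":  [2017, 2017, 2017, 2018, 2018, 2019, 2019],
    "Month": [1, 2, 4, 3, 4, 5, 6],
    "Value": [100, 1, 2, 88, 88, 87, 1],
})

# transform('max') broadcasts each year's maximum back onto every row of that year.
year_max = df.groupby("Year")["Value"].transform("max")
print(year_max.tolist())  # [100, 100, 100, 88, 88, 87, 87]

# Comparing row by row marks every row that reaches its year's maximum.
mask = year_max == df["Value"]
print(mask.tolist())  # [True, False, False, True, True, True, False]

print(df.loc[mask, "Month"].tolist())  # [1, 3, 4, 5]
```

Because the comparison is made per row rather than per group, every row that ties for the maximum survives the filter.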
Anton vBR answered Feb 12 '23 03:02