Drop rows after maximum value in a grouped Pandas dataframe

Question

I've got a date-ordered dataframe that can be grouped. What I am attempting to do is groupby a variable (Person), determine the maximum (weight) for each group (person), and then drop all rows that come after (date) the maximum.

Here's an example of the data:

df = pd.DataFrame({'Person': 1,1,1,1,1,2,2,2,2,2],'Date': '1/1/2015','2/1/2015','3/1/2015','4/1/2015','5/1/2015','6/1/2011','7/1/2011','8/1/2011','9/1/2011','10/1/2011'], 'MonthNo':[1,2,3,4,5,1,2,3,4,5], 'Weight':[100,110,115,112,108,205,210,211,215,206]})

    Date      MonthNo  Person  Weight
0   1/1/2015        1       1     100
1   2/1/2015        2       1     110
2   3/1/2015        3       1     115
3   4/1/2015        4       1     112
4   5/1/2015        5       1     108
5   6/1/2011        1       2     205
6   7/1/2011        2       2     210
7   8/1/2011        3       2     211
8   9/1/2011        4       2     215
9  10/1/2011        5       2     206

Here's what I want the result to look like:

    Date      MonthNo  Person  Weight
0   1/1/2015        1       1     100
1   2/1/2015        2       1     110
2   3/1/2015        3       1     115
5   6/1/2011        1       2     205
6   7/1/2011        2       2     210
7   8/1/2011        3       2     211
8   9/1/2011        4       2     215

I think its worth noting, there can be disjoint start dates and the maximum may appear at different times.

My idea was to find the maximum for each group, obtain the MonthNo the maximum was in for that group, and then discard any rows with MonthNo greater Max Weight MonthNo. So far I've been able to obtain the max by group, but cannot get past doing a comparison based on that.

Please let me know if I can edit/provide more information, haven't posted many questions here! Thanks for the help, sorry if my formatting/question isn't clear.

BENY · Accepted Answer

Using idxmax with groupby

df.groupby('Person',sort=False).apply(lambda x  : x.reset_index(drop=True).iloc[:x.reset_index(drop=True).Weight.idxmax()+1,:])

Out[131]: 
              Date  MonthNo  Person  Weight
Person                                     
1      0  1/1/2015        1       1     100
       1  2/1/2015        2       1     110
       2  3/1/2015        3       1     115
2      0  6/1/2011        1       2     205
       1  7/1/2011        2       2     210
       2  8/1/2011        3       2     211
       3  9/1/2011        4       2     215

jpp · Answer

You can use groupby.transform with idxmax. The first 2 steps may not be necessary depending on how your dataframe is structured.

# convert Date to datetime
df['Date'] = pd.to_datetime(df['Date'])

# sort by Person and Date to make index usable for next step
df = df.sort_values(['Person', 'Date']).reset_index(drop=True)

# filter for index less than idxmax transformed by group
df = df[df.index <= df.groupby('Person')['Weight'].transform('idxmax')]

print(df)

        Date  MonthNo  Person  Weight
0 2015-01-01        1       1     100
1 2015-02-01        2       1     110
2 2015-03-01        3       1     115
5 2011-06-01        1       2     205
6 2011-07-01        2       2     210
7 2011-08-01        3       2     211
8 2011-09-01        4       2     215

Drop rows after maximum value in a grouped Pandas dataframe

Tags:

python

pandas

dataframe

pandas-groupby

Chris B

2 Answers

BENY

jpp

Recent Activity

Donate For Us

Drop rows after maximum value in a grouped Pandas dataframe

Tags:

python

pandas

dataframe

pandas-groupby

Chris B

2 Answers

BENY

jpp

Related questions

Recent Activity

Donate For Us