Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Drop rows after maximum value in a grouped Pandas dataframe

I've got a date-ordered dataframe that can be grouped. What I am attempting to do is groupby a variable (Person), determine the maximum (weight) for each group (person), and then drop all rows that come after (date) the maximum.

Here's an example of the data:

df = pd.DataFrame({'Person': 1,1,1,1,1,2,2,2,2,2],'Date': '1/1/2015','2/1/2015','3/1/2015','4/1/2015','5/1/2015','6/1/2011','7/1/2011','8/1/2011','9/1/2011','10/1/2011'], 'MonthNo':[1,2,3,4,5,1,2,3,4,5], 'Weight':[100,110,115,112,108,205,210,211,215,206]})

    Date      MonthNo  Person  Weight
0   1/1/2015        1       1     100
1   2/1/2015        2       1     110
2   3/1/2015        3       1     115
3   4/1/2015        4       1     112
4   5/1/2015        5       1     108
5   6/1/2011        1       2     205
6   7/1/2011        2       2     210
7   8/1/2011        3       2     211
8   9/1/2011        4       2     215
9  10/1/2011        5       2     206

Here's what I want the result to look like:

    Date      MonthNo  Person  Weight
0   1/1/2015        1       1     100
1   2/1/2015        2       1     110
2   3/1/2015        3       1     115
5   6/1/2011        1       2     205
6   7/1/2011        2       2     210
7   8/1/2011        3       2     211
8   9/1/2011        4       2     215

I think its worth noting, there can be disjoint start dates and the maximum may appear at different times.

My idea was to find the maximum for each group, obtain the MonthNo the maximum was in for that group, and then discard any rows with MonthNo greater Max Weight MonthNo. So far I've been able to obtain the max by group, but cannot get past doing a comparison based on that.

Please let me know if I can edit/provide more information, haven't posted many questions here! Thanks for the help, sorry if my formatting/question isn't clear.

like image 683
Chris B Avatar asked Jul 06 '18 16:07

Chris B


2 Answers

Using idxmax with groupby

df.groupby('Person',sort=False).apply(lambda x  : x.reset_index(drop=True).iloc[:x.reset_index(drop=True).Weight.idxmax()+1,:])

Out[131]: 
              Date  MonthNo  Person  Weight
Person                                     
1      0  1/1/2015        1       1     100
       1  2/1/2015        2       1     110
       2  3/1/2015        3       1     115
2      0  6/1/2011        1       2     205
       1  7/1/2011        2       2     210
       2  8/1/2011        3       2     211
       3  9/1/2011        4       2     215
like image 147
BENY Avatar answered Oct 05 '22 22:10

BENY


You can use groupby.transform with idxmax. The first 2 steps may not be necessary depending on how your dataframe is structured.

# convert Date to datetime
df['Date'] = pd.to_datetime(df['Date'])

# sort by Person and Date to make index usable for next step
df = df.sort_values(['Person', 'Date']).reset_index(drop=True)

# filter for index less than idxmax transformed by group
df = df[df.index <= df.groupby('Person')['Weight'].transform('idxmax')]

print(df)

        Date  MonthNo  Person  Weight
0 2015-01-01        1       1     100
1 2015-02-01        2       1     110
2 2015-03-01        3       1     115
5 2011-06-01        1       2     205
6 2011-07-01        2       2     210
7 2011-08-01        3       2     211
8 2011-09-01        4       2     215
like image 30
jpp Avatar answered Oct 05 '22 22:10

jpp