Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

perform groupby to find cumulative count on assignment Id between a range of dates

Tags:

python

pandas

I have a dataframe at Assignment Id level which includes dates of submission, student Id. I want to find number of assignments submitted by a student in past 12 months(excluding latest entry) wrt latest entry. Assignment Id is the unique key. I want the cumulative count to be made basis Assignment Id.

I tried using groupby to perform this step but couldnt find the desired output. I want my answer in python.

what I have

Assmt id    student id  date of submission
106473754   100357          2/1/2016
102485554   100357          3/1/2016
108474032   100357          4/1/2016
101663805   100357          2/1/2017
307953885   100364          5/1/2017
307252429   100364          7/1/2017
304205214   100364          11/1/2017
304041247   100364          11/1/2017
512459298   100364          2/1/2018

what i want

student id  date of submission  count_in_12_mon
100357            2/1/2017                       3
100364            2/1/2018                       4
like image 889
Curious Avatar asked Nov 06 '22 17:11

Curious


1 Answers

You may need to find the max value for each group using transform , then convert the datetime to months and compare with all date of submission, then assign the value back , using agg

s=df.groupby('studentid')['dateofsubmission'].transform('max')
s1=(s.dt.year*12+s.dt.month-df.dateofsubmission.dt.year*12-df.dateofsubmission.dt.month)
df['New']=((s1>0)&(s1<=12))
yourdf=df.groupby('studentid').agg({'New':'sum','dateofsubmission':'last'}).reset_index()
yourdf
Out[851]: 
   studentid dateofsubmission  New
0     100357       2017-02-01  3.0
1     100364       2018-02-01  4.0
like image 169
BENY Avatar answered Nov 14 '22 21:11

BENY