I have a dataframe at Assignment Id level which includes dates of submission, student Id. I want to find number of assignments submitted by a student in past 12 months(excluding latest entry) wrt latest entry. Assignment Id is the unique key. I want the cumulative count to be made basis Assignment Id.
I tried using groupby to perform this step but couldnt find the desired output. I want my answer in python.
what I have
Assmt id student id date of submission
106473754 100357 2/1/2016
102485554 100357 3/1/2016
108474032 100357 4/1/2016
101663805 100357 2/1/2017
307953885 100364 5/1/2017
307252429 100364 7/1/2017
304205214 100364 11/1/2017
304041247 100364 11/1/2017
512459298 100364 2/1/2018
what i want
student id date of submission count_in_12_mon
100357 2/1/2017 3
100364 2/1/2018 4
You may need to find the max
value for each group using transform
, then convert the datetime
to months and compare with all date of submission
, then assign the value back , using agg
s=df.groupby('studentid')['dateofsubmission'].transform('max')
s1=(s.dt.year*12+s.dt.month-df.dateofsubmission.dt.year*12-df.dateofsubmission.dt.month)
df['New']=((s1>0)&(s1<=12))
yourdf=df.groupby('studentid').agg({'New':'sum','dateofsubmission':'last'}).reset_index()
yourdf
Out[851]:
studentid dateofsubmission New
0 100357 2017-02-01 3.0
1 100364 2018-02-01 4.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With