Pandas - Get week average for each user

I'm trying to figure out how to work with a dataframe representing players in a game; the dataframe has unique users and a record for each day a particular user has been active.

I am trying to get the average playtime and average moves for each week of each user's lifetime.

(A week is defined by a user's first record, i.e. if a user's first record is on the 3rd of January, their 1st week starts then and their 2nd week starts on the 10th of January.)

Example

userid                          date          secondsPlayed   movesMade
++/acsbP2NFC2BvgG1BzySv5jko=    2016-04-28    413.88188       85
++/acsbP2NFC2BvgG1BzySv5jko=    2016-05-01    82.67343        15
++/acsbP2NFC2BvgG1BzySv5jko=    2016-05-05    236.73809       39
++/acsbP2NFC2BvgG1BzySv5jko=    2016-05-10    112.69112       29
++/acsbP2NFC2BvgG1BzySv5jko=    2016-05-11    211.42790       44
-----------------------------------CONT----------------------------------
++/8ij1h8378h123123koF3oer1    2016-05-05     200.73809       11
++/8ij1h8378h123123koF3oer1    2016-05-10     51.69112        14
++/8ij1h8378h123123koF3oer1    2016-05-14     65.42790        53

The end result for this would be the following table:

userid                          date        secondsPlayed_w movesMade_w
++/acsbP2NFC2BvgG1BzySv5jko=    2016-04-28    496.55531       100
++/acsbP2NFC2BvgG1BzySv5jko=    2016-05-05    236.73809       68    
-----------------------------------CONT----------------------------------
++/8ij1h8378h123123koF3oer1    2016-05-05     252.42921       25    
++/8ij1h8378h123123koF3oer1    2016-05-12     65.42790        53
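
For reference, the example input can be rebuilt as a DataFrame like this (my own sketch, not from the original post; the values are copied from the table above):

import pandas as pd

# Rebuild the example input shown above
df = pd.DataFrame({
    "userid": ["++/acsbP2NFC2BvgG1BzySv5jko="] * 5 + ["++/8ij1h8378h123123koF3oer1"] * 3,
    "date": pd.to_datetime(["2016-04-28", "2016-05-01", "2016-05-05", "2016-05-10", "2016-05-11",
                            "2016-05-05", "2016-05-10", "2016-05-14"]),
    "secondsPlayed": [413.88188, 82.67343, 236.73809, 112.69112, 211.42790,
                      200.73809, 51.69112, 65.42790],
    "movesMade": [85, 15, 39, 29, 44, 11, 14, 53],
})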

Failed attempt #1:

So far I've tried a number of different things, but the most useful dataframe I've managed to create came from the following:


    # Re-index each user onto a daily grid, filling the missing days with 0
    df_grouped = df.groupby('userid').apply(lambda x: x.set_index('date').resample('1D').first().fillna(0))
    # 7-day rolling mean of secondsPlayed within each user
    df_result = df_grouped.groupby(level=0)['secondsPlayed'].apply(lambda x: x.rolling(min_periods=1, window=7).mean()).reset_index(name='secondsPlayed_week')

This is a very slow and wasteful computation, but it can nonetheless be used as an intermediate step.

userid                          date        secondsPlayed_w
++/acsbP2NFC2BvgG1BzySv5jko=    2016-04-28  4.138819e+02
++/acsbP2NFC2BvgG1BzySv5jko=    2016-04-29  2.069409e+02    
++/acsbP2NFC2BvgG1BzySv5jko=    2016-04-30  1.379606e+02    
++/acsbP2NFC2BvgG1BzySv5jko=    2016-05-01  1.241388e+02    
++/acsbP2NFC2BvgG1BzySv5jko=    2016-05-02  9.931106e+01    
++/acsbP2NFC2BvgG1BzySv5jko=    2016-05-03  8.275922e+01    
++/acsbP2NFC2BvgG1BzySv5jko=    2016-05-04  7.093647e+01    
++/acsbP2NFC2BvgG1BzySv5jko=    2016-05-05  4.563022e+01
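
For what it's worth, this intermediate result can be pushed one step further (my own sketch, not from the original post): the 7-day rolling mean covers exactly one whole week on each user's 7th, 14th, ... day, so keeping every 7th daily row per user gives the per-week daily average (multiply by 7 for the weekly totals in the desired output; incomplete trailing weeks are dropped here):

# df_grouped is the daily-filled frame from the code above, indexed by (userid, date)
rolled = (df_grouped['secondsPlayed']
          .groupby(level=0, group_keys=False)
          .apply(lambda s: s.rolling(window=7, min_periods=1).mean()))
# Keep every 7th row per user, i.e. the rows where the window spans one whole week
weekly_avg = rolled.groupby(level=0, group_keys=False).apply(lambda s: s.iloc[6::7])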

Failed attempt #2:


df_result = (df
    .reset_index()
    .set_index("date")
    .groupby(pd.Grouper(freq='W'))
    .agg({"userid": "first", "secondsPlayed": "sum", "movesMade": "sum"})
    .reset_index())

This gave me the following dataframe, whose fault is that it is not grouped by userid (the NaN problem is easily resolved).

date        userid                        secondsPlayed_w   movesMade_w
2016-04-10  +1kexX0Yk2Su639WaRKARcwjq5g=    2.581356e+03    320
2016-04-17  +1kexX0Yk2Su639WaRKARcwjq5g=    4.040738e+03    615
2016-04-24   NaN                             0.000000e+00   0
2016-05-01  ++RBPf9KdTK6pTN+lKZHDLCXg10=    1.644130e+05    17453
2016-05-08  ++DndI7do036eqYh9iW7vekAnx0=    3.775905e+05    31997
2016-05-15  ++NjKpr/vyxNCiYcmeFK9qSqD9o=    4.993430e+05    34706
2016-05-22  ++RBPf9KdTK6pTN+lKZHDLCXg10=    3.940408e+05    23779

Immediate thought:

Can this problem be solved by using a groupby that groups by two columns? I'm not at all sure how to go about that with this particular problem.
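
For what it's worth, a two-column groupby along those lines is possible with pd.Grouper (my own sketch, not from the original post), but freq='W' buckets by calendar week rather than by each user's first record, so it still doesn't quite match the requirement:

df['date'] = pd.to_datetime(df['date'])
# Group by userid and calendar week at the same time
df_two_col = (df
    .groupby(['userid', pd.Grouper(key='date', freq='W')])
    .agg({'secondsPlayed': 'sum', 'movesMade': 'sum'})
    .reset_index())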

asked by Thomas Heiberg

2 Answers

You can create a new week-number column to help the groupby:

df.date = pd.to_datetime(df.date)
# Week number relative to the first date of each id: days since the user's
# first record, integer-divided by 7
df['Newweeknumber'] = (df.date - df.groupby('userid').date.transform('min')).dt.days // 7
df.groupby(['userid', 'Newweeknumber'], as_index=False).agg({"secondsPlayed": "sum", "movesMade": "sum"})
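
If the actual week start date (as in the desired output) is wanted rather than a bare week index, one possible follow-up (my own addition, not part of this answer) is:

# Hypothetical follow-up: turn the per-user week number back into a week start date
first_date = df.groupby('userid').date.transform('min')
df['weekStart'] = first_date + pd.to_timedelta(df['Newweeknumber'] * 7, unit='D')
df.groupby(['userid', 'weekStart'], as_index=False).agg({"secondsPlayed": "sum", "movesMade": "sum"})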
answered by BENY


Update

Try

import pandas as pd

# Two demo users with daily activity starting on different dates
df1 = pd.DataFrame(index=pd.date_range('2015-04-24', periods=50)).assign(value=1)
df2 = pd.DataFrame(index=pd.date_range('2015-04-28', periods=50)).assign(value=1)

df3 = pd.concat([df1, df2], keys=['A', 'B'])

df3 = df3.rename_axis(['user', 'date']).reset_index()

# 7-day bins per user; the bins are anchored at each user's first date,
# so they match per-user weeks rather than calendar weeks
df3.groupby('user').apply(lambda x: x.resample('7D', on='date')[['value']].sum())

Output:

                 value
user date             
A    2015-04-24      7
     2015-05-01      7
     2015-05-08      7
     2015-05-15      7
     2015-05-22      7
     2015-05-29      7
     2015-06-05      7
     2015-06-12      1
B    2015-04-28      7
     2015-05-05      7
     2015-05-12      7
     2015-05-19      7
     2015-05-26      7
     2015-06-02      7
     2015-06-09      7
     2015-06-16      1
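
Applied to the question's dataframe, the same pattern would look roughly like this (my own sketch, not part of the answer above); since the 7-day bins start at each group's first date, the weeks are anchored to each user's first record:

df['date'] = pd.to_datetime(df['date'])
# 7-day bins per user, starting at each user's first recorded date
weekly = (df
    .groupby('userid')
    .apply(lambda g: g.resample('7D', on='date')[['secondsPlayed', 'movesMade']].sum()))

Replacing sum() with mean() would give per-week averages instead of weekly totals.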
answered by Scott Boston


