
pandas, apply multiple functions of multiple columns to groupby object

I want to apply multiple functions of multiple columns to a groupby object which results in a new pandas.DataFrame.

I know how to do it in separate steps:

by_user = lasts.groupby('user')
elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400)
running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400)
user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days'))

Which results in user_df being a DataFrame indexed by user, with the columns elapsed_days and running_days.

However I suspect that there is a better way, like:

by_user.agg({'elapsed_days': lambda x: (x.elapsed_time * x.num_cores).sum() / 86400, 
             'running_days': lambda x: (x.running_time * x.num_cores).sum() / 86400})

This doesn't work, though, because AFAIK agg() passes each column to the function as a separate pandas.Series, so the lambda never sees both columns at once.

I did find this question and answer, but the solutions look rather ugly to me, and considering that the answer is nearly four years old, there might be a better way by now.

johnbaltis asked Nov 10 '16 16:11



1 Answer

Another solid variation is to do what @MaxU did in a solution to a similar question: wrap the individual functions in a pandas Series, so that only a reset_index() is needed to get a DataFrame back.

First, define the functions for transformations:

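import pandas as pd

def ed(group):
    # Elapsed core-seconds summed over the group, converted to days
    return (group.elapsed_time * group.num_cores).sum() / 86400

def rd(group):
    # Running core-seconds summed over the group, converted to days
    return (group.running_time * group.num_cores).sum() / 86400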

Wrap them up in a Series using get_stats:

def get_stats(group):
    return pd.Series({'elapsed_days': ed(group),
                      'running_days': rd(group)})

Finally:

lasts.groupby('user').apply(get_stats).reset_index()
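
To see the whole thing run end to end, here is a minimal sketch. The sample rows in lasts are made up for illustration (only the column names come from the question), and the SECONDS_PER_DAY constant and the explicit column selection before apply are my own additions. The second half shows an alternative that is not part of the approach above: compute the per-row products first, then use a plain groupby sum.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question,
# the values are invented for this example.
lasts = pd.DataFrame({
    "user": ["alice", "alice", "bob"],
    "elapsed_time": [86400, 43200, 172800],  # seconds
    "running_time": [43200, 43200, 86400],   # seconds
    "num_cores": [2, 4, 1],
})

SECONDS_PER_DAY = 86400


def get_stats(group):
    """Return both core-day totals for one user's rows as a Series."""
    elapsed = (group["elapsed_time"] * group["num_cores"]).sum()
    running = (group["running_time"] * group["num_cores"]).sum()
    return pd.Series({
        "elapsed_days": elapsed / SECONDS_PER_DAY,
        "running_days": running / SECONDS_PER_DAY,
    })


# Approach from this answer: one apply call yields one row per user.
# Selecting the value columns explicitly keeps the grouping column
# out of the frame handed to get_stats.
cols = ["elapsed_time", "running_time", "num_cores"]
user_df = lasts.groupby("user")[cols].apply(get_stats).reset_index()

# Alternative (an assumption of mine, not from the answer):
# build the per-row products first, then sum them per user.
alt_df = (
    lasts.assign(
        elapsed_days=lasts["elapsed_time"] * lasts["num_cores"] / SECONDS_PER_DAY,
        running_days=lasts["running_time"] * lasts["num_cores"] / SECONDS_PER_DAY,
    )
    .groupby("user")[["elapsed_days", "running_days"]]
    .sum()
    .reset_index()
)

print(user_df)
print(alt_df)
```

With this sample data both frames give alice 4.0 elapsed and 3.0 running core-days, and bob 2.0 and 1.0. The second form avoids calling a Python function once per group, which may matter on large frames, but it only fits when each statistic is a sum of a row-wise expression.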
zthomas.nc answered Oct 11 '22 00:10
