I want to apply multiple functions, each of which uses multiple columns, to a groupby object so that the result is a new pandas.DataFrame.
I know how to do it in separate steps:
by_user = lasts.groupby('user')
elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400)
running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400)
user_df = elapsed_days.to_frame('elapsed_days').join(running_days.to_frame('running_days'))
This results in user_df, with one row per user and the columns elapsed_days and running_days.
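A runnable sketch of these separate steps follows. The real lasts frame is not shown in the question, so the small frame below (times in seconds) is hypothetical and only meant to make the steps reproducible.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are made up.
lasts = pd.DataFrame({
    "user": ["alice", "alice", "bob"],
    "elapsed_time": [86400, 43200, 86400],   # seconds
    "running_time": [43200, 43200, 21600],   # seconds
    "num_cores": [2, 4, 1],
})

by_user = lasts.groupby("user")
# Each apply() call returns one scalar per group, i.e. a Series indexed by user.
elapsed_days = by_user.apply(lambda x: (x.elapsed_time * x.num_cores).sum() / 86400)
running_days = by_user.apply(lambda x: (x.running_time * x.num_cores).sum() / 86400)
# Turn both Series into one-column frames and join them on the shared user index.
user_df = elapsed_days.to_frame("elapsed_days").join(running_days.to_frame("running_days"))
print(user_df)
```

Each quantity needs its own pass over the groups, which is what makes this version feel clumsy.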
However, I suspect there is a better way, like:
by_user.agg({'elapsed_days': lambda x: (x.elapsed_time * x.num_cores).sum() / 86400,
'running_days': lambda x: (x.running_time * x.num_cores).sum() / 86400})
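This doesn't work, though, because as far as I know agg() works on one pandas.Series (column) at a time, so a function passed to it cannot combine columns such as elapsed_time and num_cores.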
I did find this question and answer, but the solutions look rather ugly to me, and considering that the answer is nearly four years old, there might be a better way by now.
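Since agg() only ever sees one column at a time, one way around that limitation is to compute the cross-column products first as ordinary columns and then aggregate them. This sketch assumes pandas 0.25 or later for named aggregation; the helper column names elapsed_core_s and running_core_s and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data (the real `lasts` frame is not shown).
lasts = pd.DataFrame({
    "user": ["alice", "alice", "bob"],
    "elapsed_time": [86400, 43200, 86400],
    "running_time": [43200, 43200, 21600],
    "num_cores": [2, 4, 1],
})

user_df = (
    # Precompute per-row core-seconds so each aggregation needs only one column.
    lasts.assign(
        elapsed_core_s=lasts.elapsed_time * lasts.num_cores,
        running_core_s=lasts.running_time * lasts.num_cores,
    )
    .groupby("user")
    # Named aggregation: output_column=(input_column, function).
    .agg(
        elapsed_days=("elapsed_core_s", "sum"),
        running_days=("running_core_s", "sum"),
    )
    / 86400  # core-seconds -> core-days
)
print(user_df)
```

This avoids apply() entirely, so the sums run through pandas' built-in "sum" aggregation rather than a Python function per group.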
Another solid variation is to do what @MaxU did in this solution to a similar question: wrap the individual functions in a pandas Series, so that only a reset_index() is needed to return a DataFrame.
First, define the functions for transformations:
def ed(group):
    # elapsed core-days for one group
    return (group.elapsed_time * group.num_cores).sum() / 86400

def rd(group):
    # running core-days for one group
    return (group.running_time * group.num_cores).sum() / 86400
Wrap them up in a Series using get_stats:
import pandas as pd

def get_stats(group):
    return pd.Series({'elapsed_days': ed(group),
                      'running_days': rd(group)})
Finally:
lasts.groupby('user').apply(get_stats).reset_index()
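Put together, the answer's approach can be checked end to end with a small hypothetical lasts frame (the values below are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with the columns used in the question.
lasts = pd.DataFrame({
    "user": ["alice", "alice", "bob"],
    "elapsed_time": [86400, 43200, 86400],
    "running_time": [43200, 43200, 21600],
    "num_cores": [2, 4, 1],
})

def ed(group):
    # Elapsed core-days for one user's rows.
    return (group.elapsed_time * group.num_cores).sum() / 86400

def rd(group):
    # Running core-days for one user's rows.
    return (group.running_time * group.num_cores).sum() / 86400

def get_stats(group):
    # Returning a Series makes apply() build one row per group;
    # the Series index becomes the column names.
    return pd.Series({"elapsed_days": ed(group), "running_days": rd(group)})

# reset_index() turns the 'user' group keys back into an ordinary column.
user_df = lasts.groupby("user").apply(get_stats).reset_index()
print(user_df)
```

Adding another statistic only means adding one more entry to the dictionary in get_stats, and all of them are computed in a single pass over the groups.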