Combine complex aggregation function when using pandas groupby

Question

Think of the following table

np.random.seed(42)
ix = pd.date_range('2017-01-01', '2017-01-15', freq='60s')
df = pd.DataFrame(
    {
        'val': np.random.random(size=ix.shape[0]),
        'active': np.random.choice([0,1], size=ix.shape[0])
    },
    index=ix
)
df.sample(10)

yielding:

                    active   val
2017-01-02 06:05:00 1   0.774654
2017-01-04 08:15:00 1   0.934796
2017-01-13 01:02:00 0   0.792351...

My objective is to compute:

sum per day
sum of actives per day

Sum per day This one is straightforwards:

gb = df.groupby(pd.to_datetime(df.index.date))
overall_sum_per_day = gb['val'].sum().rename('overall')

Sum per active day This is a little trickier (see this).

active_sum_per_day = gb.agg(lambda x: x[x.active==1]['val'].sum())['val'].rename('active')

My question How can I combine the two. Using concat:

pd.concat([overall_sum_per_day, active_sum_per_day], axis=1)

I can achieve my objective. But I fail to do achieve it in one go and apply the two aggregations at once. Is it possible? See this comment.

jezrael · Accepted Answer

You can use GroupBy.apply:

b = gb.apply(lambda x: pd.Series([x['val'].sum(), x.loc[x.active==1, 'val'].sum()], 
                                  index=['overall', 'active']))
print (b)
               overall      active
2017-01-01  715.997165  366.856234
2017-01-02  720.101832  355.100828
2017-01-03  711.247370  335.231948
2017-01-04  713.688122  338.088299
2017-01-05  716.127970  342.889442
2017-01-06  697.319129  338.741027
2017-01-07  708.121948  361.086977
2017-01-08  731.032093  370.697884
2017-01-09  718.386679  342.162494
2017-01-10  709.706473  349.657514
2017-01-11  720.477342  368.407343
2017-01-12  738.286682  378.618305
2017-01-13  735.805583  372.039108
2017-01-14  727.502271  345.612816
2017-01-15    0.613559    0.613559

Another solution:

b = gb.agg(lambda x: [x['val'].sum(), x.loc[x.active==1, 'val'].sum()])
       .rename(columns={'val':'overall'})
print (b)
                active     overall
2017-01-01  715.997165  366.856234
2017-01-02  720.101832  355.100828
2017-01-03  711.247370  335.231948
2017-01-04  713.688122  338.088299
2017-01-05  716.127970  342.889442
2017-01-06  697.319129  338.741027
2017-01-07  708.121948  361.086977
2017-01-08  731.032093  370.697884
2017-01-09  718.386679  342.162494
2017-01-10  709.706473  349.657514
2017-01-11  720.477342  368.407343
2017-01-12  738.286682  378.618305
2017-01-13  735.805583  372.039108
2017-01-14  727.502271  345.612816
2017-01-15    0.613559    0.613559

Combine complex aggregation function when using pandas groupby

Tags:

python

pandas

Dror

1 Answers

jezrael

Recent Activity

Donate For Us

Combine complex aggregation function when using pandas groupby

Tags:

python

pandas

Dror

1 Answers

jezrael

Related questions

Recent Activity

Donate For Us