
Grouping & aggregating on level 1 index & assigning different aggregation functions using pandas

I have a dataframe df:

                2019            2020            2021        2022
A       1       10              15              15          31
        2       5               4               7           9
        3       0.3             0.4             0.4         0.7
        4       500             600             70          90
B       1       10              15              15          31
        2       5               4               7           9
        3       0.3             0.4             0.4         0.7
        4       500             600             70          90
C       1       10              15              15          31
        2       5               4               7           9
        3       0.3             0.4             0.4         0.7
        4       500             600             70          90
D       1       10              15              15          31
        2       5               4               7           9
        3       0.3             0.4             0.4         0.7
        4       500             600             70          90

I am trying to group by the level 1 index (1, 2, 3, 4) and apply a different aggregation function to each of those index values, so that 1 is aggregated by sum, 2 by mean, and so on. The end result would look like this:

            2019            2020            2021        2022
1           40              ...             ...         # sum
2           5               ...             ...         # mean
3           0.3             ...             ...         # mean
4           2000            ...             ...         # sum

I tried:

df.groupby(level = 1).agg({'1':'sum', '2':'mean', '3':'sum', '4':'mean'})

But I get an error saying that none of 1, 2, 3, 4 are in the columns, which is true since they are index values, so I am not sure how to proceed with this problem.
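
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame and triggers the failure. The integer column labels and the `from_product` construction are assumptions, since the question only shows the printed frame. The dict passed to `agg` maps column labels to functions, which is why index values such as '1' are rejected; the exact exception type may differ between pandas versions.

```python
import pandas as pd

# Rebuild the example frame (integer column labels are an assumption).
index = pd.MultiIndex.from_product([list("ABCD"), [1, 2, 3, 4]])
block = [[10, 15, 15, 31], [5, 4, 7, 9], [0.3, 0.4, 0.4, 0.7], [500, 600, 70, 90]]
df = pd.DataFrame(block * 4, index=index, columns=[2019, 2020, 2021, 2022])

# The dict keys are looked up among the columns (2019..2022), not in the index,
# so this call fails.
try:
    df.groupby(level=1).agg({"1": "sum", "2": "mean"})
except Exception as exc:  # exact exception type varies between pandas versions
    print(type(exc).__name__, exc)
```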

Jonas Palačionis asked Jan 03 '22 08:01




2 Answers

You could use apply with a custom function as follows:

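import numpy as np

# Map each level 1 index value to its aggregation function
aggs = {1: np.sum, 2: np.mean, 3: np.mean, 4: np.sum}

def f(x):
    # x.name is the group key, i.e. the level 1 index value; default to sum
    func = aggs.get(x.name, np.sum)
    return func(x)

df.groupby(level=1).apply(f)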

The code above falls back to sum by default, so 1 and 4 could be removed from aggs without changing the result. This way, only the groups that should be handled differently from the rest need to be specified.

Result:

      2019    2020   2021    2022               
1     40.0    60.0   60.0   124.0
2      5.0     4.0    7.0     9.0
3      0.3     0.4    0.4     0.7
4   2000.0  2400.0  280.0   360.0
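
Depending on the pandas version, calling NumPy reducers such as np.mean directly on a DataFrame may reduce over both axes instead of per column. A possible variant of the same idea, sketched here under that assumption, passes aggregation names as strings to DataFrame.agg so that each group is always reduced column by column. The frame construction and the name aggregate_group are illustrative, not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question (integer column labels are an assumption).
index = pd.MultiIndex.from_product([list("ABCD"), [1, 2, 3, 4]])
block = [[10, 15, 15, 31], [5, 4, 7, 9], [0.3, 0.4, 0.4, 0.7], [500, 600, 70, 90]]
df = pd.DataFrame(block * 4, index=index, columns=[2019, 2020, 2021, 2022])

# Only the groups that differ from the default ("sum") are listed.
aggs = {2: "mean", 3: "mean"}

def aggregate_group(group):
    # group.name is the level 1 value of the current group;
    # DataFrame.agg with a string name reduces each column separately.
    return group.agg(aggs.get(group.name, "sum"))

result = df.groupby(level=1).apply(aggregate_group)
print(result)
```
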
Shaido answered Oct 23 '22 16:10


In case you want to avoid for loops, you can group by the level 1 index values and aggregate conditionally:

df1 = df.groupby([df.index.get_level_values(level=1)]).agg(
    lambda x: x.sum()
    if x.index.get_level_values(level=1).isin([1, 4]).any()
    else x.mean()
)
df1

    2019    2020   2021   2022
1    40.0    60.0   60.0  124.0
2     5.0     4.0    7.0    9.0
3     0.3     0.4    0.4    0.7
4  2000.0  2400.0  280.0  360.0
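
If a per-group lambda is not required, the same table can probably be assembled from the built-in reductions alone. The sketch below computes the grouped sum and mean once each, picks the rows each one is responsible for, and concatenates them. The frame construction and the names sum_keys and mean_keys are assumptions made for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question (integer column labels are an assumption).
index = pd.MultiIndex.from_product([list("ABCD"), [1, 2, 3, 4]])
block = [[10, 15, 15, 31], [5, 4, 7, 9], [0.3, 0.4, 0.4, 0.7], [500, 600, 70, 90]]
df = pd.DataFrame(block * 4, index=index, columns=[2019, 2020, 2021, 2022])

grouped = df.groupby(level=1)
sum_keys = [1, 4]   # level 1 values aggregated by sum
mean_keys = [2, 3]  # level 1 values aggregated by mean

# Run each built-in reduction once, keep only the rows that need it,
# then restore the original ordering of the level 1 values.
result = pd.concat(
    [grouped.sum().loc[sum_keys], grouped.mean().loc[mean_keys]]
).sort_index()
print(result)
```
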
wwnde answered Oct 23 '22 16:10