Pandas aggregate with dynamic column names

Question

I have a script that generates a pandas data frame with a varying number of value columns. For example, this df might be:

import pandas as pd
df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'group_color': ['green', 'green', 'green', 'blue', 'blue'],
    'val1': [5, 2, 3, 4, 5],
    'val2': [4, 2, 8, 5, 7],
})

  group group_color  val1  val2
0     A       green     5     4
1     A       green     2     2
2     A       green     3     8
3     B        blue     4     5
4     B        blue     5     7

My goal is to get the grouped mean for each of the value columns. In this specific case (with 2 value columns), I can use

df.groupby('group').agg({"group_color": "first", "val1": "mean", "val2": "mean"})

      group_color      val1      val2
group                                
A           green  3.333333  4.666667
B            blue  4.500000  6.000000

but that does not work when the data frame in question has more value columns (val3, val4, etc.). Is there a way to dynamically take the mean of "all the other columns" or of "all columns containing val in their names"?

BENY · Accepted Answer

A simpler way:

df.groupby('group').agg(lambda x: x.iloc[0] if x.dtype == 'object' else x.mean())
      group_color      val1      val2
group                                
A           green  3.333333  4.666667
B            blue  4.500000  6.000000
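A variant of the same idea, sketched under the assumption that every numeric column should be averaged and every other column reduced to its first value. Checking `pd.api.types.is_numeric_dtype` instead of comparing the dtype against `'object'` also covers text columns stored with pandas' dedicated string dtype, and `iloc[0]` returns a plain scalar rather than a one-row Series. The helper name `mean_or_first` is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'group_color': ['green', 'green', 'green', 'blue', 'blue'],
    'val1': [5, 2, 3, 4, 5],
    'val2': [4, 2, 8, 5, 7],
})

def mean_or_first(s: pd.Series):
    """Hypothetical helper: average numeric columns, keep the first value of anything else."""
    if pd.api.types.is_numeric_dtype(s):
        return s.mean()
    return s.iloc[0]

# The callable is applied to every non-key column within each group
result = df.groupby('group').agg(mean_or_first)
print(result)
```

The callable runs once per column per group in Python, so on very large frames the dictionary approach with the built-in `'mean'`/`'first'` strings is likely faster. Boolean columns also count as numeric here and would be averaged.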

Quang Hoang · Answer

If your group_color is always the same within one group, you can do:

df.pivot_table(index=['group','group_color'],aggfunc='mean')

Output:

                       val1      val2
group group_color                    
A     green        3.333333  4.666667
B     blue         4.500000  6.000000
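Since group_color is constant within each group, the same table can also be obtained by grouping on both keys. A minimal sketch on the question's example frame (recreated here), which also checks that the two results agree and shows how to flatten the MultiIndex back into ordinary columns:

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'group_color': ['green', 'green', 'green', 'blue', 'blue'],
    'val1': [5, 2, 3, 4, 5],
    'val2': [4, 2, 8, 5, 7],
})

# pivot_table averages every column that is not used as an index key
pivoted = df.pivot_table(index=['group', 'group_color'], aggfunc='mean')

# Equivalent groupby on both keys; numeric_only skips any non-numeric columns
grouped = df.groupby(['group', 'group_color']).mean(numeric_only=True)

# Turn the (group, group_color) MultiIndex back into regular columns
flat = grouped.reset_index()
print(flat)
```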

Otherwise, you can build the aggregation dictionary and pass it to agg:

# df.columns[1:] skips the first column, i.e. the grouping key 'group'
agg_dict = {f: 'first' if f == 'group_color' else 'mean' for f in df.columns[1:]}
df.groupby('group').agg(agg_dict)

Output:

      group_color      val1      val2
group                                
A           green  3.333333  4.666667
B            blue  4.500000  6.000000
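To address the "all columns containing val in their names" part of the question directly, the dictionary can be built from `df.filter(like='val')` instead of from column positions. A sketch assuming group_color is the only non-value column besides the key; the extra `val3` column is hypothetical, added only to show the dictionary growing on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B'],
    'group_color': ['green', 'green', 'green', 'blue', 'blue'],
    'val1': [5, 2, 3, 4, 5],
    'val2': [4, 2, 8, 5, 7],
    'val3': [1, 1, 1, 2, 4],  # hypothetical extra value column
})

# Every column whose name contains 'val' is averaged...
value_aggs = {col: 'mean' for col in df.filter(like='val').columns}
# ...and the descriptive column keeps its first value per group
agg_dict = {'group_color': 'first', **value_aggs}

result = df.groupby('group').agg(agg_dict)
print(result)
```

`like` is a substring match, so a column such as `interval` would also be picked up; `df.filter(regex='^val')` restricts the match to names that start with `val`.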


Tags: python, pandas, aggregate, pandas-groupby

MartijnVanAttekum
