I have a script that generates a pandas data frame with a varying number of value columns. As an example, this df might be
import pandas as pd
df = pd.DataFrame({
'group': ['A', 'A', 'A', 'B', 'B'],
'group_color' : ['green', 'green', 'green', 'blue', 'blue'],
'val1': [5, 2, 3, 4, 5],
'val2' : [4, 2, 8, 5, 7]
})
group group_color val1 val2
0 A green 5 4
1 A green 2 2
2 A green 3 8
3 B blue 4 5
4 B blue 5 7
My goal is to get the grouped mean for each of the value columns. In this specific case (with 2 value columns), I can use
df.groupby('group').agg({"group_color": "first", "val1": "mean", "val2": "mean"})
group_color val1 val2
group
A green 3.333333 4.666667
B blue 4.500000 6.000000
but that does not work when the data frame in question has more value columns (val3, val4 etc.). Is there a way to dynamically take the mean of "all the other columns" or "all columns containing val in their names"?
More easy like
df.groupby('group').agg(lambda x : x.head(1) if x.dtype=='object' else x.mean())
Out[63]:
group_color val1 val2
group
A green 3.333333 4.666667
B blue 4.500000 6.000000
If your group_color
is always the same within one group, you can do:
df.pivot_table(index=['group','group_color'],aggfunc='mean')
Output:
val1 val2
group group_color
A green 3.333333 4.666667
B blue 4.500000 6.000000
In the other case, you can build the dictionary and pass it to agg
:
agg_dict = {f: 'first' if f=='group_color' else 'mean' for f in df.columns[1:]}
df.groupby('group').agg(agg_dict)
Which output:
group_color val1 val2
group
A green 3.333333 4.666667
B blue 4.500000 6.000000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With