I have a DataFrame df with columns (age, height). I want to see how the mean height changes with age, so I group df by age and try to build a new DataFrame new_df with columns (age, mean_height). My code so far:
groups = df.groupby('age')
new_df = groups.agg({'height' : np.mean,
'age' : # HOW to add age?})
But I don't know how to include age in new_df. Any advice would be appreciated.
You don't need to add age yourself: it becomes the index of the aggregated DataFrame:
In [95]: df = DataFrame({'age':[10,10,20,20,20], 'height':[140,150,145, 190,200]})
In [96]: df
Out[96]:
   age  height
0   10     140
1   10     150
2   20     145
3   20     190
4   20     200
In [97]: groups = df.groupby('age')
In [98]: groups.agg({'height':np.mean})
Out[98]:
         height
age
10   145.000000
20   178.333333
df.groupby('age').mean() gives the same result. If you want age as a column rather than as the index, add a call to reset_index().
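A minimal sketch of the reset_index() route, using the same sample data as above. The rename to mean_height is only there to match the column name asked for in the question; it is not required.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 10, 20, 20, 20],
                   'height': [140, 150, 145, 190, 200]})

# Grouping by age makes age the index of the aggregated result.
means = df.groupby('age').mean()

# reset_index() moves age back into an ordinary column;
# rename() is optional and just relabels height as mean_height.
new_df = means.reset_index().rename(columns={'height': 'mean_height'})
print(new_df)
```

After this, new_df has a default integer index and two regular columns, age and mean_height.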
As an alternative, you can call the groupby with as_index=False:
groups = df.groupby('age', as_index=False)
groups.agg({'height': np.mean})
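The as_index=False variant can be sketched as a self-contained script. This version assumes a reasonably recent pandas (named aggregation was added in 0.25) and uses the string 'mean' in place of np.mean; the output column name mean_height is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'age': [10, 10, 20, 20, 20],
                   'height': [140, 150, 145, 190, 200]})

# as_index=False keeps age as a regular column instead of the index.
# Named aggregation: output column name = (input column, aggregation function).
new_df = df.groupby('age', as_index=False).agg(mean_height=('height', 'mean'))
print(new_df)
```

Here the rename happens inside agg(), so no separate reset_index() or rename() call is needed.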