I have a Pandas DataFrame as below:
            a  b  c  d
    0   Apple  3  5  7
    1  Banana  4  4  8
    2  Cherry  7  1  3
    3   Apple  3  4  7
I would like to group the rows by column 'a', replace the values in column 'c' with the mean of the values in the grouped rows, and add another column holding the standard deviation of those same 'c' values. The values in columns 'b' and 'd' are constant within each group. So, the desired output would be:
            a  b    c  d         e
    0   Apple  3  4.5  7  0.707107
    1  Banana  4    4  8         0
    2  Cherry  7    1  3         0
What is the best way to achieve this?
The mean is simply the average of a set of numbers. In pandas, DataFrame.mean() returns the mean of each column of a DataFrame; to get the mean of one specific column, call mean() on that column, for example df["c"].mean(). Likewise, std() returns the standard deviation of a Series.
You could use a groupby-agg operation:
    In [38]: result = df.groupby(['a'], as_index=False).agg(
        ...:     {'c': ['mean', 'std'], 'b': 'first', 'd': 'first'})
and then rename and reorder the columns:
    In [39]: result.columns = ['a', 'c', 'e', 'b', 'd']

    In [40]: result.reindex(columns=sorted(result.columns))
    Out[40]:
            a  b    c  d         e
    0   Apple  3  4.5  7  0.707107
    1  Banana  4  4.0  8       NaN
    2  Cherry  7  1.0  3       NaN
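As an alternative to renaming and reordering afterwards, named aggregation (available in pandas 0.25 and later) can assign the output column names directly. The following is a minimal sketch, not the original answer's method; the DataFrame is rebuilt from the question's data so that the example runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": ["Apple", "Banana", "Cherry", "Apple"],
    "b": [3, 4, 7, 3],
    "c": [5, 4, 1, 4],
    "d": [7, 8, 3, 7],
})

# Each keyword is an output column name; each tuple is
# (input column, aggregation). The keyword order sets the column order.
result = df.groupby("a", as_index=False).agg(
    b=("b", "first"),
    c=("c", "mean"),
    d=("d", "first"),
    e=("c", "std"),  # sample std (ddof=1): NaN for single-row groups
)
print(result)
```

Because the output columns are named and ordered in the agg call itself, no separate rename or reindex step is needed.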
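Pandas computes the sample standard deviation by default (ddof=1), which is undefined for a group with a single row; that is why the result above shows NaN for Banana and Cherry. To compute the population standard deviation (ddof=0) instead: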
    def pop_std(x):
        return x.std(ddof=0)

    result = df.groupby(['a'], as_index=False).agg(
        {'c': ['mean', pop_std], 'b': 'first', 'd': 'first'})
    result.columns = ['a', 'c', 'e', 'b', 'd']
    result.reindex(columns=sorted(result.columns))
yields
            a  b    c  d    e
    0   Apple  3  4.5  7  0.5
    1  Banana  4  4.0  8  0.0
    2  Cherry  7  1.0  3  0.0
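Note that the desired output in the question shows 0.707107 for Apple, which is the sample standard deviation, together with 0 for the single-row groups. Neither variant above produces exactly that combination. One possible way to get it, assuming 0 really is the intended value for groups with only one row, is to keep the default sample std and replace the resulting NaN values with 0. This is a sketch under that assumption, with the question's data rebuilt so that it runs on its own:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": ["Apple", "Banana", "Cherry", "Apple"],
    "b": [3, 4, 7, 3],
    "c": [5, 4, 1, 4],
    "d": [7, 8, 3, 7],
})

result = df.groupby("a", as_index=False).agg(
    b=("b", "first"),
    c=("c", "mean"),
    d=("d", "first"),
    e=("c", "std"),  # default sample std (ddof=1)
)

# Assumption: a single-row group should report 0 rather than NaN.
result["e"] = result["e"].fillna(0)
print(result)
```

Filling with 0 hides the fact that the sample standard deviation is undefined for a single observation, so whether to do this depends on how column 'e' will be used downstream.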