
Mean Std in pandas data frame

I have a pandas DataFrame as follows:

   a   b
0  1  12
1  1  13
2  1  23
3  2  22
4  2  23
5  2  24
6  3  30
7  3  35
8  3  55

I want to find the mean and standard deviation of column b in each group. My code below gives me 0 for each group.

stdMeann = lambda x: np.std(np.mean(x))
print(pd.Series(data.groupby('a').b.apply(stdMeann)))
Elham asked Nov 02 '17 21:11


1 Answer

As noted in the comments, you can use .agg to aggregate by multiple statistics:

In [11]: df.groupby("a")["b"].agg([np.mean, np.std])
Out[11]:
   mean        std
a
1    16   6.082763
2    23   1.000000
3    40  13.228757
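The std column above is the sample standard deviation (ddof=1), which is what pandas computes by default, whereas np.std called directly on an array defaults to the population form (ddof=0). A minimal sketch of the difference, assuming the DataFrame from the question is rebuilt as df; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "b": [12, 13, 23, 22, 23, 24, 30, 35, 55],
})

grouped = df.groupby("a")["b"]

# pandas default: sample standard deviation (divides by n - 1).
sample_std = grouped.std()

# Population standard deviation (divides by n), matching np.std's default.
population_std = grouped.std(ddof=0)

# np.std applied directly to the values of group 1, for comparison.
direct = np.std(df.loc[df["a"] == 1, "b"].to_numpy())

print(sample_std)
print(population_std)
print(direct)
```

If the population figure is the one you want, passing ddof=0 to the groupby std is probably the simplest route.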

pandas lets you pass the function names as strings instead of the NumPy functions:

In [12]: df.groupby("a")["b"].agg(["mean", "std"])  # just b
Out[12]:
   mean        std
a
1    16   6.082763
2    23   1.000000
3    40  13.228757

In [13]: df.groupby("a").agg(["mean", "std"])  # all columns
Out[13]:
     b
  mean        std
a
1   16   6.082763
2   23   1.000000
3   40  13.228757

You can also specify what to do on a per-column basis:

In [14]: df.groupby("a").agg({"b": ["mean", "std"]})
Out[14]:
     b
  mean        std
a
1   16   6.082763
2   23   1.000000
3   40  13.228757
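If the two-level column header is inconvenient, named aggregation (available in pandas 0.25 and later) gives flat column names instead. A small sketch, assuming the same data as in the question; the output names b_mean and b_std are arbitrary choices made here for illustration:

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "b": [12, 13, 23, 22, 23, 24, 30, 35, 55],
})

# Named aggregation: each keyword is an output column name,
# each value is a (source column, function name) pair.
summary = df.groupby("a").agg(
    b_mean=("b", "mean"),
    b_std=("b", "std"),
)

print(summary)
```

The result has one row per value of a and two ordinary columns, which tends to be easier to merge or export than a MultiIndex header.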

Note: you were getting 0s because np.mean(x) reduces each group to a single number, and np.std of a single number is 0 (it's a little surprising to me that it's not an error, but there we are):

In [21]: np.std(1)
Out[21]: 0.0
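To see this on the question's data, here is a short self-contained sketch that reproduces the zeros and then computes both statistics per group. It assumes the DataFrame is named df (the question calls it data), and the names zeros and stats are made up for this example:

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "b": [12, 13, 23, 22, 23, 24, 30, 35, 55],
})

# The original attempt: np.mean collapses each group to one number,
# and the standard deviation of one number is always 0.
zeros = df.groupby("a")["b"].apply(lambda x: np.std(np.mean(x)))

# Computing mean and std separately on the full group instead.
stats = df.groupby("a")["b"].agg(["mean", "std"])

print(zeros)
print(stats)
```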
Andy Hayden answered Oct 13 '22 00:10