Inputs:
(df['PopEst']
    .astype('float')
    .groupby(ContinentDict)
    .agg(['size', 'sum', 'mean', 'std']))
Outputs:
              size           sum          mean           std
Asia             5  2.898666e+09  5.797333e+08  6.790979e+08
Australia        1  2.331602e+07  2.331602e+07           NaN
Europe           6  4.579297e+08  7.632161e+07  3.464767e+07
North America    2  3.528552e+08  1.764276e+08  1.996696e+08
South America    1  2.059153e+08  2.059153e+08           NaN
Some values in the std column turn out to be NaN when a group has only one row. I think these values should be 0. Why is that?
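From the docs: "NA groups in GroupBy are automatically excluded." That note concerns NA values in the group keys, though; it does not explain NaN values in an aggregated column.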
The agg() method applies a function, or a list of function names, along one axis of the DataFrame: by default axis 0, the index (row) axis. Note: agg() is an alias of aggregate(); prefer the alias. A user-defined function passed to agg() receives a Series for evaluation, and the aggregation is performed for each column.
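As a small illustration of that per-column behaviour, here is a sketch on a hypothetical two-column frame. The function spread() is a made-up user-defined function; it records the type of object it receives and returns the range of each column.

```python
import pandas as pd

# Hypothetical toy frame with two numeric columns
df = pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 10.0, 10.0]})

received = []  # type names of the objects agg() hands to spread()

def spread(col):
    # Hypothetical UDF: agg() calls it once per column, passing a Series
    received.append(type(col).__name__)
    return col.max() - col.min()

out = df.agg(spread)  # one result per column, indexed by column name
print(out)
```

The result is a Series with one entry per column, which is why the docs describe the aggregation as happening "for each column".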
The steps are: group the DataFrame on the column(s) you want, select the field(s) whose standard deviation you want to estimate, then either call the pandas std() method directly or pass 'std' to agg().
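Those steps can be sketched on a hypothetical frame where one group has three rows and the other has a single row; the column names team and score are made up for illustration.

```python
import pandas as pd

# Hypothetical data: group "x" has three rows, group "y" has one
df = pd.DataFrame({
    "team": ["x", "x", "x", "y"],
    "score": [2.0, 4.0, 6.0, 5.0],
})

grouped = df.groupby("team")["score"]  # group, then select the field

direct = grouped.std()        # call std() directly
via_agg = grouped.agg("std")  # or pass 'std' to agg()

print(direct)
print(via_agg)
```

Both forms give the same result: 2.0 for "x" and NaN for the single-row group "y", which is exactly the behaviour the question asks about.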
pd.DataFrame.std uses ddof=1 (delta degrees of freedom) by default, which gives the sample standard deviation. The divisor is N - ddof, so for a group with one number it is 0 and the result is NaN.
numpy.std, by contrast, uses ddof=0 by default, which gives the population standard deviation. The divisor is N, so a group with one number gives 0.
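To see the two defaults side by side, here is a minimal sketch on hypothetical one- and two-element Series. It also shows that pandas itself accepts ddof=0.

```python
import numpy as np
import pandas as pd

one = pd.Series([7.0])        # hypothetical single-value group
many = pd.Series([6.0, 7.0])  # hypothetical two-value group

print(one.std())               # nan: ddof=1, divisor N - 1 = 0
print(np.std(one.to_numpy()))  # 0.0: ddof=0, divisor N = 1
print(one.std(ddof=0))         # 0.0: pandas with the population formula

print(many.std())               # about 0.7071 (sample)
print(np.std(many.to_numpy()))  # 0.5 (population)
```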
To understand the difference between the sample and population standard deviation, see Bessel's correction.
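For reference, the two formulas differ only in the divisor, which is what ddof controls (divisor N - ddof). With a single value, x_1 equals the mean, so the sample version becomes 0/0 while the population version is 0. The worked numbers use the values 6 and 7 as an illustration.

```latex
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}

N = 1:\quad x_1 = \bar{x} \;\Rightarrow\; s = \sqrt{0/0}\ \text{(undefined, NaN)}, \qquad \sigma = \sqrt{0/1} = 0

x = (6, 7):\quad \bar{x} = 6.5,\ \sum_i (x_i - \bar{x})^2 = 0.5
\;\Rightarrow\; s = \sqrt{0.5/1} \approx 0.7071, \qquad \sigma = \sqrt{0.5/2} = 0.5
```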
Therefore, you can use numpy.std for your calculation. Note, however, that the output will then differ for every group, not only single-row ones, because the formula itself is different. Here's a minimal example.
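import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 9, (5, 2)))  # random data: output varies per run

# Named wrapper: pandas may map a bare np.std to its own std (ddof=1)
def std(x): return np.std(x)  # population std, ddof=0

res = df.groupby(0)[1].agg(['size', 'sum', 'mean', std])
print(res)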
   size  sum  mean  std
0
0     2   13   6.5  0.5
4     1    3   3.0  0.0
5     1    3   3.0  0.0
6     1    3   3.0  0.0
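If you would rather stay within pandas, GroupBy.std also accepts a ddof argument. This is a swapped-in alternative, not the approach above; the sketch uses hypothetical fixed data instead of random integers so the result is repeatable.

```python
import pandas as pd

# Hypothetical fixed data: group 0 holds 6 and 7, groups 4, 5 and 6 hold one 3 each
df = pd.DataFrame({0: [0, 0, 4, 5, 6], 1: [6, 7, 3, 3, 3]})

g = df.groupby(0)[1]
res = g.agg(['size', 'sum', 'mean'])
res['std'] = g.std(ddof=0)  # population std from pandas, aligned on the group keys
print(res)
```

Single-row groups come out as 0.0 here, and group 0 gets 0.5, the same population value numpy.std gives for 6 and 7.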
Alternatively, if you need the sample standard deviation (ddof=1), you can use fillna to replace the NaN values with 0:
res = df.groupby(0)[1].agg(['size', 'sum', 'mean', 'std']).fillna(0)
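A narrower variant of that line fills only the std column, so a NaN that appeared in any other column would stay visible. This is a sketch on hypothetical fixed data in place of the random integers.

```python
import pandas as pd

# Hypothetical fixed data in place of the random integers
df = pd.DataFrame({0: [0, 0, 4, 5, 6], 1: [6, 7, 3, 3, 3]})

# Sample standard deviation (ddof=1): single-row groups come out as NaN
res = df.groupby(0)[1].agg(['size', 'sum', 'mean', 'std'])

# Replace NaN with 0 in the std column only
res['std'] = res['std'].fillna(0)
print(res)
```

Keep in mind that the 0 is a convention you choose: group 0 still gets the sample value (about 0.7071), not the population value 0.5.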