It is clear when doing
data.groupby(['A','B']).mean()
We get something multiindex by level 'A' and 'B' and one column with the mean of each group
how could I have the count(), std() simultaneously ?
so result looks like in a dataframe
A B mean count std
To apply aggregations to multiple columns, just add additional key:value pairs to the dictionary. Applying multiple aggregation functions to a single column will result in a multiindex. Working with multi-indexed columns is a pain and I'd recommend flattening this after aggregating by renaming the new columns.
The Group By statement is used to group together any rows of a column with the same value stored in them, based on a function specified in the statement. Generally, these functions are one of the aggregate functions such as MAX() and SUM(). This statement is used with the SELECT command in SQL.
The following should work:
data.groupby(['A','B']).agg([pd.Series.mean, pd.Series.std, pd.Series.count])
basically call agg
and passing a list of functions will generate multiple columns with those functions applied.
Example:
In [12]:
df = pd.DataFrame({'a':np.random.randn(5), 'b':[0,0,1,1,2]})
df.groupby(['b']).agg([pd.Series.mean, pd.Series.std, pd.Series.count])
Out[12]:
a
mean std count
b
0 -0.769198 0.158049 2
1 0.247708 0.743606 2
2 -0.312705 NaN 1
You can also pass the string of the method names, the common ones work, some of the more obscure ones don't I can't remember which but in this case they work fine, thanks to @ajcr for the suggestion:
In [16]:
df = pd.DataFrame({'a':np.random.randn(5), 'b':[0,0,1,1,2]})
df.groupby(['b']).agg(['mean', 'std', 'count'])
Out[16]:
a
mean std count
b
0 -1.037301 0.790498 2
1 -0.495549 0.748858 2
2 -0.644818 NaN 1
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With