I’m trying to create multiple aggregations of the same column. I’m working in pandas with Python 3.7. Based on the documentation, the syntax seems pretty straightforward:
https://pandas-docs.github.io/pandas-docs-travis/user_guide/groupby.html#named-aggregation
I do not see why I’m getting the error below. Could someone please point out the issue and tell me how to fix it?
code:
qt_dy.groupby('date').agg(std_qty=('qty','std'), mean_qty=('qty','mean'))
error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-6bb3aabf313f> in <module>
5
6 qt_dy.groupby('date')\
----> 7 .agg(std_qty=('qty','std'),mean_qty=('qty','mean'))
TypeError: aggregate() missing 1 required positional argument: 'arg'
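Looks like you're trying to use agg with named aggregation, which is supported only in pandas 0.25.0 and later. In older versions, the groupby aggregate method requires a positional first argument (arg), so a call that passes only keyword arguments fails with exactly this TypeError.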
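For reference, here is a minimal sketch of the named-aggregation call as it works on pandas 0.25 or later. The small qt_dy frame is made-up sample data, since the question does not show its contents. pd.NamedAgg is the helper pandas provides for this, and a plain (column, func) tuple is equivalent.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's qt_dy frame.
qt_dy = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-02"],
    "qty": [1.0, 3.0, 2.0, 6.0],
})

# Named aggregation (pandas >= 0.25): output_name=(column, aggregation).
result = qt_dy.groupby("date").agg(
    std_qty=pd.NamedAgg(column="qty", aggfunc="std"),
    mean_qty=("qty", "mean"),  # a plain tuple works the same way
)
print(result)
```

Each keyword becomes one flat output column, so no MultiIndex is created.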
For older versions, you will need to use the list of tuples format:
qt_dy.groupby('date')['qty'].agg([('std_qty','std'), ('mean_qty','mean')])
Or, to aggregate multiple columns, a dictionary:
qt_dy.groupby('date').agg({'qty': [('std_qty','std'), ('mean_qty','mean')]})
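If the same code has to run on both older and newer pandas, one option is to branch on the installed version. This is a minimal sketch that assumes a simple major/minor parse of pd.__version__ is good enough for your environment. pandas_at_least and std_and_mean are hypothetical helper names, not pandas APIs. Both branches return the same flat std_qty and mean_qty columns.

```python
import pandas as pd


def pandas_at_least(major, minor):
    """Hypothetical helper: True if installed pandas >= major.minor (simple parse)."""
    parts = pd.__version__.split(".")[:2]
    return (int(parts[0]), int(parts[1])) >= (major, minor)


def std_and_mean(frame, key="date", col="qty"):
    """Hypothetical helper: per-group std and mean of `col` with flat column names."""
    grouped = frame.groupby(key)
    if pandas_at_least(0, 25):
        # Named aggregation is available from 0.25 onward.
        return grouped.agg(std_qty=(col, "std"), mean_qty=(col, "mean"))
    # Older pandas: list of (output_name, func) tuples on the selected column.
    return grouped[col].agg([("std_qty", "std"), ("mean_qty", "mean")])


# Made-up sample data for illustration.
qt_dy = pd.DataFrame({"date": ["d1", "d1", "d2"], "qty": [1.0, 3.0, 5.0]})
out = std_and_mean(qt_dy)
print(out)
```

A group with a single row (d2 here) gets NaN for std_qty, because pandas computes the sample standard deviation (ddof=1) by default.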
For more information, take a look at my answer here.
I just wanted to add to the above answer.
You may be getting this error because your pandas version is older than 0.25 (check with print(pd.__version__)). If so, and you want to aggregate across multiple columns without the hierarchical (MultiIndex) column headers that pandas generates, here is the code.
Let us first create a sample pandas DataFrame:
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'a', 'a', 'b', 'a'],
                   'key2': ['c', 'c', 'd', 'd', 'e'],
                   'value1': [1, 2, 2, 3, 3],
                   'value2': [9, 8, 7, 6, 5]})
df.head(5)
Here is what the table we created looks like:
|------|------|--------|--------|
| key1 | key2 | value1 | value2 |
|------|------|--------|--------|
| a    | c    | 1      | 9      |
| a    | c    | 2      | 8      |
| a    | d    | 2      | 7      |
| b    | d    | 3      | 6      |
| a    | e    | 3      | 5      |
|------|------|--------|--------|
Now, to aggregate both value1 and value2, run this code:
df_agg = df.groupby(['key1','key2'],as_index=False).agg({'value1':['mean','count'],'value2':'sum'})
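# Join the two header levels; rstrip('_') drops the trailing underscore left on key columns whose second level is empty
df_agg.columns = ['_'.join(col).rstrip('_') for col in df_agg.columns.values]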
df_agg.head(5)
The resulting table will look like this:
|------|------|-------------|--------------|------------|
| key1 | key2 | value1_mean | value1_count | value2_sum |
|------|------|-------------|--------------|------------|
| a    | c    | 1.5         | 2            | 17         |
| a    | d    | 2.0         | 1            | 7          |
| a    | e    | 3.0         | 1            | 5          |
| b    | d    | 3.0         | 1            | 6          |
|------|------|-------------|--------------|------------|
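A note on the flattening step. With as_index=False, the group keys come back with an empty second header level, for example ('key1', ''). Because str.strip() with no arguments removes only whitespace, joining those tuples with '_' would otherwise leave names like 'key1_'. The sketch below shows a flattening step that simply skips empty levels. flatten_columns is a hypothetical helper name, and the data is the sample frame from above.

```python
import pandas as pd


def flatten_columns(columns, sep="_"):
    """Hypothetical helper: join MultiIndex levels, skipping empty ones."""
    return [sep.join(str(level) for level in col if level != "") for col in columns]


df = pd.DataFrame({'key1': ['a', 'a', 'a', 'b', 'a'],
                   'key2': ['c', 'c', 'd', 'd', 'e'],
                   'value1': [1, 2, 2, 3, 3],
                   'value2': [9, 8, 7, 6, 5]})

df_agg = df.groupby(['key1', 'key2'], as_index=False).agg(
    {'value1': ['mean', 'count'], 'value2': 'sum'})
# Iterating over a MultiIndex yields one tuple per column.
df_agg.columns = flatten_columns(df_agg.columns)
print(df_agg)
```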
If you want different column names, rename them like this:
df_agg.rename(columns={"value1_mean": "mean of value1",
                       "value1_count": "count of value1",
                       "value2_sum": "sum of value2"},
              inplace=True)
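On pandas 0.25 or later, named aggregation can produce the final names directly, which removes the need for both the flattening step and the rename. Here is a sketch using the same sample data and the descriptive names from the rename above. Because these names contain spaces and are not valid Python keywords, they are passed by unpacking a dictionary.

```python
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'a', 'a', 'b', 'a'],
                   'key2': ['c', 'c', 'd', 'd', 'e'],
                   'value1': [1, 2, 2, 3, 3],
                   'value2': [9, 8, 7, 6, 5]})

# Each entry is output_name: (input_column, aggregation).
df_agg = (
    df.groupby(['key1', 'key2'])
      .agg(**{
          'mean of value1': ('value1', 'mean'),
          'count of value1': ('value1', 'count'),
          'sum of value2': ('value2', 'sum'),
      })
      .reset_index()  # turn the group keys back into ordinary columns
)
print(df_agg)
```

The result already has flat, single-level column names, so no join or rename step is needed afterwards.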
Hope this helps.