I'm trying to do some aggregations on a pandas DataFrame. Here is some sample code:
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

Out[1]:
      Amount
         Sum Count
User
user1   18.0     2
user2   20.5     3
user3   10.5     1
Which generates the following warning:
FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
  return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
How can I avoid this?
The current (as of version 0.20) method for changing column names after a groupby operation is to chain the rename method. See this deprecation note in the documentation for more detail.
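As a minimal sketch of the chained rename, reusing the question's data: the target names Sum and Count are taken from the question, while selecting the Amount column before calling agg is an assumption made here to keep the result's columns single-level.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

# Aggregate with a plain list of function names, then rename the resulting
# columns in a chained call instead of renaming inside the agg dict.
result = (df.groupby("User")["Amount"]
            .agg(["sum", "count"])
            .rename(columns={"sum": "Sum", "count": "Count"}))

print(result)
```

Because the renaming happens after the aggregation, no nested dictionary is passed to agg, so this form should not trigger the deprecation warning.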
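Use groupby apply and return a Series to rename columns

Use the groupby apply method to perform an aggregation that renames the columns. To do this, pass apply a custom function that returns a Series; the index of that Series becomes the new column names.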
Create fake data
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                   'Score': [9, 1, 8, 7, 7, 6, 9]})
Create a custom function that returns a Series. The variable x inside of my_agg is a DataFrame.
def my_agg(x):
    # x is the sub-DataFrame for one group; the dict keys become column names
    names = {
        'Amount mean': x['Amount'].mean(),
        'Amount std': x['Amount'].std(),
        'Amount range': x['Amount'].max() - x['Amount'].min(),
        'Score Max': x['Score'].max(),
        'Score Sum': x['Score'].sum(),
        'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

    # The index argument fixes the order of the returned columns
    return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
                                   'Score Sum', 'Score Max', 'Amount Score Sum'])
Pass this custom function to the groupby apply method:
df.groupby('User').apply(my_agg)
The big downside is that this function will be much slower than agg for the cythonized aggregations.
Using the groupby agg method

Using a dictionary of dictionaries was deprecated because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on GitHub about how to improve this functionality in the future. Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.
df.groupby('User')['Amount'].agg(['sum', 'count'])
Output
        sum  count
User
user1  18.0      2
user2  20.5      3
user3  10.5      1
It is still possible to use a dictionary to explicitly denote different aggregations for different columns, for example if there were another numeric column named Other.
df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   'Other': [1, 2, 3, 4, 5, 6]})

df.groupby('User').agg({'Amount': ['sum', 'count'], 'Other': ['max', 'std']})
Output
      Amount       Other
         sum count   max       std
User
user1   18.0     2     6  3.535534
user2   20.5     3     5  1.527525
user3   10.5     1     4       NaN
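The dictionary-of-lists form returns two-level column labels. If single-level, renamed columns are wanted, one possible follow-up step is to join the two levels into one string per column. This is a sketch rather than part of the answer above; the name agg_df and the space separator are choices made here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   "Other": [1, 2, 3, 4, 5, 6]})

# agg_df is a hypothetical name for the aggregated result.
agg_df = df.groupby("User").agg({"Amount": ["sum", "count"],
                                 "Other": ["max", "std"]})

# Each column label is a tuple such as ("Amount", "sum");
# join the two levels into a single flat name.
agg_df.columns = [" ".join(col) for col in agg_df.columns]

print(agg_df.columns.tolist())
# ['Amount sum', 'Amount count', 'Other max', 'Other std']
```

The flattened names can then be changed further with rename if a different naming scheme is preferred.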