Rename result columns from Pandas aggregation ("FutureWarning: using a dict with renaming is deprecated")

I'm trying to do some aggregations on a pandas DataFrame. Here is some sample code:

    import pandas as pd

    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                       "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

    df.groupby(["User"]).agg({"Amount": {"Sum": "sum", "Count": "count"}})

    Out[1]:
          Amount
             Sum Count
    User
    user1   18.0     2
    user2   20.5     3
    user3   10.5     1

This generates the following warning:

    FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
      return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)

How can I avoid this?

269

asked Jun 19 '17 16:06

Victor Mayrink

1 Answer

Use groupby `apply` and return a Series to rename columns

Use the groupby `apply` method to perform an aggregation that:

- Renames the columns
- Allows for spaces in the names
- Allows you to order the returned columns in any way you choose
- Allows for interactions between columns
- Returns a single-level index and NOT a MultiIndex

To do this:

- Create a custom function that you pass to `apply`
- This custom function is passed each group as a DataFrame
- Return a Series
- The index of the Series will be the new columns

Create fake data

    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                       "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                       'Score': [9, 1, 8, 7, 7, 6, 9]})


Create a custom function that returns a Series.
The variable `x` inside of `my_agg` is a DataFrame.

    def my_agg(x):
        names = {
            'Amount mean': x['Amount'].mean(),
            'Amount std':  x['Amount'].std(),
            'Amount range': x['Amount'].max() - x['Amount'].min(),
            'Score Max':  x['Score'].max(),
            'Score Sum': x['Score'].sum(),
            'Amount Score Sum': (x['Amount'] * x['Score']).sum()}

        return pd.Series(names, index=['Amount range', 'Amount std', 'Amount mean',
                                       'Score Sum', 'Score Max', 'Amount Score Sum'])

Pass this custom function to the groupby apply method

    df.groupby('User').apply(my_agg)


The big downside is that this approach will be much slower than `agg` for the cythonized aggregations.
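If the interaction between columns is not needed, one possible way to keep the renaming and the spaces in the names while staying on the `agg` path is named aggregation. This is a minimal sketch, not part of the original answer, and it assumes pandas 0.25 or later, where `agg` accepts `output_name=(column, function)` keyword arguments. The variable name `named` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1", "user3"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0, 9],
                   "Score": [9, 1, 8, 7, 7, 6, 9]})

# Named aggregation: output column name -> (input column, aggregation).
# Unpacking a dict lets the output names contain spaces.
# Assumption: pandas >= 0.25.
named = df.groupby("User").agg(**{
    "Amount mean": ("Amount", "mean"),
    "Amount std": ("Amount", "std"),
    "Score Max": ("Score", "max"),
    "Score Sum": ("Score", "sum"),
})
print(named)
```

Each output column here is computed from a single input column, so something like `Amount Score Sum` still needs either `apply` or a precomputed helper column.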

Using a dictionary with groupby `agg` method

Using a dictionary of dictionaries was deprecated because of its complexity and somewhat ambiguous nature. There is an ongoing discussion on GitHub about how to improve this functionality in the future. Here, you can directly access the aggregating column after the groupby call. Simply pass a list of all the aggregating functions you wish to apply.

    df.groupby('User')['Amount'].agg(['sum', 'count'])

Output

           sum  count
    User
    user1  18.0      2
    user2  20.5      3
    user3  10.5      1
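To get back the `Sum` and `Count` names from the question, one option is to chain a `rename` after the aggregation. The following is a small sketch using the question's data; the variable name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0]})

# Aggregate first, then map the default column names to the desired ones.
result = (df.groupby("User")["Amount"]
            .agg(["sum", "count"])
            .rename(columns={"sum": "Sum", "count": "Count"}))
print(result)
```

Because a single column is selected before `agg`, the result has ordinary single-level columns and no deprecated nested dictionary is involved.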

It is still possible to use a dictionary to explicitly denote different aggregations for different columns, for example if there were another numeric column named `Other`.

    df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                       "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                       'Other': [1, 2, 3, 4, 5, 6]})

    df.groupby('User').agg({'Amount': ['sum', 'count'], 'Other': ['max', 'std']})

Output

          Amount       Other
             sum count   max       std
    User
    user1   18.0     2     6  3.535534
    user2   20.5     3     5  1.527525
    user3   10.5     1     4       NaN
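This result has two-level (MultiIndex) columns. If flat column names are preferred, one common approach is to join the two levels into a single string. The sketch below assumes an underscore separator, which is an arbitrary choice for illustration, and `result` is an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({"User": ["user1", "user2", "user2", "user3", "user2", "user1"],
                   "Amount": [10.0, 5.0, 8.0, 10.5, 7.5, 8.0],
                   "Other": [1, 2, 3, 4, 5, 6]})

result = df.groupby("User").agg({"Amount": ["sum", "count"],
                                 "Other": ["max", "std"]})

# Each column label is a tuple such as ("Amount", "sum");
# join the parts to get a single-level name like "Amount_sum".
result.columns = ["_".join(col) for col in result.columns]
print(result)
```

After flattening, the columns can be renamed individually with `rename(columns=...)` if different names are wanted.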

121

answered Oct 09 '22 08:10

Ted Petrou

Tags: python, pandas, rename, aggregate
