
Applying different aggregate functions to different columns (now that dict with renaming is deprecated)

I asked this question before (python pandas: applying different aggregate functions to different columns), but the latest changes to pandas (https://github.com/pandas-dev/pandas/pull/15931) mean that what I thought was an elegant and pythonic solution is now deprecated, for reasons I genuinely fail to understand.

The question was, and still is: when doing a groupby, how can I apply different aggregate functions to different fields (e.g. sum of x, avg of x, min of y, max of z, etc.) and rename the resulting fields, all in one go, or at least in a reasonably pythonic and not-too-cumbersome way? An automatically generated name like sum_x won't do; I need to rename the fields explicitly.

This approach, which I liked:

df.groupby('qtr').agg({"realgdp": {"mean_gdp": "mean", "std_gdp": "std"},
                       "unemp": {"mean_unemp": "mean"}})

is now deprecated and produces this warning:

FutureWarning: using a dict with renaming is deprecated and will be removed in a future version

Thanks!

Pythonista anonymous asked Oct 17 '22 04:10


1 Answer

agg() itself is not deprecated, but renaming the results by passing a nested dict to agg() is.

Do go through the documentation: https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#deprecate-groupby-agg-with-a-dictionary-when-renaming

What is deprecated:
1. Passing a dict to a grouped/rolled/resampled Series that allowed one to rename the resulting aggregation.
2. Passing a dict-of-dicts to a grouped/rolled/resampled DataFrame.

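This will work, though it's not a single line of code. Note that the aggregated result has to be assigned to a new variable; the column operations then apply to that result, not to the original df:

result = df.groupby('qtr').agg({"realgdp": ["mean", "std"], "unemp": "mean"})

# Flatten the two-level column index into names such as 'realgdp_mean'
result.columns = result.columns.map('_'.join)

result.rename(columns={'realgdp_mean': 'mean_gdp', 'realgdp_std': 'std_gdp', 'unemp_mean': 'mean_unemp'}, inplace=True)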
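
To make the steps above easier to try out, here is a minimal sketch on made-up data. The sample values are invented for illustration; only the column names come from the question. It also shows named aggregation, which is not part of this answer: it was added in pandas 0.25 and, as far as I can tell, gives the same renamed columns in one call.

```python
import pandas as pd

# Made-up sample data; only the column names come from the question.
df = pd.DataFrame({
    "qtr": [1, 1, 2, 2],
    "realgdp": [100.0, 110.0, 120.0, 140.0],
    "unemp": [5.0, 6.0, 4.0, 5.0],
})

# Steps from the answer: aggregate, flatten the two-level columns, rename.
flat = df.groupby("qtr").agg({"realgdp": ["mean", "std"], "unemp": "mean"})
flat.columns = flat.columns.map("_".join)  # e.g. ('realgdp', 'mean') -> 'realgdp_mean'
flat = flat.rename(columns={
    "realgdp_mean": "mean_gdp",
    "realgdp_std": "std_gdp",
    "unemp_mean": "mean_unemp",
})

# Alternative (assumes pandas >= 0.25): named aggregation,
# written as new_name=(source_column, function).
named = df.groupby("qtr").agg(
    mean_gdp=("realgdp", "mean"),
    std_gdp=("realgdp", "std"),
    mean_unemp=("unemp", "mean"),
)

print(flat)
print(named)
```

If the installed pandas predates 0.25, only the first variant applies.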

Vaishali answered Oct 20 '22 23:10