I asked this question before (python pandas: applying different aggregate functions to different columns), but a recent change to pandas (https://github.com/pandas-dev/pandas/pull/15931) means that what I thought was an elegant and pythonic solution is now deprecated, for reasons I genuinely fail to understand.
The question was, and still is: when doing a groupby, how can I apply different aggregate functions to different fields (e.g. sum of x, avg of x, min of y, max of z, etc.) and rename the resulting fields, all in one go, or at least in a possibly pythonic and not-too-cumbersome way? I.e. sum_x won't do, I need to rename the fields explicitly.
This approach, which I liked:
df.groupby('qtr').agg({"realgdp": {"mean_gdp": "mean", "std_gdp": "std"},
"unemp": {"mean_unemp": "mean"}})
is now deprecated and produces this warning:
FutureWarning: using a dict with renaming is deprecated and will be removed in a future version
Thanks!
agg() itself is not deprecated; what is deprecated is renaming the output columns by passing a nested dict to agg().
Do go through the documentation: https://pandas.pydata.org/pandas-docs/stable/whatsnew.html#deprecate-groupby-agg-with-a-dictionary-when-renaming
What is deprecated:
1. Passing a dict to a grouped/rolled/resampled Series that allowed one to rename the resulting aggregation
2. Passing a dict-of-dicts to a grouped/rolled/resampled DataFrame
This will work, though it's not a single line of code:
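# Keep the aggregated frame: groupby().agg() returns a new object and leaves df unchanged
result = df.groupby('qtr').agg({"realgdp": ["mean", "std"], "unemp": "mean"})
# Flatten the two-level column index into names such as 'realgdp_mean'
result.columns = result.columns.map('_'.join)
result = result.rename(columns={'realgdp_mean': 'mean_gdp', 'realgdp_std': 'std_gdp', 'unemp_mean': 'mean_unemp'})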
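If you are on pandas 0.25 or later, named aggregation should let you aggregate and rename in a single call, which is close to what the question asks for. Below is a minimal sketch; the sample data is made up for illustration, and only the column names (qtr, realgdp, unemp) come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({
    "qtr": [1, 1, 2, 2],
    "realgdp": [100.0, 110.0, 200.0, 220.0],
    "unemp": [5.0, 7.0, 4.0, 6.0],
})

# Named aggregation: each keyword is the output column name,
# each value is a (source column, aggregation function) pair
summary = df.groupby("qtr").agg(
    mean_gdp=("realgdp", "mean"),
    std_gdp=("realgdp", "std"),
    mean_unemp=("unemp", "mean"),
)

print(summary)
```

The result has flat column names (mean_gdp, std_gdp, mean_unemp), so no flattening or rename step is needed afterwards.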