
Set column name for apply result over groupby

Tags: python, pandas

This is a fairly trivial problem, but it's triggering my OCD, and I haven't been able to find a suitable solution in the past half hour.

For background, I want to calculate one value per group (let's call it F) in a DataFrame, where F is derived from several aggregated measures of columns in the existing DataFrame.

Here's a toy example of what I'm trying to do:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                   'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                   'C': [69, 83, 28, 25, 11, 31, 14, 37, 14,  0],
                   'D': [ 0.3,  0.1,  0.1,  0.8,  0.8,  0. ,  0.8,  0.8,  0.1,  0.8],
                   'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]
                   })

df_grp = df.groupby(['A','B'])
df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())

What I'd like to do is assign a name to the result of apply (or the lambda). Is there any way to do this without moving the lambda to a named function or renaming the column after running the last line?

asked Apr 22 '15 15:04 by MrT


2 Answers

Have the lambda function return a new Series:

df_grp.apply(lambda x: pd.Series({'new_name':
                    x['C'].sum() * x['D'].mean() / x['E'].max()}))
# or
df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()).to_frame('new_name')

     new_name
A B
X N  5.583333
Y M  2.975000
  N  3.845455
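For anyone who wants to run the Series-returning variant end to end, here is a minimal sketch that rebuilds the question's toy DataFrame. Two things in it are assumptions of the sketch rather than part of the answer: the second key, C_total, is a hypothetical extra included only to show that several results can be named at once, and the [['C', 'D', 'E']] selection before apply is there so the lambda only sees the value columns.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
    'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
    'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
    'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
    'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12],
})

# Each key of the returned Series becomes a column name in the result.
named = df.groupby(['A', 'B'])[['C', 'D', 'E']].apply(
    lambda g: pd.Series({
        'F': g['C'].sum() * g['D'].mean() / g['E'].max(),
        'C_total': g['C'].sum(),  # hypothetical extra column, for illustration
    })
)
print(named)
```

The result is a DataFrame indexed by (A, B) with one column per key, so no rename step is needed afterwards.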
answered Sep 21 '22 08:09 by Alexander


You could convert your Series to a DataFrame using reset_index() and pass name='your_col_name', which sets the name of the column holding the Series values:

(df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())
      .reset_index(name='your_col_name'))

   A  B  your_col_name
0  X  N       5.583333
1  Y  M       2.975000
2  Y  N       3.845455
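To see how the shape of the result differs between naming approaches, here is a small runnable sketch on the question's data. The column name F and the variable names are illustrative only, and selecting [['C', 'D', 'E']] before apply is an assumption of this sketch, made so the lambda receives just the value columns.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
    'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
    'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
    'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
    'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12],
})

# A scalar per group gives an unnamed Series indexed by (A, B).
per_group = df.groupby(['A', 'B'])[['C', 'D', 'E']].apply(
    lambda g: g['C'].sum() * g['D'].mean() / g['E'].max()
)

flat = per_group.reset_index(name='F')  # A and B become ordinary columns
indexed = per_group.to_frame('F')       # A and B stay in the MultiIndex

print(flat.columns.tolist())  # ['A', 'B', 'F']
print(indexed)
```

reset_index(name=...) is probably the more convenient choice when the group keys are needed as regular columns later (for a merge, say), while to_frame keeps them in the index for lookups by key.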
answered Sep 20 '22 08:09 by Zero