This is a fairly trivial problem, but it's triggering my OCD and I haven't been able to find a suitable solution for the past half hour.
For background, I'm looking to calculate a value (let's call it F) for each group in a DataFrame derived from different aggregated measures of columns in the existing DataFrame.
Here's a toy example of what I'm trying to do:
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                       'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                       'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
                       'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0., 0.8, 0.8, 0.1, 0.8],
                       'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]})

    df_grp = df.groupby(['A', 'B'])
    df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())
What I'd like to do is assign a name to the result of apply (or lambda). Is there any way to do this without moving lambda to a named function or renaming the column after running the last line?
The current (as of version 0.20) method for changing column names after a groupby operation is to chain the rename method. See this deprecation note in the documentation for more detail.
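As a rough sketch of what chaining rename could look like on the question's toy data: the name 'F' is just the placeholder used in the question, and the explicit selection of columns C, D and E is an addition made here (an assumption, not part of the original answer) so that newer pandas versions do not warn about grouping columns being passed to apply.

```python
import pandas as pd

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                   'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                   'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
                   'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
                   'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]})

# Selecting C, D, E explicitly is an assumption added for this sketch;
# the lambda itself is the one from the question.
result = (
    df.groupby(['A', 'B'])[['C', 'D', 'E']]
      .apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())
      .rename('F')  # a scalar argument sets the name of the resulting Series
)
print(result)
```

Because apply returns one scalar per group here, the result is a Series indexed by (A, B), and rename with a string only changes its name, not the index labels.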
You can also call reset_index() on your groupby result to get back a DataFrame with the name column now accessible. If you perform an operation on a single column, the result will be a Series with a MultiIndex; you can simply apply pd.DataFrame to it and then call reset_index().
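A minimal sketch of the pd.DataFrame plus reset_index() route, again using the question's toy data. The variable names and the column name 'F' are illustrative, and naming the Series before wrapping it is one assumed way to control the column label (an unnamed Series would otherwise give a column labelled 0).

```python
import pandas as pd

df = pd.DataFrame({'A': ['X', 'Y', 'X', 'Y', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
                   'B': ['N', 'N', 'N', 'M', 'N', 'M', 'M', 'N', 'M', 'N'],
                   'C': [69, 83, 28, 25, 11, 31, 14, 37, 14, 0],
                   'D': [0.3, 0.1, 0.1, 0.8, 0.8, 0.0, 0.8, 0.8, 0.1, 0.8],
                   'E': [11, 11, 12, 11, 11, 12, 12, 11, 12, 12]})

# Series with a MultiIndex (A, B); column selection is an assumption added here
# to keep the grouping columns out of the lambda's input.
series = (
    df.groupby(['A', 'B'])[['C', 'D', 'E']]
      .apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())
)

# Name the Series, wrap it in a DataFrame, then move A and B back into columns.
out = pd.DataFrame(series.rename('F')).reset_index()
print(out)
```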
The groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. The sort parameter controls whether the group keys are sorted.
Have the lambda function return a new Series:
    df_grp.apply(lambda x: pd.Series({'new_name': x['C'].sum() * x['D'].mean() / x['E'].max()}))

    # or

    df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max()).to_frame('new_name')

         new_name
    A B
    X N  5.583333
    Y M  2.975000
      N  3.845455
You could convert your Series to a DataFrame using reset_index() and provide name='your_col_name' (the name of the column corresponding to the Series values):
    (df_grp.apply(lambda x: x['C'].sum() * x['D'].mean() / x['E'].max())
           .reset_index(name='your_col_name'))

       A  B  your_col_name
    0  X  N       5.583333
    1  Y  M       2.975000
    2  Y  N       3.845455