Is there a way to tell the groupby() call to pass the group name to the apply() lambda function?
When I iterate over the groups, I can get the group key via tuple unpacking:
    for group_name, subdf in temp_dataframe.groupby(level=0, axis=0):
        print(group_name)
...is there a way to also get the group name in the apply function, such as:
    temp_dataframe.groupby(level=0,axis=0).apply(lambda group_name, subdf: foo(group_name, subdf))
How can I get the group name as an argument for the apply lambda function?
Apply function func group-wise and combine the results together. The function passed to apply must take a dataframe as its first argument and return a dataframe, a series or a scalar. apply will then take care of combining the results back together into a single dataframe or series.
The "Hello, World!" of pandas GroupBy: you call .groupby() and pass the name of the column that you want to group on, which is "state". Then, you use ["last_name"] to specify the columns on which you want to perform the actual aggregation.
Simply use the apply method to each dataframe in the groupby object. This is the most straightforward way and the easiest to understand. Notice that the function takes a dataframe as its only argument, so any code within the custom function needs to work on a pandas dataframe.
I think you should be able to use the name attribute:
temp_dataframe.groupby(level=0,axis=0).apply(lambda x: foo(x.name, x))
should work, example:
    In [132]:
    df = pd.DataFrame({'a':list('aabccc'), 'b':np.arange(6)})
    df

    Out[132]:
       a  b
    0  a  0
    1  a  1
    2  b  2
    3  c  3
    4  c  4
    5  c  5

    In [134]:
    df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:',x))

    name: a
    subdf:    a  b
    0  a  0
    1  a  1
    name: b
    subdf:    a  b
    2  b  2
    name: c
    subdf:    a  b
    3  c  3
    4  c  4
    5  c  5

    Out[134]:
    Empty DataFrame
    Columns: []
    Index: []
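To make the same idea concrete with a function that returns something, here is a minimal sketch. The body of foo is a hypothetical stand-in for whatever the question's foo does, and selecting [['b']] before apply is my own choice so that the function only sees the value column regardless of how a given pandas version treats the grouping column. It relies on apply attaching the group key to each sub-frame as .name, which is the behaviour shown above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('aabccc'), 'b': np.arange(6)})

def foo(group_name, subdf):
    # Hypothetical helper: label each group's sum of 'b' with the group key.
    return f"{group_name}={subdf['b'].sum()}"

# Each sub-frame handed to the lambda carries the group key as .name,
# so it can be forwarded to foo as an ordinary argument.
result = df.groupby('a')[['b']].apply(lambda x: foo(x.name, x))
print(result)
```

Since foo returns a scalar here, apply should combine the per-group results into a Series indexed by the group keys.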
For those who came looking for an answer to the question:
Including the group name in the transform function pandas python
and ended up in this thread, please read on.
Given the following input:
df = pd.DataFrame(data={'col1': list('aabccc'), 'col2': np.arange(6), 'col3': np.arange(6)})
Data:
      col1  col2  col3
    0    a     0     0
    1    a     1     1
    2    b     2     2
    3    c     3     3
    4    c     4     4
    5    c     5     5
We can access the group name (which is visible from the scope of the calling apply function) like this:
    df.groupby('col1') \
        .apply(lambda frame: frame \
            .transform(lambda col: col + 3 if frame.name == 'a' and col.name == 'col2' else col))
Output:
      col1  col2  col3
    0    a     3     0
    1    a     4     1
    2    b     2     2
    3    c     3     3
    4    c     4     4
    5    c     5     5
Note that the call to apply is needed in order to obtain a reference to the sub pandas.core.frame.DataFrame (i.e. frame) which holds the name attribute of the corresponding sub group. The name attribute of the argument of transform (i.e. col) refers to the column/series name.
Alternatively, one could also loop over the groups and then, within each group, over the columns:
    for grp_name, sub_df in df.groupby('col1'):
        for col in sub_df:
            if grp_name == 'a' and col == 'col2':
                df.loc[df.col1 == grp_name, col] = sub_df[col] + 3
My use case is quite rare and this was the only way to achieve my goal (as of pandas v0.24.2). However, I'd recommend exploring the pandas documentation thoroughly because there most likely is an easier vectorised solution to what you may need this construct for.
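For this particular toy example, a vectorised version might look like the sketch below. It assumes the condition depends only on the value in col1 and on a fixed column name, which will not hold for every use case; the variable name mask is my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

# Boolean mask selecting the rows that belong to group 'a'.
mask = df['col1'] == 'a'

# Add 3 to col2 for those rows only; every other cell is left untouched.
df.loc[mask, 'col2'] += 3
print(df)
```

This avoids both groupby and the Python-level loop, at the cost of spelling out the group and column conditions by hand.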