
Including the group name in the apply function pandas python

Is there a way to tell the groupby() call to pass the group name to the apply() lambda function?

When I iterate over the groups, I can get the group key via tuple unpacking:

for group_name, subdf in temp_dataframe.groupby(level=0, axis=0):
    print(group_name)

...is there a way to also get the group name in the apply function, such as:

temp_dataframe.groupby(level=0,axis=0).apply(lambda group_name, subdf: foo(group_name, subdf))

How can I get the group name as an argument for the apply lambda function?

user1129988 asked Sep 08 '15 14:09




2 Answers

I think you should be able to use the name attribute:

temp_dataframe.groupby(level=0,axis=0).apply(lambda x: foo(x.name, x)) 

This should work; for example:

In [132]:
df = pd.DataFrame({'a':list('aabccc'), 'b':np.arange(6)})
df

Out[132]:
   a  b
0  a  0
1  a  1
2  b  2
3  c  3
4  c  4
5  c  5

In [134]:
df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:',x))

name: a 
subdf:    a  b
0  a  0
1  a  1
name: b 
subdf:    a  b
2  b  2
name: c 
subdf:    a  b
3  c  3
4  c  4
5  c  5

Out[134]:
Empty DataFrame
Columns: []
Index: []
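As a minimal sketch of the same idea with a return value instead of a print, the group key can be forwarded to a two-argument function through the name attribute. Here foo is a hypothetical stand-in for the function in the question, and the value column is selected explicitly so that the sub-frames contain only 'b' regardless of how a given pandas version treats the grouping column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('aabccc'), 'b': np.arange(6)})


def foo(group_name, subdf):
    # Hypothetical helper: combine the group key with the sum of column 'b'.
    return f"{group_name}:{subdf['b'].sum()}"


# Each sub-frame passed to the lambda carries its group key in x.name.
result = df.groupby('a')[['b']].apply(lambda x: foo(x.name, x))
print(result)
```

Because foo returns one scalar per group, the result should be a Series indexed by the group keys.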
EdChum answered Oct 08 '22 08:10



For those who came looking for an answer to the question:

Including the group name in the transform function pandas python

and ended up in this thread, please read on.

Given the following input:

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

Data:

    col1    col2    col3
0   a       0       0
1   a       1       1
2   b       2       2
3   c       3       3
4   c       4       4
5   c       5       5

We can access the group name (which is visible from the scope of the calling apply function) like this:

df.groupby('col1').apply(
    lambda frame: frame.transform(
        lambda col: col + 3 if frame.name == 'a' and col.name == 'col2' else col))

Output:

    col1    col2    col3
0   a       3       0
1   a       4       1
2   b       2       2
3   c       3       3
4   c       4       4
5   c       5       5

Note that the call to apply is needed to obtain a reference to the sub-frame (i.e. frame, a pandas.core.frame.DataFrame), which holds the name attribute of the corresponding group. The name attribute of the argument passed to transform (i.e. col) refers to the column/Series name instead.
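To make the distinction between the two name attributes concrete, here is a small sketch that records which (group name, column name) pairs are seen. The helper names record and per_column are made up for illustration, and the value columns are selected explicitly so the grouping column is not part of the sub-frames:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': list('aabccc'),
                   'col2': np.arange(6),
                   'col3': np.arange(6)})

pairs = set()  # (group name, column name) pairs seen during the call


def record(frame):
    # frame.name is the group key attached by groupby.apply.
    def per_column(col):
        # col.name is the column label, not the group key.
        pairs.add((frame.name, col.name))
        return col  # return the column unchanged

    return frame.transform(per_column)


df.groupby('col1')[['col2', 'col3']].apply(record)
print(sorted(pairs))
```

A set is used so that the result does not depend on how many times a given pandas version calls the function for each group.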

Alternatively, one could also loop over the groups and then, within each group, over the columns:

for grp_name, sub_df in df.groupby('col1'):
    for col in sub_df:
        if grp_name == 'a' and col == 'col2':
            df.loc[df.col1 == grp_name, col] = sub_df[col] + 3

My use case is quite rare and this was the only way to achieve my goal (as of pandas v0.24.2). However, I'd recommend exploring the pandas documentation thoroughly because there most likely is an easier vectorised solution to what you may need this construct for.
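For the simplified example in this answer (not necessarily for the original use case), a boolean mask appears to give the same result without groupby at all. This is only a sketch of what a vectorised alternative might look like:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': list('aabccc'),
                   'col2': np.arange(6),
                   'col3': np.arange(6)})

# Select the rows belonging to group 'a' and add 3 to col2 only.
mask = df['col1'] == 'a'
df.loc[mask, 'col2'] += 3
print(df)
```

Unlike the apply/transform version, this modifies df in place rather than returning a new frame.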

rapture answered Oct 08 '22 08:10
