
Including the group name in the apply function pandas python

Tags:

python

pandas

pandas-groupby

apply

Is there a way to tell the groupby() call to pass the group name to the apply() lambda function?

When I iterate through the groups, I can get the group key via tuple unpacking:

    for group_name, subdf in temp_dataframe.groupby(level=0, axis=0):
        print(group_name)

...is there a way to also get the group name in the apply function, such as:

    temp_dataframe.groupby(level=0,axis=0).apply(lambda group_name, subdf: foo(group_name, subdf))

How can I get the group name as an argument for the apply lambda function?


asked Sep 08 '15 14:09

user1129988

2 Answers

I think you should be able to use the name attribute:

    temp_dataframe.groupby(level=0,axis=0).apply(lambda x: foo(x.name, x))

This should work. For example:

    In [132]:
    df = pd.DataFrame({'a':list('aabccc'), 'b':np.arange(6)})
    df

    Out[132]:
       a  b
    0  a  0
    1  a  1
    2  b  2
    3  c  3
    4  c  4
    5  c  5

    In [134]:
    df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:',x))

    name: a
    subdf:    a  b
    0  a  0
    1  a  1
    name: b
    subdf:    a  b
    2  b  2
    name: c
    subdf:    a  b
    3  c  3
    4  c  4
    5  c  5
    Out[134]:
    Empty DataFrame
    Columns: []
    Index: []
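To show the same idea with a function that returns something instead of printing, here is a minimal sketch. The body of foo is a hypothetical stand-in (the question never shows it), and selecting [['b']] before apply is my own choice so that each sub-frame holds only the data column; behaviour around grouping columns has shifted between pandas versions, so treat this as an illustration rather than the definitive call.

```python
import numpy as np
import pandas as pd


def foo(group_name, subdf):
    # Hypothetical stand-in for the question's foo(): summarise one group.
    return f"{group_name}: {len(subdf)} rows, b sum = {subdf['b'].sum()}"


df = pd.DataFrame({'a': list('aabccc'), 'b': np.arange(6)})

# Each sub-frame handed to the lambda carries the group key in its name attribute.
# Selecting [['b']] keeps the grouping column itself out of the sub-frame.
result = df.groupby('a')[['b']].apply(lambda x: foo(x.name, x))
print(result)
```

Because foo returns a scalar per group, result should be a Series indexed by the group keys, so result['a'] holds the summary for group 'a'.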


answered Oct 08 '22 08:10

EdChum

For those who came looking for an answer to the question:

Including the group name in the transform function pandas python

and ended up in this thread, please read on.

Given the following input:

    df = pd.DataFrame(data={'col1': list('aabccc'),
                            'col2': np.arange(6),
                            'col3': np.arange(6)})

Data:

        col1    col2    col3
    0   a       0       0
    1   a       1       1
    2   b       2       2
    3   c       3       3
    4   c       4       4
    5   c       5       5

We can access the group name (which is visible from the scope of the calling apply function) like this:

    df.groupby('col1') \
    .apply(lambda frame: frame \
           .transform(lambda col: col + 3 if frame.name == 'a' and col.name == 'col2' else col))

Output:

        col1    col2    col3
    0   a       3       0
    1   a       4       1
    2   b       2       2
    3   c       3       3
    4   c       4       4
    5   c       5       5

Note that the call to apply is needed to obtain a reference to the sub-DataFrame (i.e. frame), which holds the name attribute of the corresponding group. The name attribute of the argument of transform (i.e. col) refers to the column/series name.

Alternatively, one could also loop over the groups and then, within each group, over the columns:

    for grp_name, sub_df in df.groupby('col1'):
        for col in sub_df:
            if grp_name == 'a' and col == 'col2':
                df.loc[df.col1 == grp_name, col] = sub_df[col] + 3

My use case is quite rare and this was the only way to achieve my goal (as of pandas v0.24.2). However, I'd recommend exploring the pandas documentation thoroughly, because there is most likely an easier vectorised solution for whatever you need this construct for.
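For the toy example in this answer specifically, a vectorised version could look like the sketch below. It assumes the condition depends only on the value in col1 and the target column, so a boolean mask with .loc is enough and no groupby is needed; the variable name mask is mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

# Boolean mask selecting the rows that belong to group 'a'.
mask = df['col1'] == 'a'

# Add 3 to col2 for those rows only; every other cell is left untouched.
df.loc[mask, 'col2'] += 3
print(df)
```

This only covers cases where the per-group logic can be written as a row condition; anything that really needs the whole sub-frame still calls for apply or the explicit loop.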

answered Oct 08 '22 08:10

rapture
