How to apply multiple functions to a groupby object

Tags: python, pandas, dataframe

For example, I have two lambda functions to apply to a grouped data frame:

df.groupby(['A', 'B']).apply(lambda g: ...)
df.groupby(['A', 'B']).apply(lambda g: ...)

Each works on its own, but not when they are combined:

df.groupby(['A', 'B']).apply([lambda g: ..., lambda g: ...])

Why is that? How can I apply different functions to a grouped object and get the results concatenated column-wise?

Is there a way to do this without specifying a column for each function? All of the suggestions so far seem to work only on specific columns.


asked Jun 02 '17 15:06

James Wong

1 Answer

This is a good opportunity to highlight one of the changes in pandas 0.20:

Deprecate groupby.agg() with a dictionary when renaming

What does this mean?
Consider the dataframe df:

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=np.tile([1, 2], 2).repeat(2),
    B=np.repeat([1, 2], 2).repeat(2),
    C=np.arange(8)
))
df

   A  B  C
0  1  1  0
1  1  1  1
2  2  1  2
3  2  1  3
4  1  2  4
5  1  2  5
6  2  2  6
7  2  2  7

We could previously do

df.groupby(['A', 'B']).C.agg(dict(f1=lambda x: x.size, f2=lambda x: x.max()))

     f1  f2
A B        
1 1   2   1
  2   2   5
2 1   2   3
  2   2   7

Our names 'f1' and 'f2' became the column headers. With pandas 0.20, however, the same call emits this warning:

//anaconda/envs/3.6/lib/python3.6/site-packages/ipykernel/__main__.py:1: FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version
  if __name__ == '__main__':
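For comparison, here is a minimal sketch of two ways to keep custom column names without the deprecated dict (later pandas releases removed the dict-on-Series form entirely and raise a SpecificationError instead). It rebuilds the same df; passing a list of (name, function) tuples names each output column explicitly, while the keyword form ("named aggregation") assumes pandas 0.25 or later.

```python
import numpy as np
import pandas as pd

# Same frame as above
df = pd.DataFrame(dict(
    A=np.tile([1, 2], 2).repeat(2),
    B=np.repeat([1, 2], 2).repeat(2),
    C=np.arange(8),
))

g = df.groupby(['A', 'B']).C

# Option 1: a list of (name, function) tuples; each name becomes a column header
out_tuples = g.agg([('f1', lambda x: x.size), ('f2', lambda x: x.max())])

# Option 2 (assumes pandas >= 0.25): named aggregation via keyword arguments
out_named = g.agg(f1=lambda x: x.size, f2=lambda x: x.max())

print(out_tuples)
print(out_named)
```

Both produce the same f1/f2 table as the dict version, without relying on dict renaming.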

So what does that mean? What if I pass the two lambdas as a plain list, without the naming dictionary?

df.groupby(['A', 'B']).C.agg([lambda x: x.size, lambda x: x.max()])

---------------------------------------------------------------------------
SpecificationError                        Traceback (most recent call last)
<ipython-input-398-fc26cf466812> in <module>()
----> 1 print(df.groupby(['A', 'B']).C.agg([lambda x: x.size, lambda x: x.max()]))

//anaconda/envs/3.6/lib/python3.6/site-packages/pandas/core/groupby.py in aggregate(self, func_or_funcs, *args, **kwargs)
   2798         if hasattr(func_or_funcs, '__iter__'):
   2799             ret = self._aggregate_multiple_funcs(func_or_funcs,
-> 2800                                                  (_level or 0) + 1)
   2801         else:
   2802             cyfunc = self._is_cython_func(func_or_funcs)

//anaconda/envs/3.6/lib/python3.6/site-packages/pandas/core/groupby.py in _aggregate_multiple_funcs(self, arg, _level)
   2863             if name in results:
   2864                 raise SpecificationError('Function names must be unique, '
-> 2865                                          'found multiple named %s' % name)
   2866 
   2867             # reset the cache so that we

SpecificationError: Function names must be unique, found multiple named <lambda>

pandas raises an error because both lambdas would produce a column named '<lambda>'.
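The collision comes from the function names: pandas labels each output column with the function's __name__, and every lambda is named '<lambda>'. A minimal sketch that rebuilds the df above, shows the shared name, and uses one workaround: overwriting __name__ on each lambda. This is an alternative to the def approach below, not something pandas documents for this purpose. Note that pandas 0.25 and later relabel duplicate lambdas as '<lambda_0>', '<lambda_1>', so the error above applies to older versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=np.tile([1, 2], 2).repeat(2),
    B=np.repeat([1, 2], 2).repeat(2),
    C=np.arange(8),
))

size_fn = lambda x: x.size
max_fn = lambda x: x.max()

# Both lambdas share the same name, hence the duplicate column labels
print(size_fn.__name__, max_fn.__name__)  # <lambda> <lambda>

# Workaround: give each lambda a distinct __name__ before aggregating
size_fn.__name__ = 'f1'
max_fn.__name__ = 'f2'

out = df.groupby(['A', 'B']).C.agg([size_fn, max_fn])
print(out)
```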

Solution: Name your functions

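def f1(x):
    # number of rows in the group
    return x.size

def f2(x):
    # largest value in the group
    return x.max()

# the function names f1 and f2 become the column headers
df.groupby(['A', 'B']).C.agg([f1, f2])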

     f1  f2
A B        
1 1   2   1
  2   2   5
2 1   2   3
  2   2   7
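A minimal sketch of the same idea without selecting a column. Calling agg on the grouped DataFrame applies each function to every non-grouping column, which yields MultiIndex column headers such as ('C', 'f1'). For functions that need the whole group g, as in the question's apply calls, one option is to run one apply per function and join the results with pd.concat(..., axis=1). The extra column D and the functions n_rows and c_plus_d_max are hypothetical, added only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
    A=np.tile([1, 2], 2).repeat(2),
    B=np.repeat([1, 2], 2).repeat(2),
    C=np.arange(8),
))
# Hypothetical extra column so there is more than one value column
df['D'] = df['C'] * 10

def f1(x):
    return x.size

def f2(x):
    return x.max()

grouped = df.groupby(['A', 'B'])

# 1) No column selected: every function runs on every non-grouping column,
#    giving columns ('C', 'f1'), ('C', 'f2'), ('D', 'f1'), ('D', 'f2')
wide = grouped.agg([f1, f2])

# 2) Whole-group functions (hypothetical), each returning one scalar per group
def n_rows(g):
    return len(g)

def c_plus_d_max(g):
    return (g['C'] + g['D']).max()

# Select the value columns, run one apply per function, join side by side
value_groups = grouped[['C', 'D']]
combined = pd.concat(
    [value_groups.apply(n_rows), value_groups.apply(c_plus_d_max)],
    axis=1,
    keys=['n_rows', 'c_plus_d_max'],
)

print(wide)
print(combined)
```

Selecting ['C', 'D'] before apply keeps the grouping columns out of g.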

answered Oct 21 '22 11:10

piRSquared
