How to use groupby to apply multiple functions to multiple columns in Pandas?

Tags: python, pandas, dataframe, group-by

I have an ordinary DataFrame:

import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

Following this recipe, I got the results I wanted.

In [62]: A.groupby((A['A'] > 2)).apply(lambda x: pd.Series(dict(
                   up_B=(x.B >= 0).sum(), down_B=(x.B < 0).sum(), mean_B=(x.B).mean(), std_B=(x.B).std(),
                   up_C=(x.C >= 0).sum(), down_C=(x.C < 0).sum(), mean_C=(x.C).mean(), std_C=(x.C).std())))

Out[62]:
       down_B  down_C  mean_B    mean_C     std_B     std_C  up_B  up_C
A                                                                      
False       0       0     4.5  3.000000  0.707107  1.414214     2     2
True        0       0     2.0  2.333333  1.000000  1.527525     3     3

This approach works, but with a large number of columns (15-100) you have to type every expression out for every column, which is cumbersome.

Given that the same formulas are applied to ALL columns, is there an efficient way to do this for a large number of columns?

Thanks

asked Oct 05 '14 19:10

hernanavella

1 Answer

Since you are aggregating each grouped column into one value, you can use agg instead of apply. The agg method can take a list of functions as input. The functions will be applied to each column:

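def up(x):
    # number of non-negative values in the group
    return (x >= 0).sum()

def down(x):
    # number of negative values in the group
    return (x < 0).sum()

# select the value columns, group by the boolean key,
# and apply all four functions to each column
result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg(
             [up, down, 'mean', 'std'])
print(result)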

yields

       B                      C                         
      up down mean       std up down      mean       std
A                                                       
False  2    0  4.5  0.707107  2    0  3.000000  1.414214
True   3    0  2.0  1.000000  3    0  2.333333  1.527525
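
With 15-100 columns, the 'B':'C' slice may not be convenient. A minimal sketch of one alternative, assuming every column except the grouping column 'A' should be aggregated (value_cols is a name made up for this illustration):

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])


def up(x):
    # number of non-negative values in the group
    return (x >= 0).sum()


def down(x):
    # number of negative values in the group
    return (x < 0).sum()


# Assumption: every column other than 'A' is a value column.
value_cols = A.columns.drop('A')

# The same list of functions is applied to each selected column,
# however many there are.
result = A[value_cols].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])
print(result)
```

The list passed to agg stays the same size no matter how many columns are selected, so nothing has to be typed per column.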

result has hierarchical ("MultiIndexed") columns. To select a certain column (or columns), you could use:

In [39]: result['B','mean']
Out[39]: 
A
False    4.5
True     2.0
Name: (B, mean), dtype: float64

In [46]: result[[('B', 'mean'), ('C', 'mean')]]
Out[46]: 
         B         C
      mean      mean
A                   
False  4.5  3.000000
True   2.0  2.333333
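
If you want one statistic for every column, listing each (column, function) pair gets long. A sketch of one possible shortcut, assuming the function names sit on the second column level as in the output above, is a cross-section with xs (the name means is only for illustration):

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])


def up(x):
    return (x >= 0).sum()


def down(x):
    return (x < 0).sum()


result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])

# Take the 'mean' entry from the second column level (level=1) for every
# original column; the selected level is dropped from the result.
means = result.xs('mean', axis=1, level=1)
print(means)
```

Here means has plain columns B and C, one per original column.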

or you could move one level of the MultiIndex to the index:

In [40]: result.stack()
Out[40]: 
                   B         C
A                             
False up    2.000000  2.000000
      down  0.000000  0.000000
      mean  4.500000  3.000000
      std   0.707107  1.414214
True  up    3.000000  3.000000
      down  0.000000  0.000000
      mean  2.000000  2.333333
      std   1.000000  1.527525
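
If you would rather have flat column names like mean_B and std_C, as in the question's output, one option is to join the two column levels yourself. This is only a sketch; the name flat and the "function_column" naming pattern are choices made for this example:

```python
import pandas as pd

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])


def up(x):
    return (x >= 0).sum()


def down(x):
    return (x < 0).sum()


result = A.loc[:, 'B':'C'].groupby(A['A'] > 2).agg([up, down, 'mean', 'std'])

# Each MultiIndex column label is a (column, function) tuple;
# join the two parts into a single string such as 'mean_B'.
flat = result.copy()
flat.columns = [f"{func}_{col}" for col, func in flat.columns]
print(flat)
```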

answered Nov 02 '22 04:11

unutbu
