Using sample data:
import numpy as np
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})
df
      data1     data2 key1 key2
0  0.361601  0.375297    a  one
1  0.069889  0.809772    a  two
2  1.468194  0.272929    b  one
3 -1.138458  0.865060    b  two
4 -0.268210  1.250340    a  one
I'm trying to figure out how to group the data by key1 and sum only the data1 values where key2 equals 'one'.
Here's what I've tried:

def f(d, a, b):
    d.ix[d[a] == b, 'data1'].sum()

df.groupby(['key1']).apply(f, a='key2', b='one').reset_index()
But this gives me a dataframe with 'None' values
  index key1     0
0     0    a  None
1     1    b  None
Any ideas here? I'm looking for the Pandas equivalent of the following SQL:
SELECT Key1, SUM(CASE WHEN Key2 = 'one' then data1 else 0 end) FROM df GROUP BY key1
FYI - I've seen conditional sums for pandas aggregate but couldn't transform the answer provided there to work with sums rather than counts.
Thanks in advance
First groupby the key1 column:
In [11]: g = df.groupby('key1')
and then, for each group, take the sub-DataFrame where key2 equals 'one' and sum the data1 column:
In [12]: g.apply(lambda x: x[x['key2'] == 'one']['data1'].sum())
Out[12]:
key1
a    0.093391
b    1.468194
dtype: float64
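As an aside, the original attempt gives None because f never returns its result. A minimal sketch of the same idea with a named helper instead of the lambda (sum_one is a hypothetical name, not from the post), using .loc in place of the deprecated .ix:

def sum_one(group):
    # explicit return is what the original f was missing
    return group.loc[group['key2'] == 'one', 'data1'].sum()

g.apply(sum_one)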
To explain what's going on let's look at the 'a' group:
In [21]: a = g.get_group('a')

In [22]: a
Out[22]:
      data1     data2 key1 key2
0  0.361601  0.375297    a  one
1  0.069889  0.809772    a  two
4 -0.268210  1.250340    a  one

In [23]: a[a['key2'] == 'one']
Out[23]:
      data1     data2 key1 key2
0  0.361601  0.375297    a  one
4 -0.268210  1.250340    a  one

In [24]: a[a['key2'] == 'one']['data1']
Out[24]:
0    0.361601
4   -0.268210
Name: data1, dtype: float64

In [25]: a[a['key2'] == 'one']['data1'].sum()
Out[25]: 0.093391000000000002
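That is, 0.361601 + (-0.268210) ≈ 0.093391, which matches the 'a' entry in the grouped result above.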
It may be slightly easier/clearer to do this by first restricting the DataFrame to the rows where key2 equals 'one':
In [31]: df1 = df[df['key2'] == 'one']

In [32]: df1
Out[32]:
      data1     data2 key1 key2
0  0.361601  0.375297    a  one
2  1.468194  0.272929    b  one
4 -0.268210  1.250340    a  one

In [33]: df1.groupby('key1')['data1'].sum()
Out[33]:
key1
a    0.093391
b    1.468194
Name: data1, dtype: float64
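One subtle difference from the SQL in the question: filtering first drops any key1 group that has no 'one' rows, whereas the CASE WHEN version would report 0 for it. If that matters, a possible sketch (the reindex step is my addition, not part of the answer):

(df[df['key2'] == 'one']
   .groupby('key1')['data1'].sum()
   .reindex(df['key1'].unique(), fill_value=0))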
I think that today with pandas 0.23 you can do this:
import numpy as np

df.assign(result=np.where(df['key2'] == 'one', df.data1, 0)) \
  .groupby('key1').agg({'result': sum})
The advantage of this is that you can apply it to more than one column of the same DataFrame:
df.assign(
    result1=np.where(df['key2'] == 'one', df.data1, 0),
    result2=np.where(df['key2'] == 'two', df.data1, 0)
).groupby('key1').agg({'result1': sum, 'result2': sum})
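On newer pandas (0.25+), the same pattern can also be written with named aggregation; this is a sketch, not part of the original answer, and reuses df and np from above:

(df.assign(result=np.where(df['key2'] == 'one', df['data1'], 0))
   .groupby('key1')
   .agg(result=('result', 'sum')))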