Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sum negative and positive values separately when using groupby in pandas?

How to sum positive and negative values differently in pandas and put them let's say in positive and negative columns?

I have this dataframe like below:

df = pandas.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8), 'D' : np.random.randn(8)})

Output is as below:

df
     A      B         C         D
0  foo    one  0.374156  0.319699
1  bar    one -0.356339 -0.629649
2  foo    two -0.390243 -1.387909
3  bar  three -0.783435 -0.959699
4  foo    two -1.268622 -0.250871
5  bar    two -2.302525 -1.295991
6  foo    one -0.968840  1.247675
7  foo  three  0.482845  1.004697

I used the below code to get negatives:

df['negative'] = df.groupby('A')['C'].apply(lambda x: x[x<0].sum()).reset_index()]

But the problem is when I want to add it to one of dataframe columns called negative it gives error:

ValueError: Wrong number of items passed 2, placement implies 1

Again I know what it says that groupby has returned more than one column and cannot assign it to df['negatives'] but I don't know how to solve this part of the problem. I need to have positive col too.

The desired outcome would be:

    A      Positive   Negative
0  foo     0.374156  -0.319699
1  bar     0.356339  -0.629649

What is the right solution to the problem?

like image 396
Alireza Avatar asked Nov 18 '15 13:11

Alireza


2 Answers

In [14]:
df.groupby(df['A'])['C'].agg([('negative' , lambda x : x[x < 0].sum()) , ('positive' , lambda x : x[x > 0].sum())])
Out[14]:
     negative   positive
A       
bar -1.418788   2.603452
foo -0.504695   2.880512
like image 71
Nader Hisham Avatar answered Oct 23 '22 09:10

Nader Hisham


You may groupby on A and df['C'] > 0, and unstack the result:

>>> right = df.groupby(['A', df['C'] > 0])['C'].sum().unstack()
>>> right = right.rename(columns={True:'positive', False:'negative'})
>>> right
C    negative  positive
A                      
bar   -3.4423       NaN
foo   -2.6277     0.857

The NaN value is because all the A == bar rows have negative value for C.

if you want to add these to the original frame corresponding to values of groupby key, i.e. A, it would require a left join:

>>> df.join(right, on='A', how='left')
     A      B       C       D  negative  positive
0  foo    one  0.3742  0.3197   -2.6277     0.857
1  bar    one -0.3563 -0.6296   -3.4423       NaN
2  foo    two -0.3902 -1.3879   -2.6277     0.857
3  bar  three -0.7834 -0.9597   -3.4423       NaN
4  foo    two -1.2686 -0.2509   -2.6277     0.857
5  bar    two -2.3025 -1.2960   -3.4423       NaN
6  foo    one -0.9688  1.2477   -2.6277     0.857
7  foo  three  0.4828  1.0047   -2.6277     0.857
like image 5
behzad.nouri Avatar answered Oct 23 '22 10:10

behzad.nouri