
Pandas DataFrame Apply

Tags: python, pandas

I have a Pandas DataFrame with four columns, A, B, C, D. It turns out that, sometimes, the values of B and C can be 0. I therefore wish to obtain the following:

B[i] = B[i] if B[i] else min(A[i], D[i])
C[i] = C[i] if C[i] else max(A[i], D[i])

where i runs over all rows of the frame. With Pandas it is easy to find the rows in which one of these columns is zero:

df[df.B == 0] and df[df.C == 0]

However, I have no idea how to perform the above transformation easily. I can think of various inefficient and inelegant methods (for loops over the entire frame) but nothing simple.

Freddie Witherden asked Aug 03 '12 11:08



2 Answers

A combination of boolean indexing and apply can do the trick. Below is an example that replaces the zero elements in column C.

In [22]: df
Out[22]:
   A  B  C  D
0  8  3  5  8
1  9  4  0  4
2  5  4  3  8
3  4  8  5  1

In [23]: bi = df.C==0

In [24]: df.loc[bi, 'C'] = df.loc[bi, ['A', 'D']].apply(max, axis=1)

In [25]: df
Out[25]:
   A  B  C  D
0  8  3  5  8
1  9  4  9  4
2  5  4  3  8
3  4  8  5  1
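
The same boolean-mask idea can cover both B and C in one pass without apply. What follows is a minimal sketch, not part of the original answer: it assumes a reasonably recent pandas and uses the built-in row-wise min/max together with Series.mask. The frame values are made up for illustration.

```python
import pandas as pd

# Illustrative frame: row 1 has zeros in both B and C
df = pd.DataFrame({
    "A": [8, 9, 5, 4],
    "B": [3, 0, 4, 8],
    "C": [5, 0, 3, 5],
    "D": [8, 4, 8, 1],
})

# Row-wise min and max of A and D, computed once for every row
row_min = df[["A", "D"]].min(axis=1)
row_max = df[["A", "D"]].max(axis=1)

# mask() replaces values where the condition is True and keeps the rest
df["B"] = df["B"].mask(df["B"] == 0, row_min)
df["C"] = df["C"].mask(df["C"] == 0, row_max)

print(df)
```

Because the min and max are computed for all rows up front, this does a little redundant work on rows that have no zeros, but it avoids calling a Python function per row.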
Wouter Overmeire answered Nov 11 '22 05:11


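Try the 'iterrows' DataFrame method for iterating through the rows of a DataFrame. See chapter 6.7.2 of the pandas 0.8.1 guide. This is still a Python-level loop, so it suits small frames best. The row that iterrows yields may be a copy, so assigning to it is not guaranteed to change the frame; write the new values back through the frame itself, for example with df.at.

import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3], 'B': [0, 0, 0], 'C': [0, 0, 0], 'D': [3, 4, 5]})

for idx, row in df.iterrows():
    # Write through df.at: the row from iterrows may be a copy of the data
    if row['B'] == 0:
        df.at[idx, 'B'] = min(row['A'], row['D'])
    if row['C'] == 0:
        # C takes the larger of A and D, as the question asks
        df.at[idx, 'C'] = max(row['A'], row['D'])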
THM answered Nov 11 '22 04:11