Python: is it possible to use a lambda with pd.DataFrame.apply instead of a nested loop?

I'm trying to avoid a nested loop in Python by using apply with a lambda to create a new column, as in the code below:

import numpy as np
import pandas as pd
df = pd.DataFrame((np.random.rand(100, 4)*100), columns=list('ABCD'))
df['C'] = df.apply(lambda A,B: A+B)

TypeError: ('<lambda>() takes exactly 2 arguments (1 given)', u'occurred at index A')

Obviously this doesn't work. Any recommendations?

JPC asked Oct 04 '13 10:10


1 Answer

Do you want to add column A and column B and store the result in C? Then it can be done more simply, without apply:

df['C'] = df['A'] + df['B']
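
As a minimal sketch of this vectorized form, here is the same operation on a small hand-made frame. The values are made up for illustration and are not the random data from the question:

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# Element-wise addition of two columns: no loop and no apply needed.
df["C"] = df["A"] + df["B"]

print(df["C"].tolist())
```

Bracket assignment is used here because it creates the column if it does not exist yet, whereas attribute assignment (df.C = ...) only updates a column that is already present.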

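As @EdChum points out in the comments, the argument passed to the function in apply is a Series. By default (axis=0) the function is called once per column, so each Series holds one column's values indexed by row label; with axis=1 it is called once per row instead: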

>>> df.apply(lambda s: s)[:3]
           A          B          C          D
0  57.890858  72.344298  16.348960  84.109071
1  85.534617  53.067682  95.212719  36.677814
2  23.202907   3.788458  66.717430   1.466331

Here, for each column, we add the values in the first and the second row:

>>> df.apply(lambda s: s[0] + s[1])
A    143.425475
B    125.411981
C    111.561680
D    120.786886
dtype: float64
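
To see what apply actually hands to the function, the sketch below records the type and name of each argument. The helper name record and the two-row frame are hypothetical, chosen only for this illustration. It also suggests why the two-argument lambda in the question fails: apply calls the function with a single Series, not with one argument per column.

```python
import pandas as pd

# Tiny illustrative frame; the values are invented for this example.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [10.0, 20.0]})

seen = []

def record(s):
    # With the default axis=0, s is a Series holding one column,
    # and s.name is that column's label.
    seen.append((type(s).__name__, s.name))
    # Add the first and second row of this column, by position.
    return s.iloc[0] + s.iloc[1]

result = df.apply(record)
print(seen)
print(result)
```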

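To combine values across columns within each row, pass the axis=1 keyword argument, so that each Series is one row indexed by column label:

>>> df.apply(lambda s: s.iloc[0] + s.iloc[1], axis=1)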
0     130.235156
1     138.602299
2      26.991364
3     143.229523
...
98    152.640811
99     90.266934

This yields the same result as referring to the columns by name:

>>> (df.apply(lambda s: s.iloc[0] + s.iloc[1], axis=1) ==
     df.apply(lambda s: s['A'] + s['B'], axis=1))
0     True
1     True
2     True
3     True
...
98    True
99    True
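
As a rough check that the row-wise apply and the plain column addition agree, the sketch below compares the two on a frame shaped like the one in the question. The seeded generator is an assumption added here so the run is repeatable; the question used np.random.rand without a seed.

```python
import numpy as np
import pandas as pd

# Seeded random data (the seed is an assumption for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 4)) * 100, columns=list("ABCD"))

# Row-wise apply: the lambda receives one row at a time as a Series.
row_wise = df.apply(lambda s: s["A"] + s["B"], axis=1)

# Vectorized equivalent: operates on whole columns at once.
vectorized = df["A"] + df["B"]

same = bool(np.allclose(row_wise, vectorized))
print(same)
```

Both forms give the same values, but the vectorized version avoids a Python-level function call per row, so it is usually the better choice for simple arithmetic like this.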
miku answered Nov 15 '22 00:11
