I'm trying to avoid a nested loop in Python by using apply with a lambda to create a new column, like this:
import numpy as np
import pandas as pd
df = pd.DataFrame((np.random.rand(100, 4) * 100), columns=list('ABCD'))
df['C'] = df.apply(lambda A, B: A + B)
TypeError: ('() takes exactly 2 arguments (1 given)', u'occurred at index A')
Obviously this doesn't work. Any recommendations?
Do you want to add column A and column B and store the result in C? Then you can do it more simply:
df.C = df.A + df.B
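A minimal runnable sketch of this vectorized version (the column names and random data mirror the question; nothing here is pandas-specific beyond plain column arithmetic):

```python
import numpy as np
import pandas as pd

# Sample frame with four numeric columns, as in the question
df = pd.DataFrame(np.random.rand(100, 4) * 100, columns=list('ABCD'))

# Vectorized column addition: no apply, no Python-level loop
df['C'] = df['A'] + df['B']
```

This operates on whole columns at once, so it is both simpler and faster than apply.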
As @EdChum points out in the comments, the function passed to apply receives a Series. By default (axis=0) each column is passed in, so indexing into the Series selects rows; with axis=1 each row is passed in, so indexing selects columns:
>>> df.apply(lambda s: s)[:3]
A B C D
0 57.890858 72.344298 16.348960 84.109071
1 85.534617 53.067682 95.212719 36.677814
2 23.202907 3.788458 66.717430 1.466331
Here, we add the first and the second row:
>>> df.apply(lambda s: s[0] + s[1])
A 143.425475
B 125.411981
C 111.561680
D 120.786886
dtype: float64
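The same row-plus-row sum can be obtained directly, without apply (a sketch assuming the same random frame as above; .iloc is used for explicit positional access):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 4) * 100, columns=list('ABCD'))

# With the default axis=0, apply passes each column as a Series;
# taking its first and second elements adds rows 0 and 1 per column
via_apply = df.apply(lambda s: s.iloc[0] + s.iloc[1])

# Equivalent direct expression: first row plus second row
direct = df.iloc[0] + df.iloc[1]
```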
To work on columns instead, pass the axis=1 keyword argument:
>>> df.apply(lambda s: s[0] + s[1], axis=1)
0 130.235156
1 138.602299
2 26.991364
3 143.229523
...
98 152.640811
99 90.266934
which yields the same result as referring to the columns by name:
>>> (df.apply(lambda s: s[0] + s[1], axis=1) ==
df.apply(lambda s: s['A'] + s['B'], axis=1))
0 True
1 True
2 True
3 True
...
98 True
99 True
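One caveat: integer fallback indexing like s[0] on a Series with a string index (here the index is the column labels A-D) is deprecated in recent pandas versions; s.iloc[0] is the explicit positional form. A sketch showing it gives the same result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 4) * 100, columns=list('ABCD'))

# Explicit positional access avoids the deprecated integer fallback of s[0]
by_position = df.apply(lambda s: s.iloc[0] + s.iloc[1], axis=1)
by_label = df.apply(lambda s: s['A'] + s['B'], axis=1)
```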