
How to do a mathematical operation with two columns in a DataFrame using PySpark

I have a dataframe with three columns, "x", "y" and "z":

x        y         z
bn      12452     221
mb      14521     330
pl      12563     160
lo      22516     142

I need to create another column, "m", which is derived from this formula:

m = z / (y + z)

So the new dataframe should look something like this:

x        y         z        m
bn      12452     221      .01743
mb      14521     330      .02222
pl      12563     160      .01257
lo      22516     142      .00626
Mukesh Jha asked Nov 21 '16 19:11


1 Answer

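# Build a sample DataFrame (sqlContext is predefined in the PySpark shell)
df = sqlContext.createDataFrame([('bn', 12452, 221), ('mb', 14521, 330)], ['x', 'y', 'z'])
# Parenthesize the denominator so the result is z / (y + z), not (z / y) + z
df = df.withColumn('m', df['z'] / (df['y'] + df['z']))
df.head(2)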
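
The parentheses around the denominator are what make the result match the expected "m" column. Division binds tighter than addition, so without them the expression is read as (z / y) + z. Below is a minimal plain-Python sketch (no Spark required) that checks the arithmetic against the rows from the question. The names `rows` and `ratio` are illustrative and not part of the original post.

```python
# Rows copied from the question: (x, y, z)
rows = [
    ("bn", 12452, 221),
    ("mb", 14521, 330),
    ("pl", 12563, 160),
    ("lo", 22516, 142),
]


def ratio(y, z):
    """Return z / (y + z), i.e. the share of z in the total y + z."""
    return z / (y + z)


for x, y, z in rows:
    intended = ratio(y, z)        # z / (y + z), a value between 0 and 1
    unparenthesized = z / y + z   # parsed as (z / y) + z, always larger than z
    print(x, intended, unparenthesized)
```

The same precedence rule applies to PySpark column expressions, which is why the answer writes df['z'] / (df['y'] + df['z']).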
answered Sep 18 '22 18:09