
contains pyspark SQL: TypeError: 'Column' object is not callable

I'm using Spark 2.0.1.

 df.show()
+--------+------+---+-----+-----+----+
|Survived|Pclass|Sex|SibSp|Parch|Fare|
+--------+------+---+-----+-----+----+
|     0.0|   3.0|1.0|  1.0|  0.0| 7.3|
|     1.0|   1.0|0.0|  1.0|  0.0|71.3|
|     1.0|   3.0|0.0|  0.0|  0.0| 7.9|
|     1.0|   1.0|0.0|  1.0|  0.0|53.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.1|
|     0.0|   3.0|1.0|  0.0|  0.0| 8.5|
|     0.0|   1.0|1.0|  0.0|  0.0|51.9|

I have a data frame and I want to add a new column to df using withColumn, where the value of the new column is based on another column's value. I used something like this:

>>> dfnew = df.withColumn('AddCol' , when(df.Pclass.contains('3.0'),'three').otherwise('notthree'))

It gives an error:

TypeError: 'Column' object is not callable

Can anyone help me overcome this error?

Jeevan asked Dec 14 '18 22:12


1 Answer

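It's because you are calling contains on the column. Column has no contains method in Spark 2.0.1 (it was only added in a later release, 2.2 as far as I know), so df.Pclass.contains is treated as a nested-field lookup that returns another Column, and calling that Column raises the TypeError. Use like instead. Try this: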

import pyspark.sql.functions as F

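# like() takes a SQL LIKE pattern; the % wildcards make it a substring match, as contains was meant to be
df = df.withColumn("AddCol", F.when(F.col("Pclass").like("%3%"), "three").otherwise("notthree"))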
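
One thing to watch with like: a SQL LIKE pattern has to match the whole string unless it contains % wildcards. Here is a rough way to see that without a Spark session, using SQLite from the standard library purely as a stand-in. The assumption is that Spark's LIKE treats % the same way for a simple case like this, and sql_like is just a helper name made up for the illustration.

```python
import sqlite3

# In-memory SQLite database, used only to evaluate LIKE expressions.
# Assumption: SQLite stands in for Spark SQL here; it is not Spark.
conn = sqlite3.connect(":memory:")


def sql_like(value, pattern):
    """Evaluate `value LIKE pattern` in SQLite and return the result as a bool."""
    (result,) = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
    return bool(result)


exact_only = sql_like("3.0", "3")    # no wildcard: the whole string must equal "3"
substring = sql_like("3.0", "%3%")   # "3" may appear anywhere in the string
prefix = sql_like("3.0", "3%")       # the string must start with "3"

print(exact_only, substring, prefix)
```

So if Pclass holds values like 3.0, a pattern of "3" with no wildcard would not be expected to match, while "%3%" or "3%" would.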

Or, if you want it to be exactly the number 3, you should do:

import pyspark.sql.functions as F

# If the column Pclass is numeric
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit(3),"three").otherwise("notthree"))

# If the column Pclass is string
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit("3"),"three").otherwise("notthree"))
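
As for why the original call fails with that particular message rather than an AttributeError: Column resolves unknown attribute names as nested-field access and hands back another Column, so the failure only shows up when that result is called. Below is a toy sketch of the mechanism. ToyColumn is a hypothetical stand-in written for this illustration, not PySpark's actual class, and it needs no Spark installation.

```python
class ToyColumn:
    """Hypothetical stand-in for a column expression; not PySpark's real Column."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Runs only when normal attribute lookup fails. Unknown names are
        # treated as nested-field access and produce another column object.
        if item.startswith("__"):
            raise AttributeError(item)
        return ToyColumn(f"{self.name}.{item}")


pclass = ToyColumn("Pclass")

# No AttributeError here: "contains" is simply taken to be a field name.
field = pclass.contains
print(type(field).__name__, field.name)

try:
    # This calls a ToyColumn instance, which defines no __call__.
    pclass.contains("3.0")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The message printed at the end has the same shape as the one in the question, with the toy class name in place of Column.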
Manrique answered Sep 30 '22 10:09