I'm using spark 2.0.1,
df.show()
+--------+------+---+-----+-----+----+
|Survived|Pclass|Sex|SibSp|Parch|Fare|
+--------+------+---+-----+-----+----+
| 0.0| 3.0|1.0| 1.0| 0.0| 7.3|
| 1.0| 1.0|0.0| 1.0| 0.0|71.3|
| 1.0| 3.0|0.0| 0.0| 0.0| 7.9|
| 1.0| 1.0|0.0| 1.0| 0.0|53.1|
| 0.0| 3.0|1.0| 0.0| 0.0| 8.1|
| 0.0| 3.0|1.0| 0.0| 0.0| 8.5|
| 0.0| 1.0|1.0| 0.0| 0.0|51.9|
I have a data frame and I want to add a new column to df using withColumn and value of new column is base on other column value. I used something like this:
>>> dfnew = df.withColumn('AddCol' , when(df.Pclass.contains('3.0'),'three').otherwise('notthree'))
It is giving an error
TypeError: 'Column' object is not callable
can any help how to over come this error.
Its because you are trying to apply the function contains
to the column. The function contains
does not exist in pyspark. You should try like
. Try this:
import pyspark.sql.functions as F
df = df.withColumn("AddCol",F.when(F.col("Pclass").like("3"),"three").otherwise("notthree"))
Or if you just want it to be exactly the number 3
you should do:
import pyspark.sql.functions as F
# If the column Pclass is numeric
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit(3),"three").otherwise("notthree"))
# If the column Pclass is string
df = df.withColumn("AddCol",F.when(F.col("Pclass") == F.lit("3"),"three").otherwise("notthree"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With