I have a df in spark which as the following structure:
amount gender status
1000 male married
1313 female single
1000 male married
Basically i want to create new column where gender is a number
amount gender status gender_num
1000 male married 1
1313 female single 2
1000 male married 1
I tired the following:
val gender = df.gender
val gender_num = gender match {
case male => 1
case female => 2
}
I get the following error:
<console>:125: error: value pa_gender_category is not a member of org.apache.spark.sql.DataFrame
val gender = data.pa_gender_category
I know there is a stringtoindex function, but i would like to do this manually
Use withColumn
val input = // load input DataFrame
val withGender = input.withColumn("gender_num", when($"gender" === "female", 2).otherwise(1))
You can chain more options:
val withGender = input.withColumn("gender_num", when($"gender" === "female", 2).when($"gender" == "other", 3).otherwise(1))
You can also use UDF like in Akash's answer. Be aware, that sometimes UDFs cannot be optimized as much as built-in functions, but they can be more readable
You can use UDF of Spark
import org.apache.spark.sql.functions.udf
def genderToNumber: UserDefinedFunction = {
udf((gender: String) => {
gender match {
case "male" => 1
case "female" => 2
}
} })
You can apply UDF by this
val newDF = df.withColumn("gender_num", genderToNumber(df("gender")))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With