Transform Boolean Column to Numerical Column in Apache Spark (Scala) data frame with constraints?

 val inputfile = sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "true") 
        .option("inferSchema", "true") 
        .option("delimiter", "\t")
        .load("data")
 inputfile: org.apache.spark.sql.DataFrame = [a: string, b: bigint, c: boolean]
 val outputfile = inputfile.groupBy($"a",$"b").max($"c")

The above code fails because c is a boolean column and aggregates cannot be applied to booleans. Is there a function in Spark that converts true to 1 and false to 0 for an entire column of a Spark data frame?

I tried the following (source: How to change column types in Spark SQL's DataFrame?):

 val inputfile = sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "true") 
        .option("inferSchema", "true") 
        .option("delimiter", "\t")
        .load("data")
 // toInt is the UDF from the linked answer
 val tempfile = inputfile.select("a","b","c").withColumn("c", toInt(inputfile("c")))
 val outputfile = tempfile.groupBy($"a",$"b").max($"c")

The following question, Casting a new derived column in a DataFrame from boolean to integer, answers this for PySpark, but I want a solution specifically for Scala.

Any help is appreciated.

learner asked Nov 15 '25 20:11

1 Answer

You don't need a UDF for this. To convert boolean values to int, typecast the column to int:

val df2 = df1
  .withColumn("boolAsInt", $"bool".cast("Int"))
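
Applied to the original question, a minimal sketch (column names taken from the question's schema; assumes a SparkSession named spark is in scope):

```scala
import org.apache.spark.sql.functions.max
import spark.implicits._  // enables the $"..." column syntax

// Cast the boolean column to Int: true -> 1, false -> 0
val casted = inputfile.withColumn("c", $"c".cast("Int"))

// max(c) is now a valid numeric aggregate per (a, b) group
val outputfile = casted.groupBy($"a", $"b").agg(max($"c").as("max_c"))
```

With this encoding, a max_c of 1 simply means the group contained at least one true value.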
thleo answered Nov 17 '25 11:11