 

Spark SQL convert dataset to dataframe

How do I convert a Dataset object to a DataFrame? In my example, I read a JSON file into a DataFrame and convert it to a Dataset. In the Dataset I add an additional attribute (newColumn) and then convert it back to a DataFrame. Here is my example code:

val empData = sparkSession.read.option("header", "true").option("inferSchema", "true").option("multiline", "true").json(filePath)

.....

import sparkSession.implicits._

val res = empData.as[Emp]

//for (i <- res.take(4)) println(i.name + " ->" + i.newColumn)

val s = res.toDF()
s.printSchema()

case class Emp(name: String, gender: String, company: String, address: String) {
  val newColumn = if (gender == "male") "Not-allowed" else "Allowed"
}

I expected the new column newColumn to appear in the s.printSchema() output, but it does not. Why not? How can I achieve this?

Learn Hadoop asked Oct 11 '18



1 Answer

The schema produced by the Product encoder is determined solely by the case class's constructor signature. Anything that happens in the class body is simply discarded.
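You can see this without Spark at all: the encoder inspects the case class through Scala's Product interface, which exposes only constructor parameters. A quick sketch using the question's Emp class (the field values here are made up for illustration):

```scala
case class Emp(name: String, gender: String, company: String, address: String) {
  val newColumn = if (gender == "male") "Not-allowed" else "Allowed"
}

val e = Emp("Bob", "male", "Acme", "NYC")
// productArity counts only constructor parameters
println(e.productArity)                    // 4, not 5
println(e.productIterator.mkString(", "))  // Bob, male, Acme, NYC -- newColumn is absent
println(e.newColumn)                       // Not-allowed: still on the object, just invisible to the encoder
```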

You can instead carry the derived value into the result explicitly:

empData.map(x => (x, x.newColumn)).toDF("value", "newColumn")
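Another option (a sketch, not part of the original answer): declare the derived value as a constructor parameter of a second case class, so the encoder includes it in the schema. EmpWithFlag and fromEmp are hypothetical names introduced here for illustration:

```scala
// Emp as defined in the question (body val omitted here)
case class Emp(name: String, gender: String, company: String, address: String)

// Hypothetical: newColumn as a constructor parameter, so the
// Product encoder would include it in the schema
case class EmpWithFlag(name: String, gender: String, company: String,
                       address: String, newColumn: String)

object EmpWithFlag {
  // Derive newColumn at construction time instead of in the class body
  def fromEmp(e: Emp): EmpWithFlag =
    EmpWithFlag(e.name, e.gender, e.company, e.address,
                if (e.gender == "male") "Not-allowed" else "Allowed")
}

// In Spark: res.map(EmpWithFlag.fromEmp).toDF().printSchema()
// would then list newColumn as a fifth column.
```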
user10490512 answered Oct 10 '22