Spark SQL convert dataset to dataframe

Question

How do I convert a Dataset object to a DataFrame? In my example, I read a JSON file into a DataFrame and convert it to a Dataset. In the Dataset's case class I have added an additional attribute (newColumn), and then I convert the Dataset back to a DataFrame. Here is my example code:

val empData = sparkSession.read.option("header", "true").option("inferSchema", "true").option("multiline", "true").json(filePath)

.....

 import sparkSession.implicits._
    val res = empData.as[Emp]

    //for (i <- res.take(4)) println(i.name + " ->" + i.newColumn)

    val s = res.toDF();

    s.printSchema()

  }
  case class Emp(name: String, gender: String, company: String, address: String) {
    val newColumn = if (gender == "male") "Not-allowed" else "Allowed"
  }

I expected the new column, newColumn, to appear in the output of s.printSchema(), but it does not. Why not, and how can I achieve this?

user10490512 · Accepted Answer

With a Product encoder, the output schema is determined solely by the case class's constructor signature. Anything defined in the class body, such as newColumn, is therefore discarded.
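
To see this without reading any data, one can inspect the schema that the encoder derives for the case class. The following is a minimal sketch, assuming Spark SQL is on the classpath; the object name EncoderSchemaDemo and the helper fieldNames are hypothetical names introduced here for illustration. Consistent with the explanation above, newColumn is expected to be absent from the printed field names.

```scala
import org.apache.spark.sql.Encoders

// Same shape as the Emp class in the question.
case class Emp(name: String, gender: String, company: String, address: String) {
  // Declared in the class body, so it is not a constructor parameter.
  val newColumn: String = if (gender == "male") "Not-allowed" else "Allowed"
}

// Hypothetical demo object, not part of the original question or answer.
object EncoderSchemaDemo {
  // Field names of the schema that the product encoder derives for Emp.
  def fieldNames: Array[String] = Encoders.product[Emp].schema.fieldNames

  def main(args: Array[String]): Unit =
    println(fieldNames.mkString(", "))
}
```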

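You can map each record to a tuple that carries the extra value explicitly. Note that the map has to run on the typed Dataset (res in the question), because the rows of empData have no newColumn member:

res.map(x => (x, x.newColumn)).toDF("value", "newColumn")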
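
The following is a self-contained sketch of that approach, assuming a local SparkSession and two made-up sample records in place of the JSON file. The object NewColumnDemo and the helpers nested and flat are hypothetical names. The flat variant is an alternative that is not part of the answer: it recomputes the value with column expressions via withColumn, which keeps the original fields at the top level instead of nesting them in a struct column named value.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

// Same shape as the Emp class in the question.
case class Emp(name: String, gender: String, company: String, address: String) {
  val newColumn: String = if (gender == "male") "Not-allowed" else "Allowed"
}

// Hypothetical demo object, not part of the original question or answer.
object NewColumnDemo {
  // Approach from the answer: pair each Emp with its computed value.
  // The result has a struct column "value" and a string column "newColumn".
  def nested(ds: Dataset[Emp]): DataFrame = {
    import ds.sparkSession.implicits._
    ds.map(e => (e, e.newColumn)).toDF("value", "newColumn")
  }

  // Alternative (an assumption, not from the answer): rebuild the same
  // rule as a column expression, giving a flat five-column schema.
  def flat(ds: Dataset[Emp]): DataFrame =
    ds.withColumn(
      "newColumn",
      when(col("gender") === "male", lit("Not-allowed")).otherwise(lit("Allowed")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("new-column-demo")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the JSON file.
    val ds = Seq(
      Emp("Ann", "female", "Acme", "1 Main St"),
      Emp("Bob", "male", "Acme", "2 Main St")).toDS()

    nested(ds).printSchema()
    flat(ds).printSchema()
    spark.stop()
  }
}
```

The nested form reuses the logic already written in the case class, while the flat form duplicates the rule in column expressions, so the two need to be kept in sync if the rule changes.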

Tags:

scala

apache-spark

apache-spark-sql
