I am trying to understand the difference between Dataset and data frame and found the following helpful link , but i am not able to understand what is meant by type safe?
Difference between DataFrame (in Spark 2.0 i.e DataSet[Row] ) and RDD in Spark
RDDs and Datasets are type safe means that compiler know the Columns and it's data type of the Column whether it is Long, String, etc....
But, In Dataframe, every time when you call an action, collect()
for instance,then it will return the result as an Array of Rows not as Long, String data type. In dataframe, Columns have their own type such as integer, String but they are not exposed to you. To you, its any type.
To convert the Row of data into it's suitable type you have to use .asInstanceOf
method.
eg: In Scala:
scala > :type df.collect()
Array[org.apache.spark.sql.Row]
df.collect().map{ row =>
val str = row(0).asInstanceOf[String]
val num = row(1).asInstanceOf[Long]
}
People who loves example, here it is:
case class Employ(name: String, age: Int, id: Int, department: String)
val empData = Seq(Employ("A", 24, 132, "HR"), Employ("B", 26, 131, "Engineering"), Employ("C", 25, 135, "Data Science"))
create an dataframe and dataset data
val empRDD = spark.sparkContext.makeRDD(empData)
val empDataFrame = empRDD.toDf()
val empDataset = empRDD.toDS()
Lets perform an operation :
Dataset
val empDatasetResult = empDataset.filter(employ => employ.age > 24)
Dataframe
val empDatasetResult = empDataframe.filter(employ => employ.age > 24)
//thows error "value age is not a member of org.apache.spark.sql.Row object."
In the case of Dataframe when we perform lambda it returns a Row object and not an Integer object so you cant directly do employ.age > 24
, but you can do below:
val empDataFrameResult = empDataFrame.filter(employ => employ.getAs[Int]("age") > 24)
Why is the dataset so special then?
Who don't like boilerplate code? Let's create it using Datasets ..
Thanks to :https://blog.knoldus.com/spark-type-safety-in-dataset-vs-dataframe/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With