Let's assume I create a Parquet file as follows:
import spark.implicits._  // needed for the Dataset encoder (in scope by default in spark-shell)

case class A(i: Int, j: Double, s: String)
val l1 = List(A(1, 2.0, "s1"), A(2, 3.0, "S2"))
val ds = spark.createDataset(l1)
ds.write.parquet("/tmp/test.parquet")
Is it possible to read it into a Dataset of a type with a different schema, where the only difference is a few additional fields?
E.g.:
case class B(i: Int, j: Double, s: String, d: Double = 1.0) // d is extra and has a default value
Is there a way I can make this work?
val ds2 = spark.read.parquet("/tmp/test.parquet").as[B]
In Spark, if the schema of the Dataset does not match the desired type U, you can use select along with alias or as to rearrange or rename the columns as required. This means that for the following code to work:
val ds2 = spark.read.parquet("/tmp/test.parquet").as[B]
The following modification needs to be made:
import org.apache.spark.sql.functions.lit

val ds2 = spark.read.parquet("/tmp/test.parquet").withColumn("d", lit(1.0)).as[B]
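For completeness, here is a select-based sketch of the same fix, since select with alias/as was mentioned above; it assumes spark is the active SparkSession and that import spark.implicits._ is in scope to provide the encoder for B:

import org.apache.spark.sql.functions.{col, lit}

// Keep the existing columns and append a literal column aliased to "d",
// so the resulting schema matches case class B exactly.
val ds2 = spark.read.parquet("/tmp/test.parquet")
  .select(col("i"), col("j"), col("s"), lit(1.0).as("d"))
  .as[B]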
Or, if creating the additional column is not possible, the following can be done:
val ds2 = spark.read.parquet("/tmp/test.parquet").map { row =>
  // d is omitted here, so the case class default (d = 1.0) is applied
  B(row.getInt(0), row.getDouble(1), row.getString(2))
}
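As a sketch of a variant of the same mapping, the fields can also be looked up by name with Row.getAs[T] instead of by position, which is less brittle if the column order in the Parquet file ever changes (again assuming import spark.implicits._ is in scope for the encoder):

val ds2 = spark.read.parquet("/tmp/test.parquet").map { row =>
  // Access columns by name rather than ordinal position
  B(row.getAs[Int]("i"), row.getAs[Double]("j"), row.getAs[String]("s"))
}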