I want to create a Scala trait that should be implemented with a case class T. The trait is simply to load data and transform it into a Spark Dataset of type T. I got the error that no encoder can be stored, which I think is because Scala does not know that T should be a case class. How can I tell the compiler that? I've seen somewhere that I should mention Product, but there is no such class defined.. Feel free to suggest other ways to do this!
I have the following code but it is not compiling with the error: 42: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._ [INFO] .as[T]
I'm using Spark 1.6.1
Code:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Dataset, SQLContext}
/**
* A trait that moves data on Hadoop with Spark based on the location and the granularity of the data.
*/
trait Agent[T] {
/**
* Load a Dataframe from the location and convert into a Dataset
* @return Dataset[T]
*/
protected def load(): Dataset[T] = {
// Read in the data
SparkContextKeeper.sqlContext.read
.format("com.databricks.spark.csv")
.load("/myfolder/" + location + "/2016/10/01/")
.as[T]
}
}
Your code is missing 3 things:
Product
(the superclass of all Scala case classes and Tuples)TypeTag
and ClassTag
of the actual case class. This is used implicitly by Spark to overcome type erasure sqlContext.implicits._
Unfortunately, you can't add type parameters with context bounds in a trait, so the simplest workaround would be to use an abstract class
instead:
import scala.reflect.runtime.universe.TypeTag
import scala.reflect.ClassTag
abstract class Agent[T <: Product : ClassTag : TypeTag] {
protected def load(): Dataset[T] = {
val sqlContext: SQLContext = SparkContextKeeper.sqlContext
import sqlContext.implicits._
sqlContext.read.// same...
}
}
Obviously, this isn't equivalent to using a trait, and might suggest that this design isn't the best fit for the job. Another alternative is placing load
in an object and moving the type parameter to the method:
object Agent {
protected def load[T <: Product : ClassTag : TypeTag](): Dataset[T] = {
// same...
}
}
Which one is preferable is mostly up to where and how you're going to call load
and what you're planning to do with the result.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With