I have a function like this in Scala code (Scala 2.13) for use with Spark
def getDataset[T <: Product: TypeTag](name: String): Dataset[T] = {
  import spark.implicits._
  val ds = spark.read.parquet(BASE_PATH + "/" + name).as[T]
  ds.createOrReplaceTempView(name)
  ds
}
Now I want to take a Seq of case classes and, for each class, call this function:
case class CLASS1(...)
case class CLASS2(...)
case class CLASS3(...)
Seq(CLASS1, CLASS2, CLASS3, ...).foreach {
  c => getDataset[c??](name = c???)
}
I'm having a hard time figuring out the exact syntax; the variable c inside the foreach refers to the case class's companion object, so it seems to have the type of the apply method (() => Product). What I really want is the type of the case class itself, to use as the type parameter, together with the name of the case class.
It feels like I should be able to do this - what am I missing here?
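To illustrate what I mean, here is a sketch (fields made up; the exact inferred element type depends on the classes):
case class CLASS1(a: Int)
case class CLASS2(b: String)

// The Seq elements are the auto-generated companion objects, so the inferred
// element type is the least upper bound of the companions' types (some
// function-like type), not the case class types themselves:
val companions = Seq(CLASS1, CLASS2)

companions.foreach { c =>
  // getDataset[c] does not compile: c is a runtime value, and a type
  // parameter must be a type known at compile time.
  ()
}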
Update: It looks like it's possible to get the name of the type used as a type parameter at runtime, via TypeTag.
The solution I am converging on is something like this:
def getDataset[T <: Product: TypeTag]: Dataset[T] = {
  import spark.implicits._
  val name = typeTag[T].tpe.typeSymbol.name.toString
  val ds = spark.read.parquet(BASE_PATH + "/" + name).as[T]
  ds.createOrReplaceTempView(name)
  ds
}
Then something like Seq(getDataset[CLASS1], getDataset[CLASS2], ...)
Not what I hoped for, but at least I can cut out the copy-paste of the class name and string.
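As a sketch of what the call site could look like (the explicit element type here is just my assumption; with different case classes the common element type ends up as a wildcard over Product):
// Collecting the datasets; each call derives its own name from the TypeTag
val datasets: Seq[Dataset[_ <: Product]] = Seq(
  getDataset[CLASS1],
  getDataset[CLASS2],
  getDataset[CLASS3]
)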
You could define your own companion objects for the case classes and include a method in each which calls getDataset. For example, this should work (passed by my mental compiler):
abstract class DatasetProvider[T <: Product : TypeTag] {
  val name: String
  def dataset: Dataset[T] = getDataset[T](name)
}
case class Class1(...)

object Class1 extends DatasetProvider[Class1] {
  override val name: String = "class1"
}

// and so forth for Class2, Class3

Seq(Class1, Class2, Class3).foreach { c =>
  val ds = c.dataset
  ???
}
Note that when you define your own companion object, it no longer automatically extends the corresponding FunctionN type, so if you want to use the companion itself as a function you have to extend that type explicitly (or call .apply instead); this may or may not be desirable.
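For illustration, a sketch (Class1's field and the raw data here are made up) of where that difference shows up:
case class Class1(a: String)

object Class1 extends DatasetProvider[Class1] {
  override val name: String = "class1"
}

val raw = Seq("x", "y")
// raw.map(Class1)     // no longer compiles: the hand-written companion does not extend String => Class1
raw.map(Class1.apply)  // still fine: the synthetic apply method is eta-expanded as usual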
The problem is that you want to substitute T (known at compile time) at the type level and name (known at runtime) at the value level.
Normally T and name do not exist at the same time.
One option is to replace Seq(Class1, Class2, Class3) at the value level with Class1 :: Class2 :: Class3 :: HNil at the type level and use Shapeless:
import shapeless.{::, HNil, Poly0, Poly1, Typeable}
import shapeless.ops.hlist.FillWith
import scala.reflect.runtime.universe.{TypeTag, typeOf}

// Maps each case class type T to its Dataset[T]
object datasetPoly extends Poly1 {
  implicit def cse[T <: Product : TypeTag /*: Typeable*/]: Case.Aux[T, Dataset[T]] =
    at(_ => getDataset[T](/*Typeable[T].describe*/ typeOf[T].toString))
}

// Produces a null for any nullable type, used only to materialize the HList
object nullPoly extends Poly0 {
  implicit def cse[T >: Null]: Case0[T] = at(null)
}

// Fill a Class1 :: Class2 :: Class3 :: HNil with nulls, then map it per element type,
// yielding a Dataset[Class1] :: Dataset[Class2] :: Dataset[Class3] :: HNil
FillWith[nullPoly.type, Class1 :: Class2 :: Class3 :: HNil].apply().map(datasetPoly)
Alternatively, you can use macros or runtime reflection.
In Seq(Class1, Class2, Class3), the values Class1, Class2, Class3 are the companion objects of the case classes, not the classes themselves.
For example, with the reflective toolbox:
import scala.reflect.runtime.universe.Quasiquote
import scala.reflect.runtime.{currentMirror => cm}
import scala.tools.reflect.ToolBox

val tb = cm.mkToolBox()

Seq(Class1, Class2, Class3).foreach { c =>
  // From the companion object, recover the class symbol of the case class itself
  val classSymbol = cm.reflect(c).symbol.companion
  // Compile and evaluate the call with that class substituted as the type argument
  // (App here stands for the object that defines getDataset)
  tb.eval(q"App.getDataset[$classSymbol](${classSymbol.name.toString})")
}
You should add the following to build.sbt:
libraryDependencies += scalaOrganization.value % "scala-reflect" % scalaVersion.value
libraryDependencies += scalaOrganization.value % "scala-compiler" % scalaVersion.value