I have a data structure that is recursive. Spark gives this error:
Exception in thread "main" java.lang.UnsupportedOperationException: cannot have circular references in class, but got the circular reference of class BulletPoint
As a minimal example I wrote this code:
import org.apache.spark.sql.SparkSession

case class BulletPoint(item: String, children: List[BulletPoint])

object TestApp extends App {
  val sparkSession = SparkSession
    .builder()
    .appName("spark app")
    .master("local")
    .getOrCreate()

  import sparkSession.implicits._

  sparkSession.createDataset(List(BulletPoint("1", Nil), BulletPoint("2", Nil)))
}
Does anyone have an idea how to work around this issue?
The exception is fairly explicit - such a case is not supported by default. You have to remember that Datasets are encoded into a relational schema, so every field has to be declared up front and have a bounded type. There is no room here for a recursive structure.
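For comparison, here is a minimal sketch of a shape Spark can encode out of the box - the FlatBulletPoint class and its fields are hypothetical, and it assumes the sparkSession and its implicits from the question are in scope. Children reference their parent by id instead of being nested:

// Hypothetical flat representation: every field has a fixed,
// non-recursive type, so Spark can map it to ordinary columns.
case class FlatBulletPoint(id: String, parentId: Option[String], item: String)

sparkSession.createDataset(List(
  FlatBulletPoint("1", None, "first"),
  FlatBulletPoint("2", Some("1"), "nested child")
))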
There is a small loophole here - binary Encoders:
import org.apache.spark.sql.{Encoder, Encoders}

sparkSession.createDataset(List(
  BulletPoint("1", Nil), BulletPoint("2", Nil)
))(Encoders.kryo[BulletPoint])
or, equivalently:
implicit val bulletPointEncoder: Encoder[BulletPoint] = Encoders.kryo[BulletPoint]

sparkSession.createDataset(List(
  BulletPoint("1", Nil), BulletPoint("2", Nil)
))
but it is really not something you'd like to have in your code, unless strictly necessary.
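To see why: the kryo Encoder serializes each whole object into a single opaque binary column, so you lose column-level access and Spark SQL's optimizer can do little with the data. A quick check of the schema illustrates this (a sketch, naming the Dataset built above ds):

val ds = sparkSession.createDataset(List(
  BulletPoint("1", Nil), BulletPoint("2", Nil)
))(Encoders.kryo[BulletPoint])

// The entire object is stored as one binary blob:
ds.printSchema()
// root
//  |-- value: binary (nullable = true)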