I have a data structure that is recursive. Spark gives this error:
Exception in thread "main" java.lang.UnsupportedOperationException: cannot have circular references in class, but got the circular reference of class BulletPoint
As a minimal example I wrote this code:
import org.apache.spark.sql.SparkSession

case class BulletPoint(item: String, children: List[BulletPoint])

object TestApp extends App {
  val sparkSession = SparkSession
    .builder()
    .appName("spark app")
    .master("local")
    .getOrCreate()

  import sparkSession.implicits._

  sparkSession.createDataset(List(BulletPoint("1", Nil), BulletPoint("2", Nil)))
}
Does anyone have an idea how to work around this issue?
The exception is fairly explicit - such a case is not supported by default. You have to remember that Datasets are encoded into a relational schema, so every field has to be declared up front and have a bounded type. There is no room here for a recursive structure.
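For comparison, here is a minimal sketch of a shape Spark can encode out of the box - the FlatBulletPoint class and its fields are hypothetical, and it assumes the sparkSession and its implicits from the question are in scope. Children reference their parent by id instead of being nested:

// Hypothetical flat representation: every field has a fixed,
// non-recursive type, so Spark can map it to ordinary columns.
case class FlatBulletPoint(id: String, parentId: Option[String], item: String)

sparkSession.createDataset(List(
  FlatBulletPoint("1", None, "first"),
  FlatBulletPoint("2", Some("1"), "nested child")
))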
There is a small loophole here - binary Encoders:
import org.apache.spark.sql.{Encoder, Encoders}

sparkSession.createDataset(List(
  BulletPoint("1", Nil), BulletPoint("2", Nil)
))(Encoders.kryo[BulletPoint])
or, equivalently:
implicit val bulletPointEncoder: Encoder[BulletPoint] = Encoders.kryo[BulletPoint]

sparkSession.createDataset(List(
  BulletPoint("1", Nil), BulletPoint("2", Nil)
))
but it is really not something you'd like to have in your code, unless strictly necessary.
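To see why: the kryo Encoder serializes each whole object into a single opaque binary column, so you lose column-level access and Spark SQL's optimizer can do little with the data. A quick check of the schema illustrates this (a sketch, naming the Dataset built above ds):

val ds = sparkSession.createDataset(List(
  BulletPoint("1", Nil), BulletPoint("2", Nil)
))(Encoders.kryo[BulletPoint])

// The entire object is stored as one binary blob:
ds.printSchema()
// root
//  |-- value: binary (nullable = true)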