Spark using recursive case class

I have a recursive data structure. Spark gives this error:

Exception in thread "main" java.lang.UnsupportedOperationException: cannot have circular references in class, but got the circular reference of class BulletPoint

As a minimal example, I wrote this code:

import org.apache.spark.sql.SparkSession

case class BulletPoint(item: String, children: List[BulletPoint])

object TestApp extends App {
  val sparkSession = SparkSession
    .builder()
    .appName("spark app")
    .master(s"local")
    .getOrCreate()

  import sparkSession.implicits._

  sparkSession.createDataset(List(BulletPoint("1", Nil), BulletPoint("2", Nil)))
}

Does someone have an idea how to work around this issue?

asked Oct 16 '22 by oleber

1 Answer

The exception is fairly explicit: such a case is not supported by default. You have to remember that Datasets are encoded into a relational schema, so all required fields have to be declared up front and bounded. There is no room here for a recursive structure.
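
To make that concrete, here is a minimal sketch of what the encoder machinery expects (FlatBulletPoint is a hypothetical flattened variant, not part of the original question): a non-recursive case class maps to a fixed, bounded schema.

import org.apache.spark.sql.SparkSession

// A non-recursive case class: every field has a bounded,
// statically known type, so Spark can derive a schema for it.
case class FlatBulletPoint(item: String, childItems: List[String])

object SchemaDemo extends App {
  val sparkSession = SparkSession.builder()
    .appName("schema demo")
    .master("local")
    .getOrCreate()

  import sparkSession.implicits._

  sparkSession
    .createDataset(List(FlatBulletPoint("1", Nil)))
    .printSchema()
  // root
  //  |-- item: string (nullable = true)
  //  |-- childItems: array (nullable = true)
  //  |    |-- element: string (containsNull = true)
}

With the recursive BulletPoint, the children field would have to contain the full BulletPoint schema again, nested without bound, which is exactly what the exception rules out.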

There is, however, a small loophole - binary Encoders:

import org.apache.spark.sql.{Encoder, Encoders}

sparkSession.createDataset(List(
  BulletPoint("1", Nil), BulletPoint("2", Nil)
))(Encoders.kryo[BulletPoint])

or equivalently:

implicit val bulletPointEncoder: Encoder[BulletPoint] = Encoders.kryo[BulletPoint]

sparkSession.createDataset(List(
  BulletPoint("1", Nil), BulletPoint("2", Nil)
))

but it is really not something you'd want in your code unless strictly necessary.
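
To see why it is a last resort, here is a short sketch (schema output shown as comments) of what the Kryo encoder produces: the entire object is serialized into a single binary column, so Spark SQL cannot reference individual fields and Catalyst cannot optimize over them.

import org.apache.spark.sql.{Encoder, Encoders}

implicit val bulletPointEncoder: Encoder[BulletPoint] = Encoders.kryo[BulletPoint]

sparkSession
  .createDataset(List(BulletPoint("1", Nil)))
  .printSchema()
// root
//  |-- value: binary (nullable = true)
// The whole object lives in one opaque column, so you cannot
// select, filter, or aggregate on item or children with SQL.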

answered Oct 21 '22 by 10465355