 

Why do I get the error "Unable to find encoder for type stored in a Dataset" when encoding JSON using case classes?

I've written a Spark job:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val ctx = new org.apache.spark.sql.SQLContext(sc)
    import ctx.implicits._

    case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)
    case class Person2(name: String, age: Long, city: String)

    val persons = ctx.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()
  }
}

When I run the main function in my IDE, two errors occur:

Error:(15, 67) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._  Support for serializing other types will be added in future releases.
    val persons = ctx.read.json("/tmp/persons.json").as[Person]
                                                                  ^

Error:(15, 67) not enough arguments for method as: (implicit evidence$1: org.apache.spark.sql.Encoder[Person])org.apache.spark.sql.Dataset[Person].
Unspecified value parameter evidence$1.
    val persons = ctx.read.json("/tmp/persons.json").as[Person]
                                                                  ^

But in the Spark shell I can run this job without any error. What is the problem?

Asked by Milad Khajavi on Jan 11 '16


3 Answers

The error message says that an Encoder cannot be found for the Person case class.

Error:(15, 67) Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing sqlContext.implicits._  Support for serializing other types will be added in future releases.

Move the declarations of the case classes outside the scope of SimpleApp. A case class defined inside a method is a local class, and Spark's implicit derivation cannot produce an Encoder for it; declared at the top level, the Encoder[Person] evidence the compiler asks for resolves as expected.
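
For illustration, a minimal sketch of the fixed layout, reusing the code from the question with only the case class declarations moved:

import org.apache.spark.{SparkConf, SparkContext}

// Top-level declarations, outside SimpleApp, so that
// ctx.implicits._ can derive an Encoder for them
case class Person(age: Long, city: String, id: String, lname: String, name: String, sex: String)
case class Person2(name: String, age: Long, city: String)

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    val ctx = new org.apache.spark.sql.SQLContext(sc)
    import ctx.implicits._

    val persons = ctx.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()
  }
}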

Answered by Developer

You get the same error if you import both sqlContext.implicits._ and spark.implicits._ in SimpleApp (the order doesn't matter).

Removing one or the other is the solution:

val spark = SparkSession
  .builder()
  .getOrCreate()

val sqlContext = spark.sqlContext
import sqlContext.implicits._ //sqlContext OR spark implicits
//import spark.implicits._ //sqlContext OR spark implicits

case class Person(age: Long, city: String)
val persons = spark.read.json("/tmp/persons.json").as[Person]

Tested with Spark 2.1.0

The funny thing is that if you import the same object's implicits twice, you will not have any problems.

Answered by Paul Leclercq


@Milad Khajavi

Define the Person case classes outside object SimpleApp. Also, add import sqlContext.implicits._ inside the main() function, as in the sketch below.
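
For illustration, a minimal sketch of that layout, assuming the same /tmp/persons.json input as in the question:

// Case class at the top level, not inside main(),
// so the implicits import can derive its Encoder
case class Person(name: String, age: Long, city: String)

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new org.apache.spark.SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new org.apache.spark.SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._ // imported inside main(), after sqlContext exists

    val persons = sqlContext.read.json("/tmp/persons.json").as[Person]
    persons.printSchema()
  }
}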

Answered by Santhoshm