
Spark Parallelize? (Could not find creator property with name 'id')

What causes this serialization error in Apache Spark 1.4.0 when calling:

sc.parallelize(strList, 4)

This exception is thrown:

Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)

Thrown from addBeanProps in Jackson: com.fasterxml.jackson.databind.deser.BeanDeserializerFactory#addBeanProps

The input is a Seq[String], and the number of partitions doesn't seem to matter (tried 1, 2, and 4).

There is no serialization stack trace of the kind you normally get when the worker closure cannot be serialized.

What is another way to track this down?

Brent Faust asked Jun 25 '15 00:06

2 Answers

@Interfector is correct. I ran into this issue as well. Here is a snippet from my sbt file, including the 'dependencyOverrides' section that fixed it.

libraryDependencies ++= Seq(
  "com.amazonaws" % "amazon-kinesis-client" % "1.4.0",
  "org.apache.spark" %% "spark-core" % "1.4.0",
  "org.apache.spark" %% "spark-streaming" % "1.4.0",
  "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.4.0",
  "com.amazonaws" % "aws-java-sdk" % "1.10.2"
)

// Force jackson-databind to 2.4.4 regardless of what other dependencies request
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4"
)
charmquark answered Sep 30 '22 17:09


I suspect this is caused by the classpath providing a different version of Jackson than the one Spark expects (2.4.4, if I'm not mistaken). You will need to adjust your classpath so that the correct Jackson version is picked up first for Spark.

Interfector answered Sep 30 '22 15:09
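
To check the classpath theory from the answers, one option is to print where the Jackson classes are actually loaded from at runtime. The following is only a minimal sketch: the object name JacksonClasspathCheck and the helper jarOf are made up for illustration, and the two class names it looks up are an assumption about which Jackson classes are worth inspecting. It uses only the standard library, so it runs even when Jackson is missing from the classpath.

```scala
import scala.util.Try

// Hypothetical helper (not part of Spark or Jackson): reports which jar a class comes from.
object JacksonClasspathCheck {

  /** Location (usually a jar path) a class was loaded from, if it can be determined. */
  def jarOf(className: String): Option[String] =
    for {
      // Load without initializing the class, so static initializers cannot fail here.
      cls    <- Try(Class.forName(className, false, getClass.getClassLoader)).toOption
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource) // null for JDK bootstrap classes
      loc    <- Option(source.getLocation)
    } yield loc.toString

  def main(args: Array[String]): Unit = {
    // Assumed class names to inspect; adjust to whatever your stack trace mentions.
    val names = Seq(
      "com.fasterxml.jackson.databind.ObjectMapper",
      "com.fasterxml.jackson.module.scala.DefaultScalaModule"
    )
    names.foreach { n =>
      println(s"$n -> ${jarOf(n).getOrElse("not found on classpath")}")
    }
  }
}
```

Run it with the same classpath as the failing job. If the jar path printed for ObjectMapper carries a version other than the one Spark expects, that would support the version-conflict explanation.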
