
snakeyaml and spark results in an inability to construct objects

The following code runs fine in a Scala shell with SnakeYAML version 1.17:

import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor
import scala.collection.mutable.ListBuffer
import scala.beans.BeanProperty

class EmailAccount {
  @scala.beans.BeanProperty var accountName: String = null

  override def toString: String = s"acct ($accountName)"
}

val text = """accountName: Ymail Account"""

val yaml = new Yaml(new Constructor(classOf[EmailAccount]))
val e = yaml.load(text).asInstanceOf[EmailAccount]
println(e)

However, when I run the same code in the Spark shell (2.0.0 in this case), it fails with the following error:

org.yaml.snakeyaml.constructor.ConstructorException: Can't construct a java object for tag:yaml.org,2002:EmailAccount; exception=java.lang.NoSuchMethodException: EmailAccount.<init>()
 in 'string', line 1, column 1:
    accountName: Ymail Account
    ^

  at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:350)
  at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
  at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
  at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
  at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
  at org.yaml.snakeyaml.Yaml.load(Yaml.java:369)
  ... 48 elided
Caused by: org.yaml.snakeyaml.error.YAMLException: java.lang.NoSuchMethodException: EmailAccount.<init>()
  at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.createEmptyJavaBean(Constructor.java:220)
  at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:190)
  at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:346)
  ... 53 more
Caused by: java.lang.NoSuchMethodException: EmailAccount.<init>()
  at java.lang.Class.getConstructor0(Class.java:2810)
  at java.lang.Class.getDeclaredConstructor(Class.java:2053)
  at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.createEmptyJavaBean(Constructor.java:216)
  ... 55 more

I launched the Scala shell with:

scala -classpath "/home/placey/snakeyaml-1.17.jar"

I launched the Spark shell with:

/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-shell --master local --jars /home/placey/snakeyaml-1.17.jar
placeybordeaux asked Jun 23 '16 22:06



1 Answer

Solution

Create a self-contained application and run it using spark-submit instead of using spark-shell.

I've created a minimal project for you as a gist here. All you need to do is put both files (build.sbt and Main.scala) in some directory, then run:

sbt package

in order to create a JAR. The JAR will be in target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar or a similar location. You can get SBT from here if you haven't used it yet. Finally, you can run the project:

/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-submit --class "Main" --master local --jars /home/placey/snakeyaml-1.17.jar target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar

The output should be:

[many lines of Spark's log]
acct (Ymail Account)
[more lines of Spark's log]

Explanation

Spark's shell (REPL) transforms every class you define in it by adding an $iw parameter to its constructors. I've explained it here. SnakeYAML expects a zero-parameter constructor for JavaBean-like classes and looks it up by reflection. Since the transformed class no longer has one, the lookup fails with the NoSuchMethodException shown in the question.

You can try this yourself:

scala> class Foo() {}
defined class Foo

scala> classOf[Foo].getConstructors()
res0: Array[java.lang.reflect.Constructor[_]] = Array(public Foo($iw))

scala> classOf[Foo].getConstructors()(0).getParameterCount
res1: Int = 1

As you can see, Spark transforms the constructor by adding a parameter of type $iw.
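The same effect can be reproduced outside the Spark shell with an ordinary nested class, because the Scala compiler also adds a reference to the enclosing instance to a nested class's constructor. The following is only a sketch of that analogy: the names Wrapper, Nested, TopLevel and hasNoArgConstructor are made up for illustration and are not part of Spark or SnakeYAML.

```scala
// An ordinary nested class stands in for a class defined in the Spark shell:
// its constructor takes the enclosing instance as a parameter.
class Wrapper {
  class Nested
}

// A top-level class keeps its plain zero-parameter constructor.
class TopLevel

object ConstructorCheck {
  // True if the class declares a constructor that takes no parameters,
  // which is the kind of constructor SnakeYAML looks up for JavaBean-like classes.
  def hasNoArgConstructor(cls: Class[_]): Boolean =
    cls.getDeclaredConstructors.exists(_.getParameterTypes.isEmpty)

  def main(args: Array[String]): Unit = {
    println(s"TopLevel has a no-arg constructor: ${hasNoArgConstructor(classOf[TopLevel])}")
    println(s"Wrapper#Nested has a no-arg constructor: ${hasNoArgConstructor(classOf[Wrapper#Nested])}")
  }
}
```

Running a check like this against your own class in the Spark shell is a quick way to tell whether SnakeYAML's default bean construction has any chance of working for it.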

Alternative solutions

Define your own Constructor

If you really need to get it working in the shell, you could define your own class extending org.yaml.snakeyaml.constructor.BaseConstructor and make sure that $iw gets passed to constructors. This is a lot of work, though (I actually wrote my own Constructor in Scala for security reasons some time ago, so I have some experience with this).
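To give a rough idea of what "passing the extra argument" means at the reflection level, here is a sketch that instantiates an ordinary nested class by handing its enclosing instance to the constructor. It is not a SnakeYAML Constructor and does not deal with $iw itself: Enclosing, Inner and newInstanceWithOuter are hypothetical names, and obtaining the right wrapper instance inside the Spark shell is the hard part that this sketch leaves out.

```scala
class Enclosing {
  class Inner {
    override def toString: String = "Inner instance"
  }
}

object OuterAwareInstantiation {
  // Instantiate `cls` reflectively through a one-parameter constructor
  // whose parameter type accepts the given enclosing instance.
  def newInstanceWithOuter[T](cls: Class[T], outer: AnyRef): T = {
    val ctor = cls.getDeclaredConstructors
      .find { c =>
        val params = c.getParameterTypes
        params.length == 1 && params(0).isInstance(outer)
      }
      .getOrElse(throw new NoSuchMethodException(
        s"${cls.getName}.<init>(${outer.getClass.getName})"))
    ctor.setAccessible(true)
    cls.cast(ctor.newInstance(outer))
  }

  def main(args: Array[String]): Unit = {
    val enclosing = new Enclosing
    val inner = newInstanceWithOuter(classOf[Enclosing#Inner], enclosing)
    println(inner)
  }
}
```

A generic Constructor would have to do something along these lines for every bean it creates, and then populate the properties, which is why the hard-coded approach described next is usually less effort.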

You could also define a custom Constructor hard-coded to instantiate a specific class (EmailAccount in your case) similar to the DiceConstructor shown in SnakeYAML's documentation. This is much easier, but requires writing code for each class you want to support.

Example:

case class EmailAccount(accountName: String)

class EmailAccountConstructor extends org.yaml.snakeyaml.constructor.Constructor {

  val emailAccountTag = new org.yaml.snakeyaml.nodes.Tag("!emailAccount")
  this.rootTag = emailAccountTag
  this.yamlConstructors.put(emailAccountTag, new ConstructEmailAccount)

  private class ConstructEmailAccount extends org.yaml.snakeyaml.constructor.AbstractConstruct {
    def construct(node: org.yaml.snakeyaml.nodes.Node): Object = {
      // TODO: This is fine for quick prototyping in a REPL, but in a real
      //       application you should probably add type checks.
      val mnode = node.asInstanceOf[org.yaml.snakeyaml.nodes.MappingNode]
      val mapping = constructMapping(mnode)
      val name = mapping.get("accountName").asInstanceOf[String]
      new EmailAccount(name)
    }
  }

}

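You can save this as a file and load it in the REPL using :load filename.scala. Then pass an instance of it to Yaml in place of the default constructor, as in new org.yaml.snakeyaml.Yaml(new EmailAccountConstructor), and cast the result of load to EmailAccount.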

A bonus of this solution is that it can create immutable case class instances directly. Unfortunately, the Scala REPL seems to have issues with imports, so I've used fully qualified names.

Don't use JavaBeans

You can also just parse YAML documents as simple Java maps:

scala> val yaml2 = new Yaml()
yaml2: org.yaml.snakeyaml.Yaml = Yaml:1141996301

scala> val e2 = yaml2.load(text)
e2: Object = {accountName=Ymail Account}

scala> val map = e2.asInstanceOf[java.util.Map[String, Any]]
map: java.util.Map[String,Any] = {accountName=Ymail Account}

scala> map.get("accountName")
res4: Any = Ymail Account

This way SnakeYAML won't need to use reflection.
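If you still want a typed object at the end, you can convert the map yourself. Below is a minimal sketch of that step, assuming the loaded value has the shape shown above (a java.util.Map with a string under accountName). The helper name toEmailAccount is made up for this example, and a hand-built map stands in for the result of yaml2.load(text) so that the snippet runs without SnakeYAML on the classpath.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

final case class EmailAccount(accountName: String)

object MapToCaseClass {
  // Convert the untyped value returned by a YAML load into an EmailAccount,
  // reporting a readable error instead of throwing on unexpected input.
  def toEmailAccount(loaded: Any): Either[String, EmailAccount] = loaded match {
    case map: JMap[_, _] =>
      map.asInstanceOf[JMap[Any, Any]].get("accountName") match {
        case name: String => Right(EmailAccount(name))
        case null         => Left("missing key: accountName")
        case other        => Left(s"accountName is not a string: $other")
      }
    case other => Left(s"expected a mapping, got: $other")
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for the value returned by yaml2.load(text).
    val loaded = new JLinkedHashMap[String, Any]()
    loaded.put("accountName", "Ymail Account")
    println(toEmailAccount(loaded))
  }
}
```

No reflection on EmailAccount is involved, so the conversion does not depend on how the class's constructor was compiled.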

However, since you're using Scala, I recommend trying MoultingYAML, which is a Scala wrapper for SnakeYAML. It parses YAML documents to simple Java types and then maps them to Scala types (even your own types like EmailAccount).

Paweł Bartkiewicz answered Oct 27 '22 00:10
