I need a Fat Jar with Spark because I'm creating a custom node for KNIME. Basically, it's a self-contained jar executed inside KNIME, and I assume a Fat Jar is the only way to spawn a local Spark job. Eventually we will move on to submitting jobs to a remote cluster, but for now I need it to spawn this way.
That said, I made a Fat Jar using this: https://github.com/sbt/sbt-assembly
I made an empty sbt project, included spark-core in the dependencies and assembled the jar. I added it to the manifest of my custom KNIME node and tried to spawn a simple job (parallelize a collection, collect it and print it). It starts, but I get this error:
No configuration setting found for key 'akka.version'
I have no idea how to solve it.
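In case it's relevant, the job itself is nothing more than this (a minimal sketch; the object name and the local[*] master are just what I'm using for the test):

import org.apache.spark.{SparkConf, SparkContext}

object SparkFatJarTest {
  def main(args: Array[String]): Unit = {
    // Run Spark locally inside the JVM, using all available cores
    val conf = new SparkConf().setAppName("SparkFatJarTest").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Parallelize a small collection, collect it back to the driver and print it
    sc.parallelize(1 to 10).collect().foreach(println)

    sc.stop()
  }
}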
Edit: this is my build.sbt
name := "SparkFatJar"
version := "1.0"
scalaVersion := "2.11.6"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.3.0"
)
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.3.8"
assemblyJarName in assembly := "SparkFatJar.jar"
assemblyMergeStrategy in assembly := {
case PathList("META-INF", xs @ _*) => MergeStrategy.discard
case x => MergeStrategy.first
}
I found this merge strategy for Spark somewhere on the internet, but I can't find the source right now.
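For completeness, sbt-assembly itself is wired in through project/plugins.sbt; the plugin version below is just the one I happened to install:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")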
I think the issue is with how you've set up assemblyMergeStrategy. Akka reads akka.version from its reference.conf, and with MergeStrategy.first only one of the several reference.conf files in the fat jar survives, so the key gets lost; the config files need to be concatenated instead. Try this:
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case "application.conf" => MergeStrategy.concat
  case "reference.conf"   => MergeStrategy.concat
  case x =>
    // Fall back to the default strategy for everything else
    val baseStrategy = (assemblyMergeStrategy in assembly).value
    baseStrategy(x)
}
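Once you've rebuilt the jar, you can sanity-check that the merged reference.conf actually contains the key, for example with a tiny Typesafe Config snippet run from the fat jar (just an illustrative check, not part of the fix):

import com.typesafe.config.ConfigFactory

// This performs the same lookup that currently fails with
// "No configuration setting found for key 'akka.version'"
println(ConfigFactory.load().getString("akka.version"))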