The spark-daria project is uploaded to Spark Packages and I'm accessing spark-daria code in another SBT project with the sbt-spark-package plugin.
I can include spark-daria in the fat JAR file generated by sbt assembly with the following code in the build.sbt file:
spDependencies += "mrpowers/spark-daria:0.3.0"
// JARs to keep in the assembled (semi-fat) JAR
val requiredJars = List("spark-daria-0.3.0.jar")
assemblyExcludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  // Exclude every JAR on the assembly classpath except the ones listed above,
  // so only spark-daria (plus the project's own classes) ends up in the fat JAR
  cp filter { f =>
    !requiredJars.contains(f.data.getName)
  }
}
This code feels like a hack. Is there a better way to include spark-daria in the fat JAR file?
N.B. I want to build a semi-fat JAR file here. I want spark-daria to be included in the JAR file, but I don't want all of Spark in the JAR file!
A JAR file created by sbt package can be run with the scala command, but not with plain java. This is because the class files in that JAR depend on the Scala library classes, which sbt does not bundle into the JAR it generates.
By default, sbt constructs a manifest for the binary package from settings such as organization and mainClass. Additional attributes may be added to the packageOptions setting, scoped by the configuration and the package task. Main attributes may be added with Package.ManifestAttributes.
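For example, a minimal sketch of adding a custom main attribute to the manifest of the JAR produced by packageBin (the Built-By value here is only an illustrative placeholder):
packageOptions in (Compile, packageBin) +=
  Package.ManifestAttributes("Built-By" -> "example-user")  // placeholder attribute value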
The README for version 0.2.6 states the following:
In any case where you really can't specify Spark dependencies using sparkComponents (e.g. you have exclusion rules) and configure them as provided (e.g. standalone jar for a demo), you may use spIgnoreProvided := true to properly use the assembly plugin.
You should then use this flag in your build definition and set your Spark dependencies as provided, as I do with spark-sql:2.2.0 in the following example:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"
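Putting it together, a minimal build.sbt sketch of this approach might look like the following (assuming the sbt-spark-package plugin is enabled; spark-daria is declared through spDependencies and, since it is not provided, sbt assembly bundles it into the semi-fat JAR while Spark itself stays out):
// per the README, so the assembly plugin works correctly with provided Spark deps
spIgnoreProvided := true

// Spark itself stays out of the fat JAR because it is marked provided
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided"

// spark-daria is resolved from Spark Packages; not being provided,
// it is included in the JAR built by sbt assembly
spDependencies += "mrpowers/spark-daria:0.3.0"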
Please note that by setting this, your IDE may no longer have the dependency references it needs to compile and run your code locally, which means you may have to add the necessary JARs to the classpath by hand. I do this often in IntelliJ: I keep a Spark distribution on my machine and add its jars directory to the IntelliJ project definition (this question may help you with that, should you need it).