I'm building an Apache Spark application in Scala and I'm using SBT to build it. Here is the thing:

1. When I run the application in IntelliJ IDEA, I want Spark dependencies to be included in the classpath.
2. When I package the application with sbt-assembly, I do not want Spark dependencies to be included in the uber JAR.
3. When I run unit tests through sbt test, I want Spark dependencies to be included in the classpath (same as #1 but from SBT).

To match constraint #2, I'm declaring Spark dependencies as provided:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
...
)
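(For context, and as an assumption on my side rather than something stated above: this presumes the sbt-assembly plugin is enabled and sparkVersion is defined somewhere in the build, for example:)

// project/plugins.sbt -- plugin version is a placeholder
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt -- Spark version is a placeholder
val sparkVersion = "2.4.8"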
Then, sbt-assembly's documentation suggests adding the following line to include the dependencies for unit tests (constraint #3):
run in Compile <<= Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run))
That leaves me with constraint #1 not being fulfilled, i.e. I cannot run the application in IntelliJ IDEA since the Spark dependencies are not being picked up.
With Maven, I was using a specific profile to build the uber JAR. That way, I was declaring Spark dependencies as regular dependencies for the main profile (IDE and unit tests), while declaring them as provided for the fat JAR packaging. See https://github.com/aseigneurin/kafka-sandbox/blob/master/pom.xml
What is the best way to achieve this with SBT?
Use the new 'Include dependencies with "Provided" scope' option in an IntelliJ IDEA run configuration.
(Answering my own question with an answer I got from another channel...)
To be able to run the Spark application from IntelliJ IDEA, you simply have to create a main class in the src/test/scala directory (test, not main). IntelliJ will pick up the provided dependencies.

object Launch {
  def main(args: Array[String]): Unit = {
    Main.main(args)
  }
}
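A side note that is not part of the original answer: since the test classpath also includes the provided dependencies, the same Launch object can be started from the sbt shell as well (sbt 1.x slash syntax shown; on sbt 0.13 the equivalent is test:runMain Launch):

sbt "Test / runMain Launch"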
Thanks to Matthieu Blanc for pointing that out.
The main trick here is to create another subproject that will depend on the main subproject and will have all its provided libraries in compile scope. To do this I add the following lines to build.sbt:
lazy val mainRunner = project.in(file("mainRunner")).dependsOn(RootProject(file("."))).settings(
  libraryDependencies ++= spark.map(_ % "compile")
)
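The snippet above assumes spark is a sequence of Spark module IDs defined elsewhere in the same build.sbt and pulled into the root project as provided. A minimal sketch of that part (the module list is illustrative, not from the original source) could be:

// shared list of Spark modules (illustrative; reuses sparkVersion from the question)
val spark = Seq(
  "org.apache.spark" %% "spark-core"      % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion
)

// the root project keeps Spark "provided" so it stays out of the assembly JAR
libraryDependencies ++= spark.map(_ % "provided")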
Now I refresh the project in IDEA and slightly change the previous run configuration so that it uses the new mainRunner module's classpath.
Works flawlessly for me.
Source: https://github.com/JetBrains/intellij-scala/wiki/%5BSBT%5D-How-to-use-provided-libraries-in-run-configurations
For running Spark jobs, the general solution of "provided" dependencies works: https://stackoverflow.com/a/21803413/1091436
You can then run the app from sbt, IntelliJ IDEA, or anything else.
It basically boils down to this:
run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated,
runMain in Compile := Defaults.runMainTask(fullClasspath in Compile, runner in (Compile, run)).evaluated
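To see where those two lines live, here is a sketch of a complete build.sbt in that style (the project name, Scala version, and Spark modules are placeholders of mine, not from the linked answer):

val sparkVersion = "2.4.8"  // placeholder

lazy val root = (project in file("."))
  .settings(
    name := "spark-app",        // placeholder
    scalaVersion := "2.12.15",  // placeholder
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"      % sparkVersion % "provided",
      "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
    ),
    // make `run` and `runMain` use the Compile classpath, which includes "provided"
    run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in (Compile, run), runner in (Compile, run)).evaluated,
    runMain in Compile := Defaults.runMainTask(fullClasspath in Compile, runner in (Compile, run)).evaluated
  )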