I know SparkLauncher is used to launch a Spark application programmatically instead of via the spark-submit script, but I am a bit confused about when to use SparkLauncher and what its benefits are.
The following code uses SparkLauncher to launch a Spark application whose main class is org.apache.spark.launcher.WordCountApp:
import org.apache.spark.launcher.SparkLauncher

object WordCountSparkLauncher {
  def main(args: Array[String]): Unit = {
    val proc = new SparkLauncher()
      .setAppName("WordCountSparkLauncherApp")
      .setMaster("local")
      .setSparkHome("D:/spark-2.2.0-bin-hadoop2.7")
      .setAppResource("file:///d:/spark-2.2.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.0.jar")
      .setVerbose(true)
      .setMainClass("org.apache.spark.launcher.WordCountApp")
      .launch()

    // Drain the child process's stdout and stderr so it does not block on full buffers.
    new Thread(new IORunnable(proc.getInputStream, "proc-input-stream")).start()
    new Thread(new IORunnable(proc.getErrorStream, "proc-error-input-stream")).start()

    proc.waitFor()
  }
}
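IORunnable is not part of the Spark API; it is a small helper the snippet above assumes for draining the child process's streams. A minimal sketch of what it might look like (the class name and constructor are simply what the code above expects):

import java.io.{BufferedReader, InputStream, InputStreamReader}

// Minimal sketch of the assumed IORunnable helper: it reads a stream from the
// child process line by line and prints each line tagged with a name.
class IORunnable(in: InputStream, name: String) extends Runnable {
  override def run(): Unit = {
    val reader = new BufferedReader(new InputStreamReader(in))
    var line = reader.readLine()
    while (line != null) {
      println(s"[$name] $line")
      line = reader.readLine()
    }
  }
}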
It is working fine, but there is another choice: create a runnable fat jar using the Maven Shade plugin to pack all the Spark-related dependencies into one jar, and that way I could still run the Spark application with java -jar thefatjar.
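For comparison, the fat-jar approach boils down to a plain main method that creates its own SparkSession and is started with java -jar; a minimal sketch, assuming the Spark dependencies are shaded into the jar (the input path and the hard-coded master here are just placeholders):

import org.apache.spark.sql.SparkSession

// Sketch of the fat-jar style: the application builds its own SparkSession
// and is launched directly with `java -jar thefatjar`.
object WordCountFatJar {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountFatJarApp")
      .master("local[*]")  // assumption: master set in code for this example
      .getOrCreate()

    val counts = spark.read.textFile("file:///d:/tmp/input.txt")  // hypothetical input path
      .rdd
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    spark.stop()
  }
}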
What are the benefits of SparkLauncher compared to a runnable fat jar? Is there anything SparkLauncher offers that the fat jar does not?
Think of the different ways you launch a Spark application and what integration options you have.
With a fat jar you have to have Java installed, and launching the Spark application requires executing java -jar [your-fat-jar-here]. It's hard to automate if you want to, say, launch the application from a web application.
With SparkLauncher you're given the option of launching a Spark application from another application, e.g. the web application above. It is just much easier.
While both give you integration points in some way, SparkLauncher is just simpler to work with from another JVM-based application. You don't have to resort to using the command line (which has its own "niceties").
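To illustrate that integration point: besides launch(), SparkLauncher offers startApplication(), which returns a SparkAppHandle the calling JVM can use to track the application's state instead of parsing the child process's output. A hedged sketch, reusing the placeholder paths from the question:

import java.util.concurrent.CountDownLatch
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// Sketch: launching from another JVM application and tracking the job's state
// through a SparkAppHandle rather than reading the child process's streams.
object LaunchFromAnotherApp {
  def main(args: Array[String]): Unit = {
    val done = new CountDownLatch(1)

    val handle = new SparkLauncher()
      .setAppName("WordCountSparkLauncherApp")
      .setMaster("local")
      .setSparkHome("D:/spark-2.2.0-bin-hadoop2.7")  // placeholder paths from the question
      .setAppResource("file:///d:/spark-2.2.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.0.jar")
      .setMainClass("org.apache.spark.launcher.WordCountApp")
      .startApplication(new SparkAppHandle.Listener {
        override def stateChanged(h: SparkAppHandle): Unit = {
          println(s"state: ${h.getState}, appId: ${h.getAppId}")
          if (h.getState.isFinal) done.countDown()
        }
        override def infoChanged(h: SparkAppHandle): Unit = ()
      })

    done.await()  // block only for this demo; a web app would just keep the handle around
    println(s"final state: ${handle.getState}")
  }
}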
If I want to run a Spark application within another program, I would simply create a SparkContext inside the web application and use Spark as a normal framework within the web app.
That would tightly couple the web application and the Spark application, for one, and would keep compute resources (like threads) busy while the Spark application executes. HTTP requests are short-lived, while Spark jobs are long-lived.