 

What are the benefits of SparkLauncher vs java -jar fat-jar?

Tags:

apache-spark

I know SparkLauncher is used to launch a Spark application programmatically instead of using the spark-submit script, but I am a bit confused about when to use SparkLauncher and what its benefit is.

The following code uses SparkLauncher to launch a Spark application whose main class is "org.apache.spark.launcher.WordCountApp":

import org.apache.spark.launcher.SparkLauncher

object WordCountSparkLauncher {
  def main(args: Array[String]): Unit = {
    // Build and start the spark-submit child process.
    val proc = new SparkLauncher()
      .setAppName("WordCountSparkLauncherApp")
      .setMaster("local")
      .setSparkHome("D:/spark-2.2.0-bin-hadoop2.7")
      .setAppResource("file:///d:/spark-2.2.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.0.jar")
      .setVerbose(true)
      .setMainClass("org.apache.spark.launcher.WordCountApp")
      .launch()

    // Drain the child process's stdout and stderr so it does not block on full pipes.
    new Thread(new IORunnable(proc.getInputStream, "proc-input-stream")).start()
    new Thread(new IORunnable(proc.getErrorStream, "proc-error-input-stream")).start()

    // Block until the spark-submit process exits.
    proc.waitFor()
  }
}
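(IORunnable is not part of Spark; a minimal sketch of such a stream-draining helper, with the name and constructor assumed from the snippet above, might look like this:)

```scala
import java.io.{BufferedReader, InputStream, InputStreamReader}

// Hypothetical helper: reads an InputStream line by line and prefixes each
// line with a label so the stdout and stderr streams can be told apart.
class IORunnable(in: InputStream, name: String) extends Runnable {
  override def run(): Unit = {
    val reader = new BufferedReader(new InputStreamReader(in))
    Iterator.continually(reader.readLine())
      .takeWhile(_ != null)
      .foreach(line => println(s"[$name] $line"))
    reader.close()
  }
}
```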

It is working fine, but there is another choice:

Create a runnable fat jar using the maven-shade-plugin to pack all the Spark-related dependencies into one jar; that way, I could still run the Spark application with java -jar the-fat-jar.jar.
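(For reference, a sketch of the maven-shade-plugin configuration such a fat jar would typically use; the plugin version and mainClass here are assumptions to be adjusted to your project:)

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Sets Main-Class in the jar manifest so `java -jar` works. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>org.apache.spark.launcher.WordCountApp</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```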

What are the benefits of SparkLauncher vs a fat runnable jar?

Tom asked Mar 03 '18 08:03

1 Answer

What's the benefit of SparkLauncher? Is there some benefit of SparkLauncher over a runnable fat jar?

Think of the different ways you launch a Spark application and what integration options you have.

With a fat jar you need Java installed, and launching the Spark application comes down to executing java -jar [your-fat-jar-here]. That is hard to automate if you want to, say, launch the application from a web application.

With SparkLauncher you're given the option of launching a Spark application from another application, e.g. the web application above. It is just much easier.
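For example, SparkLauncher.startApplication returns a SparkAppHandle that the calling application can use to monitor and control the child Spark application. A sketch, with paths and the main class assumed from the question:

```scala
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// Launch and get a handle instead of a raw Process.
val handle = new SparkLauncher()
  .setAppResource("file:///d:/spark-2.2.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.2.0.jar")
  .setMainClass("org.apache.spark.launcher.WordCountApp")
  .setMaster("local")
  .startApplication(new SparkAppHandle.Listener {
    // Called on state transitions, e.g. CONNECTED, RUNNING, FINISHED.
    override def stateChanged(h: SparkAppHandle): Unit =
      println(s"state: ${h.getState}")
    override def infoChanged(h: SparkAppHandle): Unit = ()
  })
```

The handle can also stop or kill the application later (handle.stop()), which a raw fat-jar launch cannot offer without extra plumbing.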

While both give you integration points in some way, SparkLauncher is just simpler to work with from another JVM-based application. You don't have to revert to using the command line (that has its own "niceties").

If I want to run a Spark application within another program, I could simply create a SparkContext inside the web application and use Spark as a normal library there.

That would tightly couple the web application and the Spark application for one and would keep the compute resources (like threads) busy while the Spark application executes. HTTP requests are short-lived while Spark jobs are long-lived.

Jacek Laskowski answered Nov 14 '22 01:11