
Deploy a Spark driver application without spark-submit

Let's suppose we have a Spark driver program written like this:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
  public static void main(String[] args) {
    String logFile = "YOUR_SPARK_HOME/README.md"; // Should be some file on your system
    SparkConf conf = new SparkConf().setAppName("Simple Application");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<String> logData = sc.textFile(logFile).cache();

    // Count lines containing the letter "a"
    long numAs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("a"); }
    }).count();

    // Count lines containing the letter "b"
    long numBs = logData.filter(new Function<String, Boolean>() {
      public Boolean call(String s) { return s.contains("b"); }
    }).count();

    System.out.println("Lines with a: " + numAs + ", lines with b: " + numBs);

    sc.stop();
  }
}

and I want to run it on a YARN cluster. Can I avoid using spark-submit (assuming, of course, that I have access to a cluster node) by just specifying in the context that I want to run on YARN? In other words, is it possible to launch the Spark client as a regular Java app leveraging YARN?
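
To make the idea concrete, here is a sketch of what I mean by "specifying it in the context" (the yarn-client master string is my guess, and I assume the JVM would also need the Hadoop/YARN configuration available, e.g. via HADOOP_CONF_DIR):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class YarnSimpleApp {
  public static void main(String[] args) {
    // Hypothetical: hard-code the YARN master instead of passing it via spark-submit.
    // Assumes HADOOP_CONF_DIR/YARN_CONF_DIR is set so Spark can locate the ResourceManager.
    SparkConf conf = new SparkConf()
        .setAppName("Simple Application")
        .setMaster("yarn-client");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // ... same filtering job as above ...
    sc.stop();
  }
}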

Asked Sep 09 '16 by Felice Pollano


People also ask

How do I run spark submit in standalone mode?

Use spark://HOST:PORT for a Standalone cluster, replacing HOST and PORT with those of your standalone master. Use local to run locally with one worker thread. Use local[k] to run the application with k worker threads, typically setting k to the number of cores you have locally.
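
As an illustration, the same master strings can also be set programmatically on a SparkConf (a sketch; the host and port below are placeholders):

import org.apache.spark.SparkConf;

public class MasterUrls {
  public static void main(String[] args) {
    // One worker thread, in-process
    SparkConf local = new SparkConf().setAppName("demo").setMaster("local");
    // k = 4 worker threads, e.g. one per local core
    SparkConf localK = new SparkConf().setAppName("demo").setMaster("local[4]");
    // Standalone cluster; replace host and port with your master's
    SparkConf standalone = new SparkConf().setAppName("demo").setMaster("spark://master-host:7077");
  }
}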

Which command can be used to deploy and run the spark application?

spark-submit is a shell command used to deploy and run a Spark application on a cluster.

What happens when spark application is submitted?

When a client submits Spark application code, the driver implicitly converts the code containing transformations and actions into a logical directed acyclic graph (DAG).

What is deploy mode in spark submit?

Deploy mode specifies where the driver executes in the deployment environment. It can be one of the following options: client (default), where the driver runs on the machine from which the Spark application was launched, or cluster, where the driver runs on a node inside the cluster chosen by the cluster manager.
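
As a sketch of the distinction, the SparkLauncher API covered in the answer below exposes a setDeployMode method (the jar path and main class here are placeholders):

import org.apache.spark.launcher.SparkLauncher;

public class DeployModeSketch {
  public static void main(String[] args) throws Exception {
    Process app = new SparkLauncher()
        .setAppResource("/path/to/app.jar")   // placeholder
        .setMainClass("my.spark.app.Main")    // placeholder
        .setMaster("yarn")
        .setDeployMode("cluster")             // or "client" (the default)
        .launch();
    app.waitFor();
  }
}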


1 Answer

There is an official way to do it.

Spark Launcher - Library for launching Spark applications.

This library allows applications to launch Spark programmatically. There's only one entry point to the library: the SparkLauncher class.

To launch a Spark application, just instantiate a SparkLauncher and configure the application to run. For example:

import org.apache.spark.launcher.SparkLauncher;

public class MyLauncher {
  public static void main(String[] args) throws Exception {
    Process spark = new SparkLauncher()
        .setAppResource("/my/app.jar")
        .setMainClass("my.spark.app.Main")
        .setMaster("local")
        .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
        .launch();
    spark.waitFor();
  }
}

You can set all the YARN-specific configuration using the setConf method, and set the master to yarn-client or yarn-cluster.
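
For instance, here is a sketch of a YARN launch (the Spark home, jar path, main class, and queue name are placeholder values):

import org.apache.spark.launcher.SparkLauncher;

public class YarnLauncher {
  public static void main(String[] args) throws Exception {
    Process spark = new SparkLauncher()
        .setSparkHome("/opt/spark")                 // placeholder; or rely on SPARK_HOME
        .setAppResource("/my/app.jar")              // placeholder jar
        .setMainClass("my.spark.app.Main")          // placeholder main class
        .setMaster("yarn-cluster")                  // yarn-client to keep the driver local
        .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
        .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
        .setConf("spark.yarn.queue", "default")     // assumed YARN queue name
        .launch();
    int exitCode = spark.waitFor();
    System.out.println("Spark app finished with exit code " + exitCode);
  }
}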

Reference: https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/launcher/package-summary.html

Answered Oct 03 '22 by Rakesh Rakshit