Why does SparkLauncher return immediately and spawn no job?

I am using SparkLauncher in Spark v1.6.0. My problem is that when I use this class to launch my Spark jobs, it returns immediately and no job is submitted. My code is as follows.

new SparkLauncher()
    .setAppName("test word count")
    .setAppResource("file://c:/temp/my.jar")
    .setMainClass("my.spark.app.Main")
    .setMaster("spark://master:7077")
    .startApplication(new SparkAppHandle.Listener() {
        @Override public void stateChanged(SparkAppHandle h) { }
        @Override public void infoChanged(SparkAppHandle h) { }
    });

When I debug into the code, I notice, to my surprise, that all this class really does is call the spark-submit.cmd script via ProcessBuilder:

[C:/tmp/spark-1.6.0-bin-hadoop2.6/bin/spark-submit.cmd, --master, spark://master:7077, --name, "test word count", --class, my.spark.app.Main, C:/temp/my.jar]

However, if I run this command (the one built by ProcessBuilder) directly on the console, a Spark job is submitted.

There's another method, SparkLauncher.launch(), that is available, but the javadocs say to avoid it.

Any idea what's going on?

asked Jan 14 '16 by Jane Wayne

3 Answers

If it works in the console but not from your program, you may need to tell SparkLauncher where your Spark home is by calling:

.setSparkHome("C:/tmp/spark-1.6.0-bin-hadoop2.6")

But there could be other things going wrong. You may want to capture additional debugging information by using:

.addSparkArg("--verbose")

and

Map<String, String> env = new HashMap<>(); // plain java.util.HashMap works fine
env.put("SPARK_PRINT_LAUNCH_COMMAND", "1");

Pass the env object to the SparkLauncher constructor:

new SparkLauncher(env)
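
Putting these together: a minimal, self-contained sketch (the paths, class name, and master URL are placeholders from the question, and LaunchDebugExample is a hypothetical wrapper class):

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LaunchDebugExample {
    public static void main(String[] args) throws Exception {
        // Ask spark-submit to print the final launch command it builds
        Map<String, String> env = new HashMap<>();
        env.put("SPARK_PRINT_LAUNCH_COMMAND", "1");

        SparkAppHandle handle = new SparkLauncher(env)
            .setSparkHome("C:/tmp/spark-1.6.0-bin-hadoop2.6") // directory containing bin/spark-submit.cmd
            .setAppName("test word count")
            .setAppResource("file://c:/temp/my.jar")
            .setMainClass("my.spark.app.Main")
            .setMaster("spark://master:7077")
            .addSparkArg("--verbose")
            .startApplication();

        // Keep this JVM alive; exiting too early kills the child process
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000L);
        }
    }
}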
answered Nov 10 '22 by Dipaan


Where do you place the new SparkLauncher() statement in your program?

If the main program or unit test finishes immediately after invoking .startApplication(), then the child process it created is terminated as well.

You can check the state of the job with the handle that startApplication() returns:

SparkAppHandle handle = new SparkLauncher()
    .setAppName("test word count")
    .setAppResource("file://c:/temp/my.jar")
    .setMainClass("my.spark.app.Main")
    .setMaster("spark://master:7077")
    .startApplication();

handle.getState();  // immediately returns UNKNOWN

Thread.sleep(1000); // wait a little bit...

handle.getState();  // the state may have changed to CONNECTED or others

I think this is because the application takes some time to connect to the master; if the program ends before the connection is established, no job is submitted.
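
If you would rather block than poll, one option (a sketch, not part of the original answer) is to wait on a java.util.concurrent.CountDownLatch that the state listener releases once the job reaches a final state:

import java.util.concurrent.CountDownLatch;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class BlockingLaunchExample { // hypothetical class name
    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);

        new SparkLauncher()
            .setAppName("test word count")
            .setAppResource("file://c:/temp/my.jar")
            .setMainClass("my.spark.app.Main")
            .setMaster("spark://master:7077")
            .startApplication(new SparkAppHandle.Listener() {
                @Override
                public void stateChanged(SparkAppHandle h) {
                    // Release the main thread once the job finishes or fails
                    if (h.getState().isFinal()) {
                        done.countDown();
                    }
                }

                @Override
                public void infoChanged(SparkAppHandle h) { }
            });

        done.await(); // keeps the launching JVM, and thus the child process, alive
    }
}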

answered Nov 10 '22 by ICS


You need to wait for the launcher to connect to the driver and report your app ID and status. For that you can use a while loop or something similar, e.g.:

while (!handle.getState().isFinal()) {
    logger.info("Current state: " + handle.getState());
    logger.info("App Id: " + handle.getAppId());
    Thread.sleep(1000L);
    // other things you want to do
}
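
Note that handle.getAppId() can return null until the launched application has connected back to the launcher and reported its ID, so the first few iterations of the loop may log a null app ID.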
answered Nov 10 '22 by Filmon Gebreyesus