I am using SparkLauncher in Spark v1.6.0. My problem is that when I use this class to launch my Spark jobs, it returns immediately and no job is submitted. My code is as follows.
new SparkLauncher()
.setAppName("test word count")
.setAppResource("file://c:/temp/my.jar")
.setMainClass("my.spark.app.Main")
.setMaster("spark://master:7077")
.startApplication(new SparkAppHandle.Listener() {
@Override public void stateChanged(SparkAppHandle h) { }
@Override public void infoChanged(SparkAppHandle h) { }
});
When I debug into the code, I notice, to my surprise, that all this class really does is call the script spark-submit.cmd using ProcessBuilder:
[C:/tmp/spark-1.6.0-bin-hadoop2.6/bin/spark-submit.cmd, --master, spark://master:7077, --name, "test word count", --class, my.spark.appMain, C:/temp/my.jar]
However, if I run this command (the one that is run by ProcessBuilder) directly on the console, a Spark job is submitted. Any ideas on what's going on?
There's another method, SparkLauncher.launch(), that is available, but the javadocs say to avoid this method.
If it works in the console but not from your program, you may need to tell the SparkLauncher where your Spark home is:
.setSparkHome("C:/tmp/spark-1.6.0-bin-hadoop2.6")
But there could be other things going wrong. You may want to capture additional debugging information by using:
.addSparkArg("--verbose")
and
Map<String, String> env = Maps.newHashMap();
env.put("SPARK_PRINT_LAUNCH_COMMAND", "1");
Pass the env object to the SparkLauncher constructor:
new SparkLauncher(env)
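Another way to see what the child process prints when it is started from Java rather than from the console is to run the same argument list through ProcessBuilder yourself and read the merged output. The sketch below is only an illustration under stated assumptions: the class name ChildProcessProbe and its run method are hypothetical, and the command in main is a stand-in (the current JVM's own `java -version`) so that it runs anywhere. For a real check you would pass the spark-submit.cmd argument list from the question instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: runs a command and collects what it prints.
public class ChildProcessProbe {

    /** Returns the child's merged stdout/stderr lines, followed by one "exit=N" line. */
    public static List<String> run(List<String> command, Map<String, String> extraEnv)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().putAll(extraEnv); // e.g. SPARK_PRINT_LAUNCH_COMMAND=1
        pb.redirectErrorStream(true);      // merge stderr into stdout
        Process process = pb.start();      // start() returns at once; the child runs on its own

        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        // Block until the child exits so the parent does not finish first.
        lines.add("exit=" + process.waitFor());
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in command (assumption): the running JVM's own launcher.
        String javaBinary = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        Map<String, String> env = Collections.singletonMap("SPARK_PRINT_LAUNCH_COMMAND", "1");
        for (String line : run(Arrays.asList(javaBinary, "-version"), env)) {
            System.out.println(line);
        }
    }
}
```

If the output collected this way differs from what you see on the console, the difference (a missing environment variable, a wrong working directory, an unexpected exit code) is usually a good hint about what the launcher's child process is lacking.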
Where in the program do you place the new SparkLauncher() statement?
If the main program or unit test finishes immediately after invoking .startApplication(), then the child process created by it is terminated as well.
You can check the state of the job with the handle that is returned:
SparkAppHandle handle = new SparkLauncher()
.setAppName("test word count")
.setAppResource("file://c:/temp/my.jar")
.setMainClass("my.spark.app.Main")
.setMaster("spark://master:7077")
.startApplication();
handle.getState(); // immediately returns UNKNOWN
Thread.sleep(1000); // wait a little bit...
handle.getState(); // the state may have changed to CONNECTED or others
I think this is because the application takes some time to connect to the master. If the program ends before the connection is established, no job is submitted.
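Instead of sleeping for a fixed time, the caller can block until a callback reports that the work is done. The following is a minimal sketch of that idea using only the JDK: LatchWait and startAsync are hypothetical names, and startAsync is just a stand-in for an asynchronous launcher that reports completion from another thread. It does not use the Spark API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: keep the caller alive until an async callback fires.
public class LatchWait {

    /** Stand-in for an asynchronous launcher: calls onFinished from a background thread. */
    static void startAsync(long workMillis, Runnable onFinished) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(workMillis); // simulated launch/connect time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            onFinished.run();
        });
        worker.setDaemon(true); // a daemon thread does not keep the JVM alive by itself
        worker.start();
    }

    /** Returns true if the callback fired before the timeout, false otherwise. */
    public static boolean launchAndWait(long workMillis, long timeoutMillis)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        startAsync(workMillis, done::countDown);
        // Without this await, the caller would return while the work is still pending.
        return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished in time: " + launchAndWait(100, 2000));
    }
}
```

With the real launcher, the same pattern would presumably mean counting the latch down inside stateChanged once h.getState().isFinal() is true, and calling await on the latch right after startApplication(...).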
You need to wait for the launcher to connect to the driver and get your app ID and status. You can do that with a while loop or something similar, e.g.:
while (!handle.getState().isFinal()) {
    logger.info("Current state: " + handle.getState());
    logger.info("App Id: " + handle.getAppId());
    Thread.sleep(1000L);
    // other work you want to do while waiting
}
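A loop like this never exits if the application never reaches a final state, so it can help to add a deadline. Here is a small sketch of a polling helper with a timeout. StatePoller and pollUntil are hypothetical names, and the helper is written against plain JDK functional interfaces rather than the Spark types so that it stands alone.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper: poll a state source until it is final or a deadline passes.
public class StatePoller {

    /** Returns the last observed state, which is not final if the timeout expired first. */
    public static <S> S pollUntil(Supplier<S> current, Predicate<S> isFinal,
                                  long pollMillis, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        S state = current.get();
        while (!isFinal.test(state) && System.nanoTime() - deadline < 0) {
            Thread.sleep(pollMillis);
            state = current.get();
        }
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake state source for the demo: reports RUNNING twice, then FINISHED.
        AtomicInteger calls = new AtomicInteger();
        String last = pollUntil(
                () -> calls.incrementAndGet() < 3 ? "RUNNING" : "FINISHED",
                "FINISHED"::equals, 10, 2000);
        System.out.println(last + " after " + calls.get() + " checks");
    }
}
```

With the handle from startApplication(), a call along the lines of pollUntil(handle::getState, SparkAppHandle.State::isFinal, 1000L, 60000L) should behave like the loop above, except that it gives up after a minute; the 60-second limit is only an example value.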