 

How to submit spark job from within java program to standalone spark cluster without using spark-submit?

I am using Spark to perform some computations, but I want the job to be submitted from a Java application. It works properly when submitted using the spark-submit script. Has anyone tried to do this?

Thanks.

asked Apr 02 '15 by Sachin Janani


People also ask

How do I run spark-submit in standalone mode?

Use spark://HOST:PORT for a standalone cluster, replacing the host and port with those of your standalone master. Use local to run locally with a single worker thread. Use local[k], where k is the number of cores you have locally, to run the application with k worker threads.
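As a minimal sketch, the same master URLs can be set programmatically on a SparkConf; the host, port, and thread count below are placeholders chosen for illustration:

import org.apache.spark.SparkConf

// Standalone cluster (placeholder host and port).
val clusterConf = new SparkConf().setMaster("spark://master-host:7077")

// Local mode with a single worker thread.
val localConf = new SparkConf().setMaster("local")

// Local mode with k = 4 worker threads (4 chosen for illustration).
val localKConf = new SparkConf().setMaster("local[4]")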

Can we run spark-submit in local mode on a cluster?

No, the spark-submit parameters num-executors, executor-cores, and executor-memory won't work in local mode, because these parameters apply when you deploy your Spark job on a cluster rather than a single machine. They only take effect when you run your job in client or cluster mode.
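If you do target a real cluster, the same knobs can be set on the SparkConf instead of passing spark-submit flags. A sketch, assuming a placeholder master URL (note that spark.executor.instances, the conf form of --num-executors, is primarily honored on YARN):

import org.apache.spark.SparkConf

// These settings are ignored in local mode, where everything runs in one JVM.
val conf = new SparkConf()
  .setMaster("spark://master-host:7077")   // placeholder cluster manager URL
  .set("spark.executor.instances", "4")    // conf form of --num-executors (mainly YARN)
  .set("spark.executor.cores", "2")        // conf form of --executor-cores
  .set("spark.executor.memory", "2g")      // conf form of --executor-memory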

Can you explain what happens internally when we submit a Spark job using spark-submit?

The entire resource allocation and the tracking of jobs and tasks are performed by the cluster manager. As soon as you do a spark-submit, your user program and the other configuration you specified are copied onto all the available nodes in the cluster, so that the program becomes a local read on each worker node.


1 Answer

Don't forget to add the fat JAR containing your code to the context.

val conf = new SparkConf()
   .setMaster(...)                     // e.g. "spark://HOST:7077" for a standalone cluster
   .setAppName(...)
   .setJars(Seq("/path/to/code.jar"))  // setJars takes a Seq of JAR paths, not a bare String
val sc = new SparkContext(conf)
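To make that concrete, here is a minimal end-to-end sketch of submitting from inside a program rather than via spark-submit; the master URL, app name, and JAR path are all placeholders for your own setup:

import org.apache.spark.{SparkConf, SparkContext}

object InProcessSubmit {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")   // placeholder standalone master
      .setAppName("submitted-from-java-app")   // placeholder app name
      .setJars(Seq("/path/to/code.jar"))       // fat JAR shipped to the executors

    val sc = new SparkContext(conf)
    try {
      // Trivial job to verify the context can reach the cluster.
      val sum = sc.parallelize(1 to 100).reduce(_ + _)
      println(s"Sum: $sum")
    } finally {
      sc.stop()
    }
  }
}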
answered Sep 20 '22 by Marius Soutier