How to make it easier to deploy my jar to a Spark cluster in standalone mode?

I have a small cluster with 3 machines, plus another machine for development and testing. When developing, I set the SparkContext to local. When everything is OK, I want to deploy the jar file I build to every node. Basically, I manually move the jar to the cluster and copy it to HDFS, which is shared by the cluster. Then I change the code to:

//standalone mode
val sc = new SparkContext(
  "spark://mymaster:7077",                                 // master URL
  "Simple App",                                            // app name
  "/opt/spark-0.9.1-bin-cdh4",                             // Spark home
  List("hdfs://namenode:8020/runnableJars/SimplyApp.jar")  // jar location
)

and run it from my IDE. My question: is there an easier way to move this jar to the cluster?

asked Jun 05 '14 by hakunami
People also ask

How do I run Spark submit in standalone mode?

Use spark://HOST:PORT for a standalone cluster, replacing HOST and PORT with those of your standalone master. Use local to run locally with one worker thread. Use local[k] to run locally with k worker threads, where k is typically the number of cores on your machine.
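As an illustration, here is a minimal Scala sketch of those three master URL forms (the host, port, and thread count below are placeholders):

import org.apache.spark.SparkConf

// standalone cluster: replace mymaster and 7077 with your master's host and port
val standaloneConf = new SparkConf().setMaster("spark://mymaster:7077")

// local mode with a single worker thread
val localConf = new SparkConf().setMaster("local")

// local mode with k worker threads, e.g. k = 4
val localKConf = new SparkConf().setMaster("local[4]")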

What is the difference between local and standalone mode in Spark?

The only difference between standalone and local mode is that in standalone mode you define separate "containers" (JVMs) for the Spark master and the workers on your machine, so you can, for example, run 2 workers and have your tasks distributed across the JVMs of those two workers. In local mode, everything runs in a single JVM.

What is standalone mode in Spark?

Spark's standalone mode offers a web-based user interface to monitor the cluster. The master and each worker have their own web UIs that show cluster and job statistics. By default, you can access the web UI for the master at port 8080. The port can be changed either in the configuration file or via command-line options.
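For example, a minimal sketch of both approaches for the master's UI port (SPARK_MASTER_WEBUI_PORT and --webui-port are the standard knobs in Spark's standalone scripts; 8081 is an arbitrary choice):

# in conf/spark-env.sh on the master
export SPARK_MASTER_WEBUI_PORT=8081

# or via a command-line option when starting the master
./sbin/start-master.sh --webui-port 8081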


1 Answer

In Spark, the program that creates the SparkContext is called 'the driver'. It's sufficient that the jar file with your job is available on the local file system of the driver; the driver will pick it up and ship it to the master/workers.

Concretely, your config will look like:

import org.apache.spark.{SparkConf, SparkContext}

// favor using SparkConf to configure your SparkContext
val conf = new SparkConf()
             .setMaster("spark://mymaster:7077")
             .setAppName("SimpleApp")
             .set("spark.local.ip", "172.17.0.1")  // a driver address reachable from the workers
             .setJars(Array("/local/dir/SimplyApp.jar"))

val sc = new SparkContext(conf)

Under the hood, the driver starts a server from which the workers download the jar file(s). It's therefore important (and often an issue) that the workers have network access to the driver. This can often be ensured by setting 'spark.local.ip' on the driver to an address in a network that's accessible/routable from the workers.
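Building on that, here is a hedged sketch (my own suggestion, not part of the answer above) that avoids hardcoding the jar path by asking Spark for the jar that contains the application class. SparkContext.jarOfClass returns an Option[String] in Spark 1.x+ and a Seq in 0.9.x, so the .toSeq call below works for both:

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // locate, on the driver's local file system, the jar that holds this class
    val jars = SparkContext.jarOfClass(this.getClass).toSeq

    val conf = new SparkConf()
      .setMaster("spark://mymaster:7077")
      .setAppName("SimpleApp")
      .setJars(jars)

    val sc = new SparkContext(conf)
    // ... your job here ...
    sc.stop()
  }
}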

answered Oct 22 '22 by maasg