 

How to set Master address for Spark examples from command line

NOTE: The author is looking for a way to set the Spark master when running the Spark examples without changing the source code, ideally using only options that can be supplied from the command line.

Let us consider the run() method of the BinaryClassification example:

  def run(params: Params) {
    val conf = new SparkConf().setAppName(s"BinaryClassification with $params")
    val sc = new SparkContext(conf)

Notice that the SparkConf does not provide any means to configure the Spark master.

When running this program from IntelliJ with the following arguments:

--algorithm LR --regType L2 --regParam 1.0 data/mllib/sample_binary_classification_data.txt 

the following error occurs:

Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:166)
    at org.apache.spark.examples.mllib.BinaryClassification$.run(BinaryClassification.scala:105)

I have also tried adding the Spark master URL as an argument anyway (though the code does not appear to support it):

  spark://10.213.39.125:17088 --algorithm LR --regType L2 --regParam 1.0 data/mllib/sample_binary_classification_data.txt

and

--algorithm LR --regType L2 --regParam 1.0 spark://10.213.39.125:17088 data/mllib/sample_binary_classification_data.txt 

Both fail with the error:

Error: Unknown argument 'data/mllib/sample_binary_classification_data.txt' 

For reference, here is the option parsing, which does nothing with the Spark master:

val parser = new OptionParser[Params]("BinaryClassification") {
  head("BinaryClassification: an example app for binary classification.")
  opt[Int]("numIterations")
    .text("number of iterations")
    .action((x, c) => c.copy(numIterations = x))
  opt[Double]("stepSize")
    .text(s"initial step size, default: ${defaultParams.stepSize}")
    .action((x, c) => c.copy(stepSize = x))
  opt[String]("algorithm")
    .text(s"algorithm (${Algorithm.values.mkString(",")}), " +
      s"default: ${defaultParams.algorithm}")
    .action((x, c) => c.copy(algorithm = Algorithm.withName(x)))
  opt[String]("regType")
    .text(s"regularization type (${RegType.values.mkString(",")}), " +
      s"default: ${defaultParams.regType}")
    .action((x, c) => c.copy(regType = RegType.withName(x)))
  opt[Double]("regParam")
    .text(s"regularization parameter, default: ${defaultParams.regParam}")
  arg[String]("<input>")
    .required()
    .text("input paths to labeled examples in LIBSVM format")
    .action((x, c) => c.copy(input = x))

So, yes, I could go ahead and modify the source code. But I suspect I am instead missing an available tuning knob that would make this work without modifying the source code.
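For context, the standard command-line route for the bundled examples is spark-submit, whose --master flag sets spark.master without any source change. A sketch (assuming a Spark installation on the PATH; the examples jar path varies by build and version):

```shell
# Sketch: run the example via spark-submit, supplying the master on the
# command line. --master populates spark.master before SparkConf is read.
# The jar path below is illustrative and depends on how Spark was built.
spark-submit \
  --master spark://10.213.39.125:17088 \
  --class org.apache.spark.examples.mllib.BinaryClassification \
  examples/target/spark-examples.jar \
  --algorithm LR --regType L2 --regParam 1.0 \
  data/mllib/sample_binary_classification_data.txt
```

This keeps the master out of the program's own argument list, so the scopt parser never sees it.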

Asked by WestCoastProjects on Jun 30 '14




2 Answers

You can set the Spark master from the command line by adding the JVM system property:

-Dspark.master=spark://myhost:7077 
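SparkConf loads any spark.* Java system properties automatically, which is why this works without code changes. In IntelliJ the property goes in the run configuration's VM options; a plain-java sketch (the classpath is elided and would need to be filled in for a real run):

```shell
# Sketch: -D properties prefixed with "spark." are picked up by SparkConf,
# so no .setMaster(...) call is needed in the source.
# <classpath> is a placeholder for the Spark and example jars.
java -Dspark.master=spark://myhost:7077 -cp <classpath> \
  org.apache.spark.examples.mllib.BinaryClassification \
  --algorithm LR --regType L2 --regParam 1.0 \
  data/mllib/sample_binary_classification_data.txt
```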
Answered by WestCoastProjects on Sep 26 '22


If you want to do this from code, you can use .setMaster(...) when creating the SparkConf:

val conf = new SparkConf().setAppName("Simple Application")
                          .setMaster("spark://myhost:7077")


Long overdue EDIT (as per the comments)

For a SparkSession in Spark 2.x+:

val spark = SparkSession.builder()
                        .appName("app_name")
                        .getOrCreate()
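The 2.x builder can also take the master programmatically. A sketch (the "local[*]" value is illustrative; omitting .master(...) lets an external setting such as spark-submit's --master supply it instead):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: set the master on the builder itself.
// Leave .master(...) out when the master comes from spark-submit
// or from a -Dspark.master system property.
val spark = SparkSession.builder()
                        .appName("app_name")
                        .master("local[*]")
                        .getOrCreate()
```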

Command line (2.x), assuming a local standalone cluster:

spark-shell --master spark://localhost:7077  
Answered by Lyuben Todorov on Sep 24 '22