
Spark - Error "A master URL must be set in your configuration" when submitting an app

I have a Spark app which runs without problems in local mode, but has some problems when I submit it to the Spark cluster.

The error messages are as follows:

16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, cluster-node-02): java.lang.ExceptionInInitializerError
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
    at GroupEvolutionES$.<init>(GroupEvolutionES.scala:37)
    at GroupEvolutionES$.<clinit>(GroupEvolutionES.scala)
    ... 14 more

16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, cluster-node-02): java.lang.NoClassDefFoundError: Could not initialize class GroupEvolutionES$
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

In the stack trace above, GroupEvolutionES is the main class. The error message says "A master URL must be set in your configuration", but I did provide the "--master" parameter to spark-submit.
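For reference, the submission follows the usual spark-submit pattern; the jar path and master URL below are placeholders, not my exact values:

```shell
# Hypothetical invocation -- actual jar path and master URL differ
spark-submit \
  --class GroupEvolutionES \
  --master spark://cluster-node-01:7077 \
  path/to/group-evolution-es.jar
```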

Does anyone know how to fix this problem?

Spark version: 1.6.1

asked Jun 24 '16 by Shuai Zhang

2 Answers

The TLDR:

.config("spark.master", "local") 

a list of the options for spark.master in spark 2.2.1

I ended up on this page after trying to run a simple Spark SQL java program in local mode. To do this, I found that I could set spark.master using:

SparkSession spark = SparkSession
    .builder()
    .appName("Java Spark SQL basic example")
    .config("spark.master", "local")
    .getOrCreate();

An update to my answer:

To be clear, this is not what you should do in a production environment. In production, spark.master should be specified in one of a couple of other places: either in $SPARK_HOME/conf/spark-defaults.conf (this is where Cloudera Manager will put it), or on the command line when you submit the app (e.g. spark-submit --master yarn).
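As a sketch of those two options (the class name and jar path are placeholders; yarn is just one possible master):

```shell
# Option 1: set it once in spark-defaults.conf, picked up by every spark-submit
echo "spark.master    yarn" >> "$SPARK_HOME/conf/spark-defaults.conf"

# Option 2: pass it explicitly for a single submission
spark-submit --master yarn \
  --class com.example.MyApp \
  path/to/my-app.jar
```

Either way, the application code itself stays free of any hard-coded master URL.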

If you specify spark.master as 'local' in this way, Spark will try to run in a single JVM, as indicated by the comments below. If you then try to specify --deploy-mode cluster, you will get the error 'Cluster deploy mode is not compatible with master "local"'. This is because setting spark.master=local means that you are NOT running in cluster mode.

Instead, for a production app, within your main function (or in functions called by your main function), you should simply use:

SparkSession
    .builder()
    .appName("Java Spark SQL basic example")
    .getOrCreate();

This will use the configuration specified on the command line or in config files.

Also, to be clear on this too: --master and "spark.master" are exactly the same parameter, just specified in different ways. Setting spark.master in code, as in my answer above, will override values passed via --master and values in spark-defaults.conf, so don't do it in production. It's fine for tests, though.

Also, see this answer, which links to a list of the options for spark.master and what each one actually does.

answered Sep 17 '22 by Jack Davidson


It worked for me after replacing

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME"); 

with

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[2]").set("spark.executor.memory","1g"); 

I found this solution in another thread on Stack Overflow.

answered Sep 18 '22 by Sachin