I have a Spark app that runs without problems in local mode, but it fails when I submit it to the Spark cluster.
The error messages are as follows:
16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, cluster-node-02): java.lang.ExceptionInInitializerError
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
    at GroupEvolutionES$.<init>(GroupEvolutionES.scala:37)
    at GroupEvolutionES$.<clinit>(GroupEvolutionES.scala)
    ... 14 more
16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, cluster-node-02): java.lang.NoClassDefFoundError: Could not initialize class GroupEvolutionES$
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
In the above trace, GroupEvolutionES is the main class. The error message says "A master URL must be set in your configuration", but I have provided the "--master" parameter to spark-submit.
Does anyone know how to fix this problem?
Spark version: 1.6.1
The TLDR:
.config("spark.master", "local")
A list of the options for spark.master in Spark 2.2.1
I ended up on this page after trying to run a simple Spark SQL Java program in local mode. To do this, I found that I could set spark.master using:
SparkSession spark = SparkSession
    .builder()
    .appName("Java Spark SQL basic example")
    .config("spark.master", "local")
    .getOrCreate();
An update to my answer:
To be clear, this is not what you should do in a production environment. In a production environment, spark.master should be specified in one of two other places: either in $SPARK_HOME/conf/spark-defaults.conf (this is where Cloudera Manager will put it), or on the command line when you submit the app (e.g. spark-submit --master yarn).
If you specify spark.master to be 'local' in this way, Spark will try to run in a single JVM, as indicated by the comments below. If you then try to specify --deploy-mode cluster, you will get the error 'Cluster deploy mode is not compatible with master "local"'. This is because setting spark.master=local means that you are NOT running in cluster mode.
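The incompatibility described above can be pictured with a small, standalone Java sketch. It is only a simplified model for illustration, not Spark's actual code: the class DeployModeCheck and the method validate are hypothetical names, and the rule "any master string starting with local counts as a local master" is an assumption made for the example.

```java
public class DeployModeCheck {

    /**
     * Returns an error message if the master/deploy-mode combination is
     * rejected by this model, or null if it is accepted.
     * Assumption: a master string that starts with "local" is a local master.
     */
    public static String validate(String master, String deployMode) {
        boolean localMaster = master != null && master.startsWith("local");
        if (localMaster && "cluster".equals(deployMode)) {
            // Same wording as the error message quoted in the answer.
            return "Cluster deploy mode is not compatible with master \"local\"";
        }
        return null;
    }

    public static void main(String[] args) {
        // A local master combined with cluster deploy mode is rejected.
        System.out.println(validate("local", "cluster"));
        // A cluster manager such as YARN is accepted in this model.
        System.out.println(validate("yarn", "cluster"));
    }
}
```

In this model a local master is only rejected when combined with the cluster deploy mode, which matches the situation described in the paragraph above.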
Instead, for a production app, within your main function (or in functions called by your main function), you should simply use:
SparkSession
    .builder()
    .appName("Java Spark SQL basic example")
    .getOrCreate();
This will use the configurations specified on the command line/in config files.
Also, to be clear on this too: --master and "spark.master" are the exact same parameter, just specified in different ways. Setting spark.master in code, like in my answer above, will override attempts to set --master, and will override values in spark-defaults.conf, so don't do it in production. It's great for tests though.
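To make that precedence concrete, here is a minimal plain-Java sketch of the lookup order (value set in code first, then the spark-submit flag, then spark-defaults.conf). It does not use Spark itself; the class MasterPrecedence, the method resolveMaster, and the three maps are hypothetical stand-ins for the three configuration sources.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

public class MasterPrecedence {

    /**
     * Returns the first spark.master value found, checking the sources in
     * priority order: set in code, spark-submit flags, spark-defaults.conf.
     * An empty result models the case where no master was set anywhere.
     */
    public static Optional<String> resolveMaster(Map<String, String> setInCode,
                                                 Map<String, String> submitFlags,
                                                 Map<String, String> defaultsFile) {
        for (Map<String, String> source : Arrays.asList(setInCode, submitFlags, defaultsFile)) {
            String value = source.get("spark.master");
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> inCode = Collections.singletonMap("spark.master", "local");
        Map<String, String> flags = Collections.singletonMap("spark.master", "yarn");
        Map<String, String> none = Collections.emptyMap();

        // The hard-coded value hides the --master flag.
        System.out.println(resolveMaster(inCode, flags, none));
        // Without a hard-coded value, the --master flag is used.
        System.out.println(resolveMaster(none, flags, none));
    }
}
```

The empty result corresponds to the situation in which Spark reports "A master URL must be set in your configuration".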
Also, see this answer, which links to a list of the options for spark.master and what each one actually does.
This worked for me after replacing
SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME");
with
SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[2]").set("spark.executor.memory","1g");
I found this solution in another thread on Stack Overflow.
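For readers wondering what "local[2]" means: in Spark's master URL syntax, "local" runs with one worker thread, "local[K]" with K worker threads, and "local[*]" with as many worker threads as there are logical cores. The plain-Java sketch below only illustrates that reading of the string. It is not Spark's parser; the class LocalMaster and the method workerThreads are hypothetical names, and other forms of the master URL are deliberately not handled.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocalMaster {

    // Matches "local[2]" or "local[*]"; other master forms are out of scope here.
    private static final Pattern LOCAL_N = Pattern.compile("local\\[(\\d+|\\*)\\]");

    /**
     * Returns the number of worker threads implied by a local master string,
     * or an empty result if the string is not one of the handled local forms.
     */
    public static OptionalInt workerThreads(String master) {
        if ("local".equals(master)) {
            return OptionalInt.of(1);
        }
        Matcher m = LOCAL_N.matcher(master);
        if (!m.matches()) {
            return OptionalInt.empty();
        }
        String n = m.group(1);
        int threads = "*".equals(n)
                ? Runtime.getRuntime().availableProcessors()
                : Integer.parseInt(n);
        return OptionalInt.of(threads);
    }

    public static void main(String[] args) {
        System.out.println(workerThreads("local[2]"));
        System.out.println(workerThreads("yarn"));
    }
}
```

As with the other answer, hard-coding setMaster("local[2]") keeps the job on a single machine, so it suits local testing rather than a cluster submission.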