I have downloaded Apache Spark 1.4.1 pre-built for Hadoop 2.6 and later. I have two Ubuntu 14.04 machines. I have set one of them up as the Spark master (it also runs a single slave), and the second machine runs another Spark slave. When I execute the ./sbin/start-all.sh command, the master and the slaves start successfully. After that I ran the sample Pi program in the spark-shell, setting --master to spark://192.168.0.105:7077, the Spark master URL displayed in the Spark web UI.
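For concreteness, the commands involved look roughly like this (run from the Spark installation directory on the master machine; the IP address is the one from my setup):

./sbin/start-all.sh
./bin/spark-shell --master spark://192.168.0.105:7077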
So far everything works great.
I have created a Java application and I am trying to configure it to run Spark jobs when needed. I added the Spark dependencies to the pom.xml file:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>1.4.1</version>
</dependency>
I have created a SparkConf:
private SparkConf sparkConfig = new SparkConf(true)
    .setAppName("Spark Worker")
    .setMaster("spark://192.168.0.105:7077");
And I create a SparkContext using the SparkConf:
private SparkContext sparkContext = new SparkContext(sparkConfig);
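For completeness, here is a minimal, self-contained sketch of the same setup using the Java-friendly JavaSparkContext wrapper (the class name and the trivial count job are illustrative only):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.Arrays;

public class SparkSmokeTest {
    public static void main(String[] args) {
        // Same configuration as above: app name plus the master URL from the Spark web UI.
        SparkConf conf = new SparkConf(true)
                .setAppName("Spark Worker")
                .setMaster("spark://192.168.0.105:7077");

        // JavaSparkContext wraps SparkContext and is the usual entry point from Java.
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Trivial job just to confirm the context is usable.
        long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
        System.out.println("count = " + count);

        sc.stop();
    }
}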
When the SparkContext is constructed, the following error is thrown:
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)
at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1503)
at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2007)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:543)
at com.storakle.dataimport.spark.StorakleSparkConfig.getSparkContext(StorakleSparkConfig.java:37)
at com.storakle.dataimport.reportprocessing.DidNotBuyProductReport.prepareReportData(DidNotBuyProductReport.java:25)
at com.storakle.dataimport.messagebroker.RabbitMQMessageBroker$1.handleDelivery(RabbitMQMessageBroker.java:56)
at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:144)
at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:99)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
If I change the Spark master to local, everything works just fine:
private SparkConf sparkConfig = new SparkConf(true)
    .setAppName("Spark Worker")
    .setMaster("local");
I am running the Java app on the same machine that hosts the Spark Master.
I have no idea why this is happening. All of the documentation and every example I've found so far indicate that the code should work with the Spark master URL.
Any idea why this is happening and how I can fix it? I have spent a lot of time trying to figure this out, with no luck so far.
I think you are using Spark 1.4.1 built for Scala 2.10. Therefore, you need spark-core_2.10 and spark-streaming_2.10 instead of the 2.11 artifacts; spark-core_2.11 is incompatible with a Spark distribution built for Scala 2.10.
For building Spark for Scala 2.11, see:
http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211
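In other words, the dependency block from the question would become the following (same 1.4.1 version, but the Scala 2.10 artifacts):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.4.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.4.1</version>
</dependency>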