 

Why is "Cannot call methods on a stopped SparkContext" thrown when connecting to Spark Standalone from Java application?

I have downloaded Apache Spark 1.4.1 pre-built for Hadoop 2.6 and later. I have two Ubuntu 14.04 machines. I have set one of them up as the Spark master running a single slave, and the second machine runs another Spark slave. When I execute the ./sbin/start-all.sh command, the master and the slaves start successfully. After that I ran the sample Pi program in the spark-shell, setting --master to spark://192.168.0.105:7077, the Spark master URL displayed in the Spark web UI.

So far everything works great.

I have created a Java application and tried to configure it to run Spark jobs when needed. I added the Spark dependencies to the pom.xml file:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>1.4.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>1.4.1</version>
        </dependency>

I have created a SparkConf:

private SparkConf sparkConfig = new SparkConf(true)
            .setAppName("Spark Worker")
            .setMaster("spark://192.168.0.105:7077");

And I create a SparkContext using the SparkConf:

private SparkContext sparkContext = new SparkContext(sparkConfig);

At this step the following error is thrown:

java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext
    at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:103)
    at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1503)
    at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2007)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:543)
    at com.storakle.dataimport.spark.StorakleSparkConfig.getSparkContext(StorakleSparkConfig.java:37)
    at com.storakle.dataimport.reportprocessing.DidNotBuyProductReport.prepareReportData(DidNotBuyProductReport.java:25)
    at com.storakle.dataimport.messagebroker.RabbitMQMessageBroker$1.handleDelivery(RabbitMQMessageBroker.java:56)
    at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:144)
    at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:99)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

If I change the Spark master to local, everything works just fine.

private SparkConf sparkConfig = new SparkConf(true)
                .setAppName("Spark Worker")
                .setMaster("local");

I am running the Java app on the same machine that hosts the Spark Master.

I have no idea why this is happening. All of the documentation and examples I've found so far indicate that the code should work with the Spark master URL.

Any ideas why this is happening and how I can fix it? I have spent a lot of time trying to figure this one out, with no luck so far.

Ivan Stoyanov asked Nov 09 '15


1 Answer

I think you are using the Spark 1.4.1 binaries built for Scala 2.10 (the pre-built downloads are compiled against Scala 2.10). Therefore, you need spark-core_2.10 and spark-streaming_2.10 instead of the _2.11 artifacts: spark-core_2.11 is incompatible with a Spark installation built for Scala 2.10. With mismatched Scala versions, the driver fails while starting up against the cluster and the context stops itself, which would explain why the constructor then complains about a stopped SparkContext.
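
With the pre-built 1.4.1 download, the dependencies would then look roughly like this (a sketch; only the artifact suffix changes, the version stays at 1.4.1):

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.4.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>1.4.1</version>
        </dependency>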

If you want to keep the _2.11 dependencies instead, you would need to build Spark for Scala 2.11 yourself; see:

http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211
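
Once the artifact versions match the cluster, a minimal smoke test along these lines should be able to create the context against the standalone master and run a trivial job (just a sketch: the class name is a placeholder and the master URL is the one from the question):

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SparkConnectionCheck {
        public static void main(String[] args) {
            // Same app name and master URL as in the question.
            SparkConf conf = new SparkConf(true)
                    .setAppName("Spark Worker")
                    .setMaster("spark://192.168.0.105:7077");

            // JavaSparkContext wraps SparkContext for use from Java code.
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Run a trivial job on the cluster to confirm the connection works.
            long count = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5)).count();
            System.out.println("Counted " + count + " elements on the cluster");

            sc.stop();
        }
    }

If this runs without the stopped-context error, the original application code should work as well.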

Leonard answered Sep 29 '22