 

Can't connect from application to the standalone cluster

Tags:

apache-spark

I'm trying to connect my application to a Spark standalone cluster, all on one machine. I start the standalone master with:

bash start-master.sh    

Then I start one worker with:

bash spark-class org.apache.spark.deploy.worker.Worker spark://PC:7077 -m 512m

(I allocated 512 MB to it.)

In the master's web UI at

http://localhost:8080

I can see that both the master and the worker are running.

Then I try to connect to the cluster from my application with the following code:

JavaSparkContext sc = new JavaSparkContext("spark://PC:7077", "myapplication");

When I run the application, it crashes with the following error message:

    14/11/01 22:53:26 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
    14/11/01 22:53:26 INFO spark.SparkContext: Starting job: collect at App.java:115
    14/11/01 22:53:26 INFO scheduler.DAGScheduler: Got job 0 (collect at App.java:115) with 2 output partitions (allowLocal=false)
    14/11/01 22:53:26 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at App.java:115)
    14/11/01 22:53:26 INFO scheduler.DAGScheduler: Parents of final stage: List()
    14/11/01 22:53:26 INFO scheduler.DAGScheduler: Missing parents: List()
    14/11/01 22:53:26 INFO scheduler.DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:109), which has no missing parents
    14/11/01 22:53:27 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:109)
    14/11/01 22:53:27 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
    14/11/01 22:53:42 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/11/01 22:53:46 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
    14/11/01 22:53:57 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/11/01 22:54:06 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
    14/11/01 22:54:12 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
    14/11/01 22:54:26 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
    14/11/01 22:54:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
    14/11/01 22:54:26 INFO scheduler.DAGScheduler: Failed to run collect at App.java:115
    Exception in thread "main" 14/11/01 22:54:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
    org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
    14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}

Any ideas what is going on?

P.S. I'm using the pre-built version of Spark: spark-1.1.0-bin-hadoop2.4.

Thank You.

asked by dimson


1 Answer

Make sure that both the standalone workers and the Spark driver are connected to the Spark master on the exact address listed in its web UI / printed in its startup log message. Spark uses Akka for some of its control-plane communication and Akka can be really picky about hostnames, so these need to match exactly.

There are several options to control which hostnames / network interfaces the driver and master will bind to. Probably the simplest option is to set the SPARK_LOCAL_IP environment variable to control the address that the Master / Driver will bind to. See http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html for an overview of the other settings that affect network address binding.
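
As an illustration, here is a minimal driver-side sketch for a Java application on Spark 1.1.x. The spark://PC:7077 URL and the 192.168.1.10 address are placeholders: use the exact master URL shown at the top of the master's web UI, and an address of your machine that the master and worker can actually reach. Setting spark.driver.host in the SparkConf is one way to pin the driver's advertised address; exporting SPARK_LOCAL_IP before starting the master, worker, and driver is the other knob mentioned above.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.util.Arrays;

    public class ConnectToStandalone {
        public static void main(String[] args) {
            // Use the exact spark:// URL shown at the top of the master's web UI
            // (http://localhost:8080); "PC" must match that hostname exactly.
            SparkConf conf = new SparkConf()
                    .setAppName("myapplication")
                    .setMaster("spark://PC:7077")
                    // Hypothetical address: pin the driver to an interface the
                    // master and worker can reach (roughly the driver-side
                    // counterpart of exporting SPARK_LOCAL_IP for the daemons).
                    .set("spark.driver.host", "192.168.1.10");

            JavaSparkContext sc = new JavaSparkContext(conf);
            try {
                // A tiny job to confirm that executors register and accept tasks.
                long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
                System.out.println("count = " + count);
            } finally {
                sc.stop();
            }
        }
    }

If the executors still fail to register, the "Initial job has not accepted any resources" warning usually means the worker registered against a different address than the one the driver is advertising, so compare the hostnames in the master UI with the ones in the driver log.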

answered by Josh Rosen