
Unable to run Spark with Mesos

I set up Spark 0.9.1 to run on Mesos 0.13.0 using the steps mentioned here. The Mesos UI shows two workers registered. I want to run these commands in spark-shell:

> scala> val data = 1 to 10000
> data: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, 5, 6,
> 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
> 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
> 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58,
> 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
> 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92,
> 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
> 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121,
> 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135,
> 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149,
> 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163,
> 164, 165, 166, 167, 168, 169, 170...


> scala> val distData = sc.parallelize(data)
> distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:14

Now when I run the collect method, the following error occurs:

> scala> distData.filter(_< 10).collect()
14/06/03 19:54:55 INFO SparkContext: Starting job: collect at <console>:17
14/06/03 19:54:55 INFO DAGScheduler: Got job 0 (collect at <console>:17) with 8 output partitions (allowLocal=false)
14/06/03 19:54:55 INFO DAGScheduler: Final stage: Stage 0 (collect at <console>:17)
14/06/03 19:54:55 INFO DAGScheduler: Parents of final stage: List()
14/06/03 19:54:55 INFO DAGScheduler: Missing parents: List()
14/06/03 19:54:55 INFO DAGScheduler: Submitting Stage 0 (FilteredRDD[1] at filter at <console>:17), which has no missing parents
14/06/03 19:54:55 INFO DAGScheduler: Submitting 8 missing tasks from Stage 0 (FilteredRDD[1] at filter at <console>:17)
14/06/03 19:54:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 8 tasks
14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:0 as 1338 bytes in 8 ms
14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:1 as 1338 bytes in 0 ms
14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:2 as TID 2 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:2 as 1338 bytes in 0 ms
14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:3 as TID 3 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:3 as 1338 bytes in 1 ms
14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:4 as TID 4 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:4 as 1338 bytes in 0 ms
14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:5 as TID 5 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:5 as 1338 bytes in 0 ms
14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:6 as TID 6 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:6 as 1338 bytes in 0 ms
14/06/03 19:54:55 INFO TaskSetManager: Starting task 0.0:7 as TID 7 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:55 INFO TaskSetManager: Serialized task 0.0:7 as 1338 bytes in 0 ms
14/06/03 19:54:56 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-10 from TaskSet 0.0
14/06/03 19:54:56 WARN TaskSetManager: Lost TID 5 (task 0.0:5)
14/06/03 19:54:56 WARN TaskSetManager: Lost TID 7 (task 0.0:7)
14/06/03 19:54:56 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
14/06/03 19:54:56 WARN TaskSetManager: Lost TID 3 (task 0.0:3)
14/06/03 19:54:56 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-10 (epoch 0)
14/06/03 19:54:56 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-10 from BlockManagerMaster.
14/06/03 19:54:56 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-10 successfully in removeExecutor
14/06/03 19:54:56 INFO TaskSetManager: Starting task 0.0:3 as TID 8 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:56 INFO TaskSetManager: Serialized task 0.0:3 as 1338 bytes in 0 ms
14/06/03 19:54:56 INFO DAGScheduler: Host gained which was in lost list earlier: host-DSRV04.host 
14/06/03 19:54:56 INFO TaskSetManager: Starting task 0.0:1 as TID 9 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:56 INFO TaskSetManager: Serialized task 0.0:1 as 1338 bytes in 0 ms
14/06/03 19:54:56 INFO TaskSetManager: Starting task 0.0:7 as TID 10 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:56 INFO TaskSetManager: Serialized task 0.0:7 as 1338 bytes in 0 ms
14/06/03 19:54:56 INFO TaskSetManager: Starting task 0.0:5 as TID 11 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:56 INFO TaskSetManager: Serialized task 0.0:5 as 1338 bytes in 0 ms
14/06/03 19:54:57 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-11 from TaskSet 0.0
14/06/03 19:54:57 WARN TaskSetManager: Lost TID 8 (task 0.0:3)
14/06/03 19:54:57 WARN TaskSetManager: Lost TID 2 (task 0.0:2)
14/06/03 19:54:57 WARN TaskSetManager: Lost TID 4 (task 0.0:4)
14/06/03 19:54:57 WARN TaskSetManager: Lost TID 10 (task 0.0:7)
14/06/03 19:54:57 WARN TaskSetManager: Lost TID 6 (task 0.0:6)
14/06/03 19:54:57 WARN TaskSetManager: Lost TID 0 (task 0.0:0)
14/06/03 19:54:57 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-11 (epoch 1)
14/06/03 19:54:57 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-11 from BlockManagerMaster.
14/06/03 19:54:57 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-11 successfully in removeExecutor
14/06/03 19:54:57 INFO DAGScheduler: Host gained which was in lost list earlier: host-DSRV05.host 
14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:0 as TID 12 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:0 as 1338 bytes in 1 ms
14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:6 as TID 13 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:6 as 1338 bytes in 0 ms
14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:7 as TID 14 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:7 as 1338 bytes in 1 ms
14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:4 as TID 15 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:4 as 1338 bytes in 0 ms
14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:2 as TID 16 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:2 as 1338 bytes in 0 ms
14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:3 as TID 17 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:3 as 1338 bytes in 1 ms
14/06/03 19:54:57 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-11 from TaskSet 0.0
14/06/03 19:54:57 WARN TaskSetManager: Lost TID 14 (task 0.0:7)
14/06/03 19:54:57 WARN TaskSetManager: Lost TID 16 (task 0.0:2)
14/06/03 19:54:57 WARN TaskSetManager: Lost TID 12 (task 0.0:0)
14/06/03 19:54:57 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-11 (epoch 2)
14/06/03 19:54:57 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-11 from BlockManagerMaster.
14/06/03 19:54:57 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-11 successfully in removeExecutor
14/06/03 19:54:57 INFO DAGScheduler: Host gained which was in lost list earlier: host-DSRV05.host 
14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:0 as TID 18 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:0 as 1338 bytes in 0 ms
14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:2 as TID 19 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:2 as 1338 bytes in 0 ms
14/06/03 19:54:57 INFO TaskSetManager: Starting task 0.0:7 as TID 20 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:57 INFO TaskSetManager: Serialized task 0.0:7 as 1338 bytes in 0 ms
14/06/03 19:54:58 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-10 from TaskSet 0.0
14/06/03 19:54:58 WARN TaskSetManager: Lost TID 17 (task 0.0:3)
14/06/03 19:54:58 WARN TaskSetManager: Lost TID 11 (task 0.0:5)
14/06/03 19:54:58 WARN TaskSetManager: Lost TID 13 (task 0.0:6)
14/06/03 19:54:58 WARN TaskSetManager: Lost TID 9 (task 0.0:1)
14/06/03 19:54:58 WARN TaskSetManager: Lost TID 15 (task 0.0:4)
14/06/03 19:54:58 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-10 (epoch 3)
14/06/03 19:54:58 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-10 from BlockManagerMaster.
14/06/03 19:54:58 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-10 successfully in removeExecutor
14/06/03 19:54:58 INFO DAGScheduler: Host gained which was in lost list earlier: host-DSRV04.host 
14/06/03 19:54:58 INFO TaskSetManager: Starting task 0.0:4 as TID 21 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:58 INFO TaskSetManager: Serialized task 0.0:4 as 1338 bytes in 0 ms
14/06/03 19:54:58 INFO TaskSetManager: Starting task 0.0:1 as TID 22 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:58 INFO TaskSetManager: Serialized task 0.0:1 as 1338 bytes in 0 ms
14/06/03 19:54:58 INFO TaskSetManager: Starting task 0.0:6 as TID 23 on executor 201406031732-3213994176-5050-6320-11: host-DSRV05.host  (PROCESS_LOCAL)
14/06/03 19:54:58 INFO TaskSetManager: Serialized task 0.0:6 as 1338 bytes in 0 ms
14/06/03 19:54:58 INFO TaskSetManager: Starting task 0.0:5 as TID 24 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:58 INFO TaskSetManager: Serialized task 0.0:5 as 1338 bytes in 1 ms
14/06/03 19:54:58 INFO TaskSetManager: Starting task 0.0:3 as TID 25 on executor 201406031732-3213994176-5050-6320-10: host-DSRV04.host  (PROCESS_LOCAL)
14/06/03 19:54:58 INFO TaskSetManager: Serialized task 0.0:3 as 1338 bytes in 0 ms
14/06/03 19:54:59 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-11 from TaskSet 0.0
14/06/03 19:54:59 WARN TaskSetManager: Lost TID 23 (task 0.0:6)
14/06/03 19:54:59 WARN TaskSetManager: Lost TID 20 (task 0.0:7)
14/06/03 19:54:59 ERROR TaskSetManager: Task 0.0:7 failed 4 times; aborting job
14/06/03 19:54:59 INFO DAGScheduler: Failed to run collect at <console>:17
14/06/03 19:54:59 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-11 (epoch 4)
14/06/03 19:54:59 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-11 from BlockManagerMaster.
14/06/03 19:54:59 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-11 successfully in removeExecutor
14/06/03 19:54:59 INFO DAGScheduler: Host gained which was in lost list earlier: host-DSRV05.host 
org.apache.spark.SparkException: Job aborted: Task 0.0:7 failed 4 times (most recent failure: unknown)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 
> 
> scala> 14/06/03 19:55:00 INFO TaskSetManager: Re-queueing tasks for 201406031732-3213994176-5050-6320-10 from TaskSet 0.0
> 14/06/03 19:55:00 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 14/06/03 19:55:00 INFO DAGScheduler: Executor lost: 201406031732-3213994176-5050-6320-10 (epoch 5)
> 14/06/03 19:55:00 INFO BlockManagerMasterActor: Trying to remove executor 201406031732-3213994176-5050-6320-10 from BlockManagerMaster.
> 14/06/03 19:55:00 INFO BlockManagerMaster: Removed 201406031732-3213994176-5050-6320-10 successfully in removeExecutor
> 14/06/03 19:55:00 INFO DAGScheduler: Host gained which was in lost list earlier: host-DSRV04.host
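
A note on reading the log above: the abort message says task 0.0:7 failed 4 times, which matches Spark's default retry limit (`spark.task.maxFailures`, 4 by default). The tally can be reproduced from the `Lost TID` lines. The following is only a rough sketch for counting them; the helper name `lossesPerTask` is made up for illustration, and the sample lines are a subset copied from the log.

```scala
// Matches lines such as "... WARN TaskSetManager: Lost TID 7 (task 0.0:7)"
// and captures the task id ("0.0:7").
val lostPattern = """WARN TaskSetManager: Lost TID \d+ \(task ([\d.]+:\d+)\)""".r

// Hypothetical helper: counts how many times each task was reported lost.
def lossesPerTask(logLines: Seq[String]): Map[String, Int] =
  logLines
    .flatMap(line => lostPattern.findFirstMatchIn(line).map(_.group(1)))
    .groupBy(identity)
    .map { case (task, hits) => task -> hits.size }

// A few lines taken from the log in the question.
val sample = Seq(
  "14/06/03 19:54:56 WARN TaskSetManager: Lost TID 5 (task 0.0:5)",
  "14/06/03 19:54:56 WARN TaskSetManager: Lost TID 7 (task 0.0:7)",
  "14/06/03 19:54:57 WARN TaskSetManager: Lost TID 10 (task 0.0:7)",
  "14/06/03 19:54:57 WARN TaskSetManager: Lost TID 14 (task 0.0:7)",
  "14/06/03 19:54:58 WARN TaskSetManager: Lost TID 11 (task 0.0:5)",
  "14/06/03 19:54:59 WARN TaskSetManager: Lost TID 20 (task 0.0:7)",
  "14/06/03 19:54:59 INFO DAGScheduler: Failed to run collect at <console>:17"
)

val counts = lossesPerTask(sample)
// counts("0.0:7") is 4 and counts("0.0:5") is 2
```

Every task is lost within about a second of being launched and the failure reason is reported as "unknown", so the scheduler log alone does not say why the executors died; the executor-side output on the Mesos workers is likely the more informative place to look.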

I've checked my Spark configuration many times and it looks fine to me. Any ideas what might have gone wrong?

-- Thanks

Pravesh Jain asked Sep 30 '22 15:09


1 Answer

As it turns out, my tar file wasn't created properly. I recreated it and it's working fine now. Sorry for the trouble.
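
For anyone hitting the same symptom, a quick sanity check on the archive may save some time. The answer does not say how the tar file was broken, so the following is only a sketch under assumptions: it assumes the archive is gzip-compressed (`.tar.gz` / `.tgz`), the function name `gzipIsReadable` and the path in the usage note are made up for illustration, and it only verifies the gzip layer (truncation or corruption), not that the contents have the layout Spark expects.

```scala
import java.io.{BufferedInputStream, FileInputStream, IOException}
import java.util.zip.GZIPInputStream

// Hypothetical helper: returns true if the file can be decompressed end to end.
// GZIPInputStream checks the header and the trailing CRC, so a truncated or
// corrupted archive shows up as an IOException while reading.
// A missing file is not handled here and throws FileNotFoundException.
def gzipIsReadable(path: String): Boolean = {
  val raw = new FileInputStream(path)
  try {
    val in = new GZIPInputStream(new BufferedInputStream(raw))
    try {
      val buf = new Array[Byte](64 * 1024)
      while (in.read(buf) != -1) {} // drain the stream; only errors matter
      true
    } finally {
      in.close()
    }
  } catch {
    case _: IOException => false
  } finally {
    raw.close()
  }
}
```

Usage would look like `gzipIsReadable("/path/to/spark-dist.tar.gz")`, where the path is a placeholder for wherever the archive was written. A `true` result only rules out a damaged gzip stream; an archive built from the wrong directory would still pass.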

Pravesh Jain answered Oct 05 '22 08:10