I am using Apache Spark 0.8.0 to process a large data file and perform some basic .map and .reduceByKey operations on the RDD.
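For reference, a minimal sketch of the kind of job I am running (the input path and parsing logic here are illustrative, not my actual code; sc is the SparkContext created below):

// Illustrative job: count occurrences of the first CSV field.
val lines = sc.textFile("/path/to/large-file.csv")   // hypothetical input path
val counts = lines
  .map(line => (line.split(",")(0), 1L))             // key on the first field
  .reduceByKey(_ + _)                                // sum the counts per key
counts.saveAsTextFile("/path/to/output")             // hypothetical output path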
Since I am using a single machine with multiple processors, I pass local[8] as the master URL when creating the SparkContext:
val sc = new SparkContext("local[8]", "Tower-Aggs", SPARK_HOME )
But whenever I specify multiple processors, the job gets stuck (pauses/halts) at random. There is no definite place where it gets stuck; it's just random, and sometimes it doesn't happen at all. I am not sure whether it would eventually continue, but it stays stuck for a long time, after which I abort the job.
But when I use local in place of local[8], the job runs seamlessly and never gets stuck:
val sc = new SparkContext("local", "Tower-Aggs", SPARK_HOME )
I am not able to understand where the problem is.
I am using Scala 2.9.3 and sbt to build and run the application.
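For completeness, my build definition looks roughly like this (a sketch; 0.8.0-incubating is the Apache release of Spark 0.8.0 published for Scala 2.9.3, and your exact setup may differ):

// build.sbt (sketch)
scalaVersion := "2.9.3"
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.0-incubating"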
I'm using Spark 1.0.0 and ran into the same problem: if a function passed to a transformation or action waits or loops indefinitely, Spark will not wake it up, terminate it, or retry it by default; in that case you have to kill the task yourself.
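One way to kill a hung job programmatically, instead of aborting the whole application, is to run the action under a job group and cancel it from another thread. This is a sketch built on SparkContext.setJobGroup/cancelJobGroup (available in 1.0.0); the timeout, the group name, and rdd (standing in for whatever RDD the stuck stage operates on) are illustrative:

// Sketch: cancel a potentially hung action via a job group.
val worker = new Thread {
  override def run() {
    sc.setJobGroup("tower-aggs", "aggregation that may hang")  // tags jobs started by this thread
    rdd.reduceByKey(_ + _).count()                             // the action that gets stuck
  }
}
worker.start()
worker.join(10 * 60 * 1000)                                    // wait up to 10 minutes
if (worker.isAlive) sc.cancelJobGroup("tower-aggs")            // kill only the stuck job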
However, a more recent feature, speculative execution, lets Spark launch duplicate copies of tasks that run much longer than the average running time of their peers. It can be enabled and tuned with the following configuration properties:
spark.speculation (default: false): If set to "true", performs speculative execution of tasks. This means if one or more tasks are running slowly in a stage, they will be re-launched.
spark.speculation.interval (default: 100): How often Spark will check for tasks to speculate, in milliseconds.
spark.speculation.quantile (default: 0.75): Percentage of tasks which must be complete before speculation is enabled for a particular stage.
spark.speculation.multiplier (default: 1.5): How many times slower a task is than the median to be considered for speculation.
(source: http://spark.apache.org/docs/latest/configuration.html)
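For example, these properties can be set on a SparkConf before creating the SparkContext (a sketch; the values shown are just the documented defaults with speculation switched on):

import org.apache.spark.{SparkConf, SparkContext}

// Enable speculative execution so unusually slow tasks get re-launched.
val conf = new SparkConf()
  .setMaster("local[8]")
  .setAppName("Tower-Aggs")
  .set("spark.speculation", "true")
  .set("spark.speculation.interval", "100")      // check every 100 ms
  .set("spark.speculation.quantile", "0.75")     // after 75% of a stage's tasks finish
  .set("spark.speculation.multiplier", "1.5")    // 1.5x slower than the median
val sc = new SparkContext(conf)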