How to deal with tasks running too long (compared to others in the job) in yarn-client?

1 Answer

There is no way for Spark to kill a task just because it is taking too long.

But I figured out a way to handle this using speculation.

With speculation enabled, if one or more tasks are running slowly in a stage, they will be re-launched as speculative copies, and whichever attempt finishes first is used.

spark.speculation                  true
spark.speculation.multiplier       2
spark.speculation.quantile         0

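Note: spark.speculation.quantile is the fraction of tasks in a stage that must finish before speculation is considered, so setting it to 0 means speculation can kick in from your first task. spark.speculation.multiplier is how many times slower than the median a task must be to get re-launched. Use these settings with caution. I am using them because some of my jobs slow down over time due to GC. You need to know when this applies to your workload; it's not a silver bullet.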
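To make the interaction between the two settings concrete, here is a simplified model of the check in plain Java. It is a sketch of my understanding of the documented behaviour, not Spark's actual scheduler code, and the real logic differs between Spark versions. The class name SpeculationSketch and the method isSpeculatable are made up for illustration.

```java
import java.util.Arrays;

class SpeculationSketch {
    /**
     * Simplified model (an assumption, not Spark's code): a running task is a
     * speculation candidate once enough tasks have finished and it has been
     * running longer than multiplier * median duration of the finished tasks.
     */
    static boolean isSpeculatable(long[] finishedMillis, int totalTasks,
                                  long runningMillis, double quantile, double multiplier) {
        // At least this many tasks must have finished (never fewer than one).
        int minFinished = Math.max((int) Math.floor(quantile * totalTasks), 1);
        if (finishedMillis.length < minFinished) {
            return false;
        }
        long[] sorted = finishedMillis.clone();
        Arrays.sort(sorted);
        long median = sorted[sorted.length / 2];
        return runningMillis > multiplier * median;
    }

    public static void main(String[] args) {
        long[] finished = {1000, 1200, 900}; // durations of finished tasks, in ms
        // quantile 0: three finished tasks are enough, and 2500 > 2 * 1000
        System.out.println(isSpeculatable(finished, 10, 2500, 0.0, 2.0));  // true
        // quantile 0.75: 7 of 10 tasks must finish first, only 3 have
        System.out.println(isSpeculatable(finished, 10, 2500, 0.75, 2.0)); // false
        // quantile 0, but the task is not slow enough: 1500 <= 2 * 1000
        System.out.println(isSpeculatable(finished, 10, 1500, 0.0, 2.0));  // false
    }
}
```

With the quantile at 0, a single finished task is enough to set the baseline, which is why this setting can trigger re-launches very early in a stage.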

Some relevant links: http://apache-spark-user-list.1001560.n3.nabble.com/Does-Spark-always-wait-for-stragglers-to-finish-running-td14298.html and http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAPmMX=rOVQf7JtDu0uwnp1xNYNyz4xPgXYayKex42AZ_9Pvjug@mail.gmail.com%3E

Update

I found a fix for my issue (it might not work for everyone). I had a bunch of simulations running per task, so I added a timeout around each run. If a simulation takes too long (due to data skew for that specific run), it times out.

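// Requires java.util.concurrent.* (ExecutorService, Executors, Callable, Future, TimeUnit, TimeoutException)
ExecutorService executor = Executors.newCachedThreadPool();
Callable<SimResult> task = () -> simulator.run();

Future<SimResult> future = executor.submit(task);
SimResult result = null;
try {
    // Wait at most one minute for the simulation to finish.
    // future.get also throws InterruptedException and ExecutionException;
    // catch or declare them in the enclosing method.
    result = future.get(1, TimeUnit.MINUTES);
} catch (TimeoutException ex) {
    // Interrupts the worker thread; the simulator has to check for the interrupt.
    future.cancel(true);
    SPARKLOG.info("Task timed out");
}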

Make sure you check for an interrupt inside the simulator's main loop, otherwise future.cancel(true) will not actually stop the run:

if(Thread.currentThread().isInterrupted()){
    throw new InterruptedException();
}
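For anyone who wants to try the pattern outside Spark, here is a small self-contained sketch that puts the two pieces together. The names TimeoutSketch, simulate and runWithTimeout are made up for this example, and simulate is only a stand-in for the real simulator.run(); the timeouts are arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimeoutSketch {
    /** Hypothetical stand-in for simulator.run(): loops until done or interrupted. */
    static long simulate(long steps) throws InterruptedException {
        long sum = 0;
        for (long i = 0; i < steps; i++) {
            // Cooperative cancellation: bail out when the thread is interrupted.
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("simulation cancelled");
            }
            sum += i;
        }
        return sum;
    }

    /** Runs the work with a time limit; returns null if it timed out. */
    static <T> T runWithTimeout(Callable<T> work, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(work);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException ex) {
            future.cancel(true); // interrupts the worker thread
            return null;
        } finally {
            executor.shutdownNow(); // do not leave the worker thread behind
        }
    }

    public static void main(String[] args) throws Exception {
        // Finishes well within the limit.
        Long fast = runWithTimeout(() -> simulate(1_000), 5, TimeUnit.SECONDS);
        // Cannot finish in 100 ms, so it is cancelled and null comes back.
        Long slow = runWithTimeout(() -> simulate(Long.MAX_VALUE), 100, TimeUnit.MILLISECONDS);
        System.out.println("fast = " + fast + ", slow = " + slow);
    }
}
```

If the loop did not check the interrupt flag, future.cancel(true) would return but the worker thread would keep burning CPU in the background, which is exactly the situation the timeout is meant to avoid.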

answered Sep 17 '22 05:09

zengr

Tags:

apache-spark

hadoop-yarn

parquet

tnk_peka
