I'd like to understand the internals of Spark's FAIR scheduling mode. The thing is that it does not seem as fair as one would expect from the official Spark documentation:
Starting in Spark 0.8, it is also possible to configure fair sharing between jobs. Under fair sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.
It seems like jobs are not handled equally and are actually managed in FIFO order.
To give more information on the topic:
I am using Spark on YARN, via the Java API of Spark. To enable fair mode, the code is:
SparkConf conf = new SparkConf();
conf.set("spark.scheduler.mode", "FAIR");
conf.setMaster("yarn-client").setAppName("MySparkApp");
JavaSparkContext sc = new JavaSparkContext(conf);
Did I miss something?
Note that Spark's fair scheduler pools are not the same thing as YARN's FairScheduler. The YARN FairScheduler is a pluggable scheduler for Hadoop that lets YARN applications share a large cluster fairly, so that all applications get, on average, an equal share of resources over time; its allocation file lives in the Hadoop configuration directory (HADOOP_HOME/conf/fair-scheduler.xml). Spark's fair scheduling, by contrast, shares resources between jobs inside a single application, and its pools are defined in an allocation file that Spark reads via the spark.scheduler.allocation.file property.
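If you want Spark itself (not YARN) to pick up a custom pool definition, a minimal sketch of the setup looks like this; the file path is only an example and should point at your own fairscheduler.xml:

SparkConf conf = new SparkConf();
conf.set("spark.scheduler.mode", "FAIR");
// Optional: point Spark at a custom pool definition file (path is an example)
conf.set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml");
conf.setMaster("yarn-client").setAppName("MySparkApp");
JavaSparkContext sc = new JavaSparkContext(conf);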
It appears that you didn't set up the pools and all your jobs end up in a single default
pool as described in Configuring Pool Properties:
Specific pools’ properties can also be modified through a configuration file.
and later
A full example is also available in conf/fairscheduler.xml.template. Note that any pools not configured in the XML file will simply get default values for all settings (scheduling mode FIFO, weight 1, and minShare 0).
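For reference, a pool definition modeled on that template might look like the sketch below; the pool names, weights, and minShares are hypothetical and should be adapted to your workload:

<?xml version="1.0"?>
<allocations>
  <!-- hypothetical pool: tasks of jobs in this pool are scheduled fairly -->
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>2</minShare>
  </pool>
  <!-- hypothetical pool kept at the documented defaults -->
  <pool name="adhoc">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>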
It can also be that you didn't set the local property that selects the pool for a given job, as described in Fair Scheduler Pools:
Without any intervention, newly submitted jobs go into a default pool, but jobs’ pools can be set by adding the spark.scheduler.pool “local property” to the SparkContext in the thread that’s submitting them.
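In the Java API that would be something along these lines; the pool name "production" is an assumption and must match a pool defined in your allocation file:

// In the thread that submits the jobs which should share a pool
sc.setLocalProperty("spark.scheduler.pool", "production");
// ... trigger actions here; the resulting jobs are scheduled in the "production" pool ...
sc.setLocalProperty("spark.scheduler.pool", null); // clear it to fall back to the default pool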
Finally, it may simply mean that you are using a single default pool in FIFO mode, and one FIFO pool behaves no differently from plain FIFO scheduling without pools.
Only you can know the real answer :)