
Setting YARN queue in PySpark

When creating a Spark context in PySpark, I typically use the following code:

conf = (SparkConf().setMaster("yarn-client").setAppName(appname)
        .set("spark.executor.memory", "10g")
        .set("spark.executor.instances", "7")
        .set("spark.driver.memory", "5g")
        .set("spark.shuffle.service.enabled","true")
        .set("spark.dynamicAllocation.enabled","true")
        .set("spark.dynamicAllocation.minExecutors","5")
        )
sc = SparkContext(conf=conf)

However, this puts the job in the default queue, which is almost always over capacity. We have several less busy queues available, so my question is: how do I set my Spark context to use another queue?

Edit: To clarify - I'm looking to set the queue for interactive jobs (e.g., exploratory analysis in a Jupyter notebook), so I can't set the queue with spark-submit.

asked Feb 06 '18 by Tim

2 Answers

You can pass the following argument to your spark-submit command:

--queue queue_name
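For example, a submission to a named queue might look like this (the queue name `dev` and the script `my_job.py` are placeholders; this assumes a cluster with that queue configured):

```shell
# Submit the application to the "dev" YARN queue instead of "default".
# Replace "dev" and my_job.py with your own queue name and script.
spark-submit \
  --master yarn \
  --queue dev \
  my_job.py
```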

Alternatively, you can set the spark.yarn.queue property directly in your code.
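For the asker's interactive Jupyter case, a minimal sketch is the original SparkConf chain with the queue property added (the queue name below is a placeholder, and actually creating the context requires a running YARN cluster, so treat this as a configuration sketch):

```python
from pyspark import SparkConf, SparkContext

# Same style of conf as in the question, with spark.yarn.queue added.
# "less_busy_queue" is a placeholder for a real queue on your cluster.
conf = (SparkConf().setMaster("yarn-client").setAppName("appname")
        .set("spark.yarn.queue", "less_busy_queue")
        .set("spark.executor.memory", "10g")
        .set("spark.dynamicAllocation.enabled", "true"))
sc = SparkContext(conf=conf)
```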

Hope this helps. Thanks!

answered Oct 02 '22 by Manu Gupta


Try setting spark.yarn.queue rather than queue:

conf = pyspark.SparkConf().set("spark.yarn.queue", "your_queue_name")
sc = pyspark.SparkContext(conf=conf)
answered Oct 02 '22 by Bean Dog