I'm executing some Spark (Scala) SQL code in spark-shell. I want to know which queue I am using and, if possible, how much memory and how many executors I am using, and how to optimize them.
From "Launching Spark on YARN" in the Spark docs: ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client-side) configuration files for the Hadoop cluster. These configs are used to write to HDFS and connect to the YARN ResourceManager.
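A quick way to confirm the shell actually picked up the YARN configuration is to check the master from inside spark-shell (a minimal check, assuming a recent Spark shell where sc is predefined):

// Inside spark-shell: sc is the predefined SparkContext.
// If the YARN configs were found, the master is "yarn"
// (older versions may report "yarn-client" or "yarn-cluster").
println(sc.master)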
Open the ResourceManager UI and confirm which queues are configured. Log in to the cluster and submit the job to the desired queue. In the logs you can see the output from the Spark job. This way you can run Spark jobs in different queues.
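To answer the "which queue am I using" part directly from inside spark-shell, you can read the session's configuration (a sketch; the spark.yarn.queue key applies when running on YARN, and the queue falls back to "default" if none was set at launch):

// Read the YARN queue this application was submitted to.
// SparkConf.get(key, default) returns the default when the key is unset.
val queue = sc.getConf.get("spark.yarn.queue", "default")
println(s"Running in YARN queue: $queue")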
You can set the queue name, number of executors, executor memory, total number of cores, cores per executor, driver memory, etc. when you start spark-shell or spark-submit.
Here is how you can specify the parameters:
spark-shell --executor-memory 6G --executor-cores 5 --num-executors 20 --driver-memory 2G --queue $queue_name
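After launching with the flags above, you can verify they took effect from inside the shell, since spark-submit maps each flag to a configuration entry (a sketch assuming the standard mappings, e.g. --executor-memory to spark.executor.memory and --num-executors to spark.executor.instances):

// Inspect the resource settings of the running session.
val conf = sc.getConf
println(conf.get("spark.executor.memory", "1g"))    // expect 6G
println(conf.get("spark.executor.cores", "1"))      // expect 5
println(conf.get("spark.executor.instances", "2"))  // expect 20
println(conf.get("spark.driver.memory", "1g"))      // expect 2G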
You should calculate these parameters based on your cluster capacity, using the fat-executor vs. thin-executor trade-off (a few large executors per node vs. many small ones); see the sketch below.
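For example, here is one common way to size executors on a hypothetical cluster of 10 nodes with 16 cores and 64 GB RAM each (the numbers are illustrative, not a recommendation for your cluster):

// Hypothetical cluster: 10 nodes, 16 cores and 64 GB RAM per node.
val nodes = 10
val usableCoresPerNode = 16 - 1   // leave 1 core per node for OS/Hadoop daemons
val usableMemPerNodeGB = 64 - 1   // leave ~1 GB per node for the OS

// 5 cores per executor is a common middle ground between
// "fat" (one huge executor per node) and "thin" (one core per executor).
val coresPerExecutor = 5
val executorsPerNode = usableCoresPerNode / coresPerExecutor  // 3
val numExecutors = nodes * executorsPerNode - 1               // 29 (one slot left for the driver/AM)
// Deduct roughly 10% per executor for YARN's memory overhead.
val memPerExecutorGB = (usableMemPerNodeGB / executorsPerNode * 0.9).toInt  // 18

println(s"--num-executors $numExecutors --executor-cores $coresPerExecutor --executor-memory ${memPerExecutorGB}G")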
If you still want to check resource utilization, look at the YARN ResourceManager page or the Spark web UI.
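If you're not sure where the Spark web UI for your session lives (it usually runs on port 4040 of the driver), the context can tell you directly (assuming Spark 2.0 or later, where uiWebUrl is available):

// Print the URL of this application's web UI, if the UI is enabled.
sc.uiWebUrl.foreach(url => println(s"Spark UI: $url"))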