I have a YARN cluster with a master node running the ResourceManager and 2 other nodes. I am able to submit a Spark application from a client machine in "yarn-cluster" mode. Is there a way to configure which node in the cluster launches the Spark ApplicationMaster?
I ask because when the ApplicationMaster launches on the master node the job runs fine, but when it launches on one of the other nodes I get this:
Retrying connect to server: 0.0.0.0/0.0.0.0:8030.
and the job sits in the ACCEPTED state and never runs.
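Presumably this means the yarn-site.xml on those nodes doesn't point at the ResourceManager: 0.0.0.0:8030 is the default value of yarn.resourcemanager.scheduler.address when no RM hostname is configured. A sketch of the setting I'd expect on every node, assuming the RM host is named `master` (hostname is an assumption):

```xml
<!-- yarn-site.xml: must be present on every NodeManager host, not just the master.
     Setting yarn.resourcemanager.hostname also fills in the derived defaults,
     including yarn.resourcemanager.scheduler.address (port 8030). -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
```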
If you're running a new enough version of YARN (2.6 or newer, according to the Spark docs), you can use YARN node labels.
This Hortonworks guide walks through applying node labels to your YARN NodeManagers.
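As a rough sketch of what that guide covers (exact `-replaceLabelsOnNode` syntax varies between Hadoop versions, and the label name `amhost` is just an example):

```shell
# Enable node labels in yarn-site.xml first:
#   yarn.node-labels.enabled = true
#   yarn.node-labels.fs-store.root-dir = hdfs:///yarn/node-labels

# Register a label with the cluster (non-exclusive so unlabeled
# jobs can still use the node).
yarn rmadmin -addToClusterNodeLabels "amhost(exclusive=false)"

# Attach the label to the master node's NodeManager.
yarn rmadmin -replaceLabelsOnNode "master=amhost"

# Verify the label mapping.
yarn cluster --list-node-labels
```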
If you use Spark 1.6 or newer, this JIRA added support for YARN node labels in Spark: simply pass spark.yarn.am.nodeLabelExpression to restrict ApplicationMaster placement and, if you ever need it, spark.yarn.executor.nodeLabelExpression for executor placement.
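Putting it together, a submission pinning the ApplicationMaster to nodes labeled `amhost` might look like this (the label name and the application jar/class are placeholders for your own):

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.am.nodeLabelExpression=amhost \
  --class com.example.MyApp \
  my-app.jar
```

With a non-exclusive label, executors can still be scheduled on any node unless you also set spark.yarn.executor.nodeLabelExpression.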