I'm trying to use YARN node labels to tag worker nodes, but when I run applications on YARN (Spark or a simple YARN app), those applications cannot start.
With Spark, when specifying --conf spark.yarn.am.nodeLabelExpression="my-label", the job cannot start (it stays blocked on Submitted application [...]; see details below).
With a YARN application (like distributedshell), when specifying -node_label_expression my-label, the application cannot start either.
Here are the tests I have run so far.
I'm using Google Dataproc to run my cluster (for example: 4 workers, 2 of them on preemptible nodes). My goal is to force any YARN application master to run on a non-preemptible node; otherwise, the node can be shut down at any time, making the application fail hard.
I'm creating the cluster using YARN properties (--properties) to enable node labels:
gcloud dataproc clusters create \
my-dataproc-cluster \
--project [PROJECT_ID] \
--zone [ZONE] \
--master-machine-type n1-standard-1 \
--master-boot-disk-size 10 \
--num-workers 2 \
--worker-machine-type n1-standard-1 \
--worker-boot-disk-size 10 \
--num-preemptible-workers 2 \
--properties 'yarn:yarn.node-labels.enabled=true,yarn:yarn.node-labels.fs-store.root-dir=/system/yarn/node-labels'
Versions of packaged Hadoop and Spark:
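The version output is not reproduced here; it can be checked on the cluster itself (the exact versions depend on the Dataproc image in use), for example:
# Print the Hadoop and Spark versions bundled with the Dataproc image
hadoop version
spark-submit --version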
After that, I create a label (my-label) and update the two non-preemptible workers with this label:
yarn rmadmin -addToClusterNodeLabels "my-label(exclusive=false)"
yarn rmadmin -replaceLabelsOnNode "\
[WORKER_0_NAME].c.[PROJECT_ID].internal=my-label \
[WORKER_1_NAME].c.[PROJECT_ID].internal=my-label"
I can see the created label in the YARN Web UI:
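As a complement to the Web UI, here is a quick sanity check from the command line on the master node (a sketch, assuming Hadoop 2.8+ where the yarn cluster CLI is available):
# List the node labels known to the ResourceManager; it should include my-label
yarn cluster --list-node-labels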
When I run a simple example (SparkPi) without specifying anything about node labels:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
/usr/lib/spark/examples/jars/spark-examples.jar \
10
In the Scheduler tab of the YARN Web UI, I see the application launched on <DEFAULT_PARTITION>.root.default.
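To double-check which node the application master actually landed on in this unlabeled case, the application report can be queried (a hypothetical check; [APPLICATION_ID] is a placeholder taken from yarn application -list):
# The report includes an "AM Host" field showing the node running the application master
yarn application -status [APPLICATION_ID]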
But when I run the job specifying spark.yarn.am.nodeLabelExpression to set the location of the Spark application master:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--conf spark.yarn.am.nodeLabelExpression="my-label" \
/usr/lib/spark/examples/jars/spark-examples.jar \
10
The job is not launched. In the YARN Web UI, I see:
ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = my-label ; Partition Resource = <memory:6144, vCores:2> ; Queue's Absolute capacity = 0.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 0.0 % ;
I suspect that the queue related to the label partition (not <DEFAULT_PARTITION>, the other one) does not have sufficient resources to run the job:
Here, Used Application Master Resources is <memory:1024, vCores:1>, but Max Application Master Resources is <memory:0, vCores:0>. That explains why the application cannot start, but I can't figure out how to change this.
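To see the capacities the CapacityScheduler is actually applying per partition, rather than reading them off the UI, the queue status and the ResourceManager's scheduler REST endpoint can be inspected (a sketch, assuming the default RM web port 8088 and execution on the master node):
# Show the default queue's capacities and accessible node labels
yarn queue -status default
# Dump the full CapacityScheduler view, including per-partition capacities of each queue
curl -s "http://$(hostname):8088/ws/v1/cluster/scheduler" | python -m json.tool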
I tried updating different parameters, but without success:
yarn.scheduler.capacity.root.default.accessible-node-labels=my-label
Or increasing those properties:
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.maximum-capacity
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.maximum-am-resource-percent
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.user-limit-factor
yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.minimum-user-limit-percent
again without success.
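For reference, a minimal sketch of how such capacity-scheduler changes can be applied (assuming the standard config location /etc/hadoop/conf/capacity-scheduler.xml on the master node): edit the file, then ask the ResourceManager to reload the queue configuration:
# Reload the CapacityScheduler configuration after editing capacity-scheduler.xml
yarn rmadmin -refreshQueues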
The issue is the same when running a YARN application:
hadoop jar \
/usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
-shell_command "echo ok" \
-jar /usr/lib/hadoop-yarn/hadoop-yarn-applications-distributedshell.jar \
-queue default \
-node_label_expression my-label
The application cannot start, and the log keeps repeating:
INFO distributedshell.Client: Got application report from ASM for, appId=6, clientToAMToken=null, appDiagnostics= Application is Activated, waiting for resources to be assigned for AM. Details : AM Partition = my-label ; Partition Resource = <memory:6144, vCores:2> ; Queue's Absolute capacity = 0.0 % ; Queue's Absolute used capacity = 0.0 % ; Queue's Absolute max capacity = 0.0 % ; , appMasterHost=N/A, appQueue=default, appMasterRpcPort=-1, appStartTime=1520354045946, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, [...]
If I don't specify -node_label_expression my-label, the application starts on <DEFAULT_PARTITION>.root.default and succeeds.
Thanks for your help.
Node labels are a way to group nodes with similar characteristics so that applications (including Spark jobs) can specify where to run. YARN currently only supports node partitions: a node can have only one partition, so the cluster is split into several disjoint sub-clusters, and by default nodes belong to the DEFAULT partition. As a reminder, YARN is the Hadoop component responsible for allocating cluster resources to applications and scheduling tasks across the nodes; the master node runs the YARN ResourceManager service.
A Google engineer answered us (on a private issue we raised, not in the public issue tracker) and gave us a solution: specify an initialization script at Dataproc cluster creation. I don't think the issue comes from Dataproc itself; this is basically just YARN configuration. The script sets the following properties in capacity-scheduler.xml, just after creating the node label (my-label):
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels</name>
  <value>my-label</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.accessible-node-labels.my-label.capacity</name>
  <value>100</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels</name>
  <value>my-label</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.accessible-node-labels.my-label.capacity</name>
  <value>100</value>
</property>
According to the comment accompanying the script, this "set[s] accessible-node-labels on both root (the root queue) and root.default (the default queue applications actually get run on)". The root.default part is what was missing in my tests. The capacity for both is set to 100.
Then, YARN needs to be restarted (systemctl restart hadoop-yarn-resourcemanager.service) for the modifications to take effect.
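For completeness, here is roughly how such a script is wired into cluster creation; the GCS bucket and script name below are placeholders, and the remaining flags are the ones from the original creation command:
gcloud dataproc clusters create \
my-dataproc-cluster \
--project [PROJECT_ID] \
--zone [ZONE] \
--master-machine-type n1-standard-1 \
--master-boot-disk-size 10 \
--num-workers 2 \
--worker-machine-type n1-standard-1 \
--worker-boot-disk-size 10 \
--num-preemptible-workers 2 \
--initialization-actions 'gs://[MY_BUCKET]/node-labels-init.sh' \
--properties 'yarn:yarn.node-labels.enabled=true,yarn:yarn.node-labels.fs-store.root-dir=/system/yarn/node-labels'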
After that, I was able to start the jobs that previously failed in my question.
Hope this will help people having the same or similar issues.