I'm running a spark job on a Google DataProc cluster. But looks like Spark is not using all the vcores
available in the cluster as you can see below
Based on some other questions like this and this, i have setup the cluster to use DominantResourceCalculator
to consider both vcpus and memory for resource allocation
gcloud dataproc clusters create cluster_name --bucket="profiling-
job-default" \
--zone=europe-west1-c \
--master-boot-disk-size=500GB \
--worker-boot-disk-size=500GB \
--master-machine-type=n1-standard-16 \
--num-workers=10 \
--worker-machine-type=n1-standard-16 \
--initialization-actions gs://custom_init_gcp.sh \
--metadata MINICONDA_VARIANT=2 \
--properties=^--^yarn:yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
But when i submit my job with custom spark flags, looks like YARN doesn't respect these custom parameters and defaults to using memory as the yardstick for resource calculation
gcloud dataproc jobs submit pyspark --cluster cluster_name \
--properties spark.sql.broadcastTimeout=900,spark.network.timeout=800\
,yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator\
,spark.dynamicAllocation.enabled=true\
,spark.executor.instances=10\
,spark.executor.cores=14\
,spark.executor.memory=15g\
,spark.driver.memory=50g \
src/my_python_file.py
Can help somebody figure out what's going on here?
What I did wrong was to add the configuration yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
to YARN
instead of the capacity-scheduler.xml
(as it should be rightly) while cluster creation
Secondly, i changed yarn:yarn.scheduler.minimum-allocation-vcores
which was initially set to 1
.
I'm not sure if either one of these or both of these changes led to the solution (i will update soon). My new cluster creation looks like below:
gcloud dataproc clusters create cluster_name --bucket="profiling-
job-default" \
--zone=europe-west1-c \
--master-boot-disk-size=500GB \
--worker-boot-disk-size=500GB \
--master-machine-type=n1-standard-16 \
--num-workers=10 \
--worker-machine-type=n1-standard-16 \
--initialization-actions gs://custom_init_gcp.sh \
--metadata MINICONDA_VARIANT=2 \
--properties=^--^yarn:yarn.scheduler.minimum-allocation-vcores=4--capacity-scheduler:yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With