I would like to submit multiple spark-submit jobs with yarn. When I run
spark-submit --class myclass --master yarn --deploy-mode cluster blah blah
as it is now, I have to wait for the job to complete before I can submit more jobs. I see the heartbeat:
16/09/19 16:12:41 INFO yarn.Client: Application report for application_1474313490816_0015 (state: RUNNING)
16/09/19 16:12:42 INFO yarn.Client: Application report for application_1474313490816_0015 (state: RUNNING)
How can I tell YARN to pick up another job, all from the same terminal? Ultimately I want to run this from a script so I can send hundreds of jobs in one go.
Thank you.
Every user has a fixed capacity, as specified in the YARN configuration. If you are allocated N executors (usually, you will be allocated some fixed number of vcores) and you want to run 100 jobs concurrently, you need to divide that allocation among the jobs (here N/100 is a placeholder for your actual per-job executor count):
spark-submit --num-executors N/100 --executor-cores 5
Otherwise, the jobs will sit looping in the ACCEPTED state.
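The per-job arithmetic can be sketched in the shell. The queue size, cores per executor, and job count below are assumed example values, not something from your cluster:

```shell
#!/usr/bin/env bash
# Assumed example: the queue grants 200 vcores and each executor uses 5 cores.
TOTAL_VCORES=200
EXECUTOR_CORES=5
NUM_JOBS=10

# Total executors the queue can host, then an even split across the jobs.
TOTAL_EXECUTORS=$(( TOTAL_VCORES / EXECUTOR_CORES ))
PER_JOB=$(( TOTAL_EXECUTORS / NUM_JOBS ))

echo "pass --num-executors $PER_JOB --executor-cores $EXECUTOR_CORES to each job"
```

With these numbers each job would get 4 executors; if a job asks for more than its share, the surplus jobs wait in ACCEPTED until capacity frees up.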
You can launch multiple jobs in parallel by appending & to the end of every invocation:
for i in $(seq 20); do spark-submit --master yarn --num-executors N/100 --executor-cores 5 blah blah & done
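As a fuller sketch, the loop above can be wrapped in a script that backgrounds each submission and then waits for all of them. SPARK_SUBMIT is deliberately set to echo here as a dry-run stand-in; point it at your real spark-submit (and replace the jar name, which is a hypothetical placeholder):

```shell
#!/usr/bin/env bash
# Dry-run stand-in; replace with the real spark-submit binary on your cluster.
SPARK_SUBMIT="echo spark-submit"

NUM_JOBS=5
for i in $(seq 1 "$NUM_JOBS"); do
  # & backgrounds each submission so the loop does not block on it
  $SPARK_SUBMIT --master yarn --deploy-mode cluster \
    --num-executors 2 --executor-cores 5 \
    --class myclass "myjob-$i.jar" &
done

# wait blocks until every backgrounded submission process has exited
wait
echo "all $NUM_JOBS submissions launched"
```

Note that in cluster mode each spark-submit process returns once the application is handed to YARN, so the script finishes quickly even though the jobs keep running on the cluster.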