I am looking for a way to limit the number of AWS Batch jobs running at once, holding the remaining jobs in the queue until capacity frees up. Is this possible with AWS Batch?
I observe that a maximum of 2 jobs can run concurrently at any point in time.
AWS Batch will scale up instances appropriate for your jobs based on the required accelerators and isolate the accelerators according to each job's needs, so only the appropriate containers can access them.
AWS Batch lets developers, scientists, and engineers efficiently run hundreds of thousands of batch and ML computing jobs while optimizing compute resources, so you can focus on analyzing results and solving problems.
An AWS Batch multi-node parallel job is compatible with any framework that supports IP-based internode communication. Examples include Apache MXNet, TensorFlow, Caffe2, and Message Passing Interface (MPI). Multi-node parallel jobs are submitted as a single job.
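Limiting the maximum number of vCPUs (the maxvCpus setting) of the managed compute environment that the queue is tied to will effectively limit the number of jobs running concurrently on that queue. For example, if each job requests 2 vCPUs and maxvCpus is 4, at most 2 jobs run at once and the rest wait in the queue.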
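The arithmetic behind this cap can be sketched in Python. This is a minimal sketch assuming every job requests the same whole number of vCPUs and that jobs pack perfectly onto instances; in practice fragmentation can let fewer jobs run, and some allocation strategies may exceed maxvCpus by up to one instance. The helper names and the compute environment name my-ce are hypothetical; update_compute_environment with computeResources={"maxvCpus": ...} is the boto3 Batch client call for changing the cap.

```python
def max_concurrent_jobs(max_vcpus: int, vcpus_per_job: int) -> int:
    """Idealized upper bound on jobs running at once in one compute environment.

    Assumes every job requests the same whole number of vCPUs and that jobs
    pack perfectly onto instances (a simplification, not Batch's scheduler).
    """
    if max_vcpus < 0 or vcpus_per_job <= 0:
        raise ValueError("need max_vcpus >= 0 and vcpus_per_job > 0")
    return max_vcpus // vcpus_per_job


def max_vcpus_for(target_jobs: int, vcpus_per_job: int) -> int:
    """maxvCpus value that caps concurrency at target_jobs under the same assumptions."""
    if target_jobs < 0 or vcpus_per_job <= 0:
        raise ValueError("need target_jobs >= 0 and vcpus_per_job > 0")
    return target_jobs * vcpus_per_job


def apply_vcpu_cap(batch_client, compute_environment: str, max_vcpus: int) -> dict:
    """Set maxvCpus on a managed compute environment.

    batch_client is expected to be a boto3 Batch client (boto3.client("batch"));
    it is passed in so this module does not need boto3 or AWS credentials to load.
    """
    return batch_client.update_compute_environment(
        computeEnvironment=compute_environment,
        computeResources={"maxvCpus": max_vcpus},
    )
```

For example, max_vcpus_for(2, 2) gives 4, matching a setup where at most two 2-vCPU jobs run at a time; apply_vcpu_cap(boto3.client("batch"), "my-ce", 4) would then push that value to AWS. Passing the client in, rather than creating it inside the helper, keeps the arithmetic testable without credentials.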
However, this comes with caveats. If other queues share this compute environment, they will be limited as well. And if the queue you are trying to limit is associated with multiple compute environments, Batch will eventually begin scheduling jobs on the secondary compute environments once enough jobs are waiting in the RUNNABLE state.
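Both caveats can be checked before relying on the cap. The sketch below assumes you already have the jobQueues list returned by the boto3 Batch call describe_job_queues, where each entry carries jobQueueName and a computeEnvironmentOrder list of order/computeEnvironment pairs. The function names and the queue and compute environment names in the example are hypothetical. It flags a queue that spans several compute environments and any other queue sharing one of them.

```python
from __future__ import annotations


def compute_environments_of(queue: dict) -> list[str]:
    """Compute environments attached to a job queue, lowest 'order' first."""
    entries = sorted(queue.get("computeEnvironmentOrder", []), key=lambda e: e["order"])
    return [e["computeEnvironment"] for e in entries]


def cap_caveats(job_queues: list[dict], target_queue: str) -> list[str]:
    """List reasons a maxvCpus cap may not act as a per-queue concurrency limit."""
    by_name = {q["jobQueueName"]: q for q in job_queues}
    if target_queue not in by_name:
        raise ValueError(f"queue {target_queue!r} not found")
    target_ces = compute_environments_of(by_name[target_queue])

    warnings = []
    # Several compute environments: jobs can spill over to the later ones,
    # so capping only the first does not bound the queue's concurrency.
    if len(target_ces) > 1:
        warnings.append(
            f"{target_queue} uses {len(target_ces)} compute environments; "
            f"jobs may also run on: {', '.join(target_ces[1:])}"
        )
    # Shared compute environments: other queues are throttled by the same cap.
    for name, queue in by_name.items():
        if name == target_queue:
            continue
        shared = set(target_ces) & set(compute_environments_of(queue))
        if shared:
            warnings.append(f"{name} shares {', '.join(sorted(shared))} and will be limited too")
    return warnings
```

With boto3 this might be called as cap_caveats(boto3.client("batch").describe_job_queues()["jobQueues"], "my-queue"); in accounts with many queues the response is paginated, so a real script would follow nextToken.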