Is there a way to oversubscribe GPUs on Slurm, i.e. run multiple jobs/job steps that share one GPU? We've only found ways to oversubscribe CPUs and memory, but not GPUs.
We want to run multiple job steps on the same GPU in parallel and optionally specify the GPU memory used for each step.
The easiest way of doing that is to define the GPU as a feature rather than as a gres. Slurm will then not manage the GPUs at all; it will just make sure that jobs that need one land on nodes that offer one.
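A minimal sketch of that approach, assuming a hypothetical node named node01 and a placeholder binary ./gpu_app (the node name, hardware values, and task counts are illustrative only): advertise the GPU as a node feature in slurm.conf, then have GPU jobs request it with --constraint. Because Slurm no longer tracks the device as a gres, any number of jobs or job steps scheduled onto that node can share the GPU; per-step GPU memory limits would have to be enforced by the application itself rather than by Slurm.

    # slurm.conf (excerpt) -- advertise the GPU as a feature instead of a gres;
    # node name and hardware values are placeholders
    NodeName=node01 CPUs=32 RealMemory=128000 Feature=gpu State=UNKNOWN

    #!/bin/bash
    # job.sh -- request a GPU node via the feature constraint; Slurm does not
    # track the GPU itself, so both steps below run on the same device in parallel
    #SBATCH --constraint=gpu
    #SBATCH --ntasks=2
    srun --ntasks=1 ./gpu_app &   # ./gpu_app is a placeholder for your GPU workload
    srun --ntasks=1 ./gpu_app &
    wait

The trade-off of this sketch is that Slurm no longer knows how many processes are on the GPU, so keeping the device from being oversaturated (and dividing its memory between steps) is left to the jobs themselves.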