I'm a bit of a newbie with mpirun, and I'm running into a small issue. I have n jobs that I want to run on 2 cores each, so I open n terminal windows and use the usual
mpirun -np 2 [program]
in each window. But instead of using 2*n cores, the jobs use only a fraction of that and run very slowly, which leads me to believe mpirun is stacking multiple jobs on the same cores while leaving the other cores on the CPU untouched, making the jobs unbearably slow.
I've tried using the option
--bind-to core
in each call, but that doesn't seem to change mpirun's behavior at all.
What could be causing this, and how can I stop mpirun from stacking jobs on the same cores while there are still free cores available?
Thanks a lot!
When running a job with 2 MPI tasks, the default binding policy of Open MPI is --bind-to core.
The issue here is that the MPI jobs started from separate terminals are independent: each one pins its task 0 to cpu 0 and its task 1 to cpu 1, so all the jobs end up time-sharing the first two cores.
A lesser evil is to pass --bind-to none
to prevent Open MPI from binding the MPI tasks and let the Linux scheduler spread them across all the available cores. It is then up to you to ensure you never run more MPI tasks than cores at any given time (otherwise you are back to time sharing).
The right fix is to use a job scheduler such as SLURM, which ensures that any given core runs no more than one MPI task at a time.
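As a rough sketch of what that looks like under SLURM, a batch script can request 2 cores per job and let the scheduler place each submission on free cores (the binary name a.out is a placeholder, and the exact #SBATCH options depend on your cluster's configuration):

```shell
#!/bin/bash
# Hypothetical SLURM batch script: ask for 2 tasks, 1 core each.
# The scheduler allocates 2 free cores per submitted job, so
# concurrently submitted jobs never share cores.
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1

# srun launches the MPI tasks inside the allocation;
# with recent Open MPI, "mpirun -np 2 ./a.out" also works here.
srun ./a.out
```

You would then submit each of your n jobs with sbatch, and SLURM queues any job for which no free cores remain instead of oversubscribing.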
A manual alternative is to restrict each MPI job to its own pair of cores with taskset:
$ taskset -c 0,1 mpirun -np 2 a.out
$ taskset -c 2,3 mpirun -np 2 a.out
...
but it remains up to you to make sure no core is shared between two jobs.
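The core-pair bookkeeping above can be automated with a small loop that assigns cores 2*i and 2*i+1 to job i (a minimal sketch; NJOBS and a.out are placeholders for your job count and MPI program, and the mpirun line is commented out so the arithmetic can be seen on its own):

```shell
#!/bin/sh
# Launch NJOBS independent 2-rank MPI jobs, each pinned to its own
# pair of cores, so no two jobs ever time-share a core.
NJOBS=4
i=0
while [ "$i" -lt "$NJOBS" ]; do
    first=$((2 * i))        # first core of this job's pair
    second=$((2 * i + 1))   # second core of this job's pair
    echo "job $i -> cores $first,$second"
    # taskset -c "$first,$second" mpirun -np 2 ./a.out &
    i=$((i + 1))
done
# wait   # uncomment together with the mpirun line to wait for all jobs
```

With NJOBS=4 this assigns cores 0,1 / 2,3 / 4,5 / 6,7 to jobs 0 through 3; make sure 2*NJOBS does not exceed the number of physical cores on the machine.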