I am trying to run some parallel code on a cluster. The cluster uses Slurm and my code is in Python. The code uses multiple cores when I run it on my own machine. However, when I run it on the cluster it is extremely slow and does not appear to be using multiple cores.
Here is the relevant Python code:
from multiprocessing import Pool
Nz_i=range(1,13)
p=Pool()
p.map(Err_Calc,Nz_i)
p.close()
p.join()
The function Err_Calc is defined earlier in the file. I don't think its definition is relevant.
The sbatch script I am using to run the code on the cluster is the following:
#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM-shared
#SBATCH --ntasks-per-node 13
#SBATCH -t 03:10:00
module load python/intel_2.7.14
python Err_vs_Nz_Cl.py
The file Err_vs_Nz_Cl.py contains the code I showed above. I would expect this sbatch script to give me 13 cores, but the code seems to be using only 1 core, or perhaps it is slow for some other reason. Does anyone know what's going wrong?
This may be wrong (I'm new to this), but what happens if you change the --ntasks-per-node 13 argument to --cpus-per-task 13? As I read the sbatch docs, the default is one CPU per task, so a single task that spawns several processes needs its CPU count requested explicitly; otherwise it may run on only one CPU.
Source: https://slurm.schedmd.com/sbatch.html
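If that change helps, it may also be worth sizing the pool from what Slurm actually granted rather than relying on Pool() with no arguments, which in Python 2.7 falls back to cpu_count(), i.e. the core count of the whole node. Here is a minimal sketch, assuming the job was submitted with --cpus-per-task so that Slurm sets the SLURM_CPUS_PER_TASK environment variable. The helper name allocated_cpus is made up for illustration, and the Err_Calc body is only a placeholder, since the real one is not shown in the question.

```python
import os
from multiprocessing import Pool


def allocated_cpus(default=1):
    """Return the CPU count Slurm granted to this task, or `default` outside Slurm."""
    # SLURM_CPUS_PER_TASK is only set when --cpus-per-task (-c) was requested.
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    try:
        return max(1, int(value))
    except (TypeError, ValueError):
        # Variable missing (None) or not a number: fall back to the default.
        return default


def Err_Calc(nz):
    # Placeholder standing in for the real Err_Calc from the question.
    return nz * nz


if __name__ == "__main__":
    Nz_i = range(1, 13)
    # One worker per allocated CPU instead of one per core on the node.
    p = Pool(processes=allocated_cpus())
    try:
        results = p.map(Err_Calc, Nz_i)
    finally:
        p.close()
        p.join()
    print(results)
```

With "#SBATCH --cpus-per-task 13" in the script, allocated_cpus() should return 13; run outside Slurm, it falls back to a single worker, so the same file still works on a laptop. The code avoids the "with Pool(...)" form so it stays compatible with the Python 2.7 module loaded in the question.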