 

Multiprocessing with Python on a single node using Slurm

I am trying to run some parallel code on a cluster. The cluster uses Slurm and my code is in Python. The code uses multiple cores when I run it on my own machine. However, when I run it on the cluster it is extremely slow and does not appear to be using multiple cores.

Here is the relevant Python code:

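from multiprocessing import Pool

# Err_Calc is defined earlier in Err_vs_Nz_Cl.py (not shown)
Nz_i = range(1, 13)  # Nz = 1, 2, ..., 12

# With no argument, Pool() starts one worker per CPU that cpu_count() reports
p = Pool()
p.map(Err_Calc, Nz_i)
p.close()
p.join()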

The function Err_Calc is defined earlier in the file. I don't think its definition is relevant.

The sbatch script I am using to run the code on the cluster is the following:

#!/bin/bash
#SBATCH -N 1
#SBATCH -p RM-shared
#SBATCH --ntasks-per-node 13
#SBATCH -t 03:10:00

module load python/intel_2.7.14

python Err_vs_Nz_Cl.py 

The file Err_vs_Nz_Cl.py contains the code shown above. I would expect this sbatch script to give me 13 cores, but the code seems to use only one core, or perhaps it is slow for some other reason. Does anyone know what's going wrong?
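One way to check what the job actually received is to run a small diagnostic script in place of Err_vs_Nz_Cl.py. This is a sketch: cpu_report is a hypothetical helper, and while the SLURM_* names are standard Slurm output environment variables, which of them are set depends on the options passed to sbatch. os.sched_getaffinity only exists on Linux with Python 3.3+, so under the python/intel_2.7.14 module the "affinity" entry will simply be missing.

```python
import os
from multiprocessing import cpu_count

# Output environment variables exported by Slurm; any of them may be unset
# depending on which options were passed to sbatch.
SLURM_VARS = (
    "SLURM_NTASKS",
    "SLURM_CPUS_PER_TASK",
    "SLURM_CPUS_ON_NODE",
    "SLURM_JOB_CPUS_PER_NODE",
)


def cpu_report():
    """Collect what this process can see about its CPU allocation (hypothetical helper)."""
    # cpu_count() reports every CPU on the machine, not the job's allocation.
    report = {"cpu_count": cpu_count()}
    if hasattr(os, "sched_getaffinity"):  # Linux, Python 3.3+ only
        # Number of CPUs this process is currently allowed to run on.
        report["affinity"] = len(os.sched_getaffinity(0))
    for name in SLURM_VARS:
        report[name] = os.environ.get(name)  # None when the variable is not set
    return report


if __name__ == "__main__":
    for key, value in sorted(cpu_report().items()):
        print("%s: %s" % (key, value))
```

If "affinity" reports 1 while "cpu_count" reports the whole node, then Pool() in Python 2.7 would start one worker per node CPU, and all of them would share a single core, which would be consistent with the slowdown described above.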

asked May 19 '26 08:05 by John Stenger


1 Answer

This may be wrong (I'm a newbie to this), but what happens if you change the --ntasks-per-node 13 argument to --cpus-per-task 13? I think the docs say that you need to specify the number of CPUs explicitly this way; without --cpus-per-task, Slurm only tries to allocate one processor per task.

Source: https://slurm.schedmd.com/sbatch.html
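Once the job requests its CPUs with --cpus-per-task, the Python side can size the Pool from that request instead of relying on Pool()'s default. This is a minimal sketch, assuming the job is submitted with --cpus-per-task so that sbatch exports SLURM_CPUS_PER_TASK; allocated_cpus is a hypothetical helper (not part of multiprocessing or Slurm), and err_calc is a hypothetical stand-in for the question's Err_Calc.

```python
import os
from multiprocessing import Pool, cpu_count


def allocated_cpus():
    """Best guess at how many CPUs this job may use (hypothetical helper)."""
    # sbatch exports SLURM_CPUS_PER_TASK only when --cpus-per-task is given.
    slurm_value = os.environ.get("SLURM_CPUS_PER_TASK")
    if slurm_value:
        return int(slurm_value)
    # On Linux with Python 3.3+, the affinity mask reflects any CPU binding.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # Fallback: every CPU on the machine, which may exceed the allocation.
    return cpu_count()


def err_calc(nz):
    # Hypothetical stand-in for the question's Err_Calc.
    return nz * nz


if __name__ == "__main__":
    nz_values = range(1, 13)
    pool = Pool(processes=allocated_cpus())
    try:
        results = pool.map(err_calc, nz_values)
    finally:
        # close() then join(), as in the question (Python 2.7's Pool has no "with" support).
        pool.close()
        pool.join()
    print(results)
```

Passing processes explicitly keeps the number of workers equal to the CPUs the job asked for, rather than the number of CPUs the node happens to have.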

answered May 22 '26 00:05 by Tylemaster


